
Commit 3a172d3: finish first pass of pipelines
1 parent: a40f9af
38 files changed: +1101, -1432 lines

docs/source/en/api/pipelines/latent_diffusion_uncond.mdx

Lines changed: 6 additions & 19 deletions
@@ -12,31 +12,18 @@ specific language governing permissions and limitations under the License.
 
 # Unconditional Latent Diffusion
 
-## Overview
+Unconditional Latent Diffusion was proposed in [High-Resolution Image Synthesis with Latent Diffusion Models](https://huggingface.co/papers/2112.10752) by Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, Björn Ommer.
 
-Unconditional Latent Diffusion was proposed in [High-Resolution Image Synthesis with Latent Diffusion Models](https://arxiv.org/abs/2112.10752) by Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, Björn Ommer.
-
-The abstract of the paper is the following:
+The abstract from the paper is:
 
 *By decomposing the image formation process into a sequential application of denoising autoencoders, diffusion models (DMs) achieve state-of-the-art synthesis results on image data and beyond. Additionally, their formulation allows for a guiding mechanism to control the image generation process without retraining. However, since these models typically operate directly in pixel space, optimization of powerful DMs often consumes hundreds of GPU days and inference is expensive due to sequential evaluations. To enable DM training on limited computational resources while retaining their quality and flexibility, we apply them in the latent space of powerful pretrained autoencoders. In contrast to previous work, training diffusion models on such a representation allows for the first time to reach a near-optimal point between complexity reduction and detail preservation, greatly boosting visual fidelity. By introducing cross-attention layers into the model architecture, we turn diffusion models into powerful and flexible generators for general conditioning inputs such as text or bounding boxes and high-resolution synthesis becomes possible in a convolutional manner. Our latent diffusion models (LDMs) achieve a new state of the art for image inpainting and highly competitive performance on various tasks, including unconditional image generation, semantic scene synthesis, and super-resolution, while significantly reducing computational requirements compared to pixel-based DMs.*
 
-The original codebase can be found [here](https://github.com/CompVis/latent-diffusion).
-
-## Tips:
-
--
--
--
-
-## Available Pipelines:
-
-| Pipeline | Tasks | Colab
-|---|---|:---:|
-| [pipeline_latent_diffusion_uncond.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/latent_diffusion_uncond/pipeline_latent_diffusion_uncond.py) | *Unconditional Image Generation* | - |
-
-## Examples:
+The original codebase can be found at [CompVis/latent-diffusion](https://github.com/CompVis/latent-diffusion).
 
 ## LDMPipeline
 [[autodoc]] LDMPipeline
 - all
 - __call__
+
+## ImagePipelineOutput
+[[autodoc]] pipelines.ImagePipelineOutput
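
Since the trimmed page no longer ships a usage snippet, here is a minimal sketch of unconditional generation with the [`LDMPipeline`] documented above; the `CompVis/ldm-celebahq-256` checkpoint and the argument values are assumptions based on the public diffusers API, not something this commit specifies.

```python
# Minimal sketch of unconditional generation with LDMPipeline.
# The checkpoint name and step count below are assumptions, not part of this commit.
from diffusers import LDMPipeline

pipe = LDMPipeline.from_pretrained("CompVis/ldm-celebahq-256")  # assumed checkpoint
pipe = pipe.to("cuda")

# __call__ returns an ImagePipelineOutput; its `images` field holds PIL images.
image = pipe(num_inference_steps=200).images[0]
image.save("ldm_generated_image.png")
```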

docs/source/en/api/pipelines/model_editing.mdx

Lines changed: 7 additions & 39 deletions
@@ -10,52 +10,20 @@ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express o
 specific language governing permissions and limitations under the License.
 -->
 
-# Editing Implicit Assumptions in Text-to-Image Diffusion Models
+# Text-to-Image Model Editing
 
-## Overview
+[Editing Implicit Assumptions in Text-to-Image Diffusion Models](https://huggingface.co/papers/2303.08084) is by Hadas Orgad, Bahjat Kawar, and Yonatan Belinkov. This pipeline enables editing diffusion model weights, such that its assumptions of a given concept are changed. The resulting change is expected to take effect in all prompt generations related to the edited concept.
 
-[Editing Implicit Assumptions in Text-to-Image Diffusion Models](https://arxiv.org/abs/2303.08084) by Hadas Orgad, Bahjat Kawar, and Yonatan Belinkov.
-
-The abstract of the paper is the following:
+The abstract from the paper is:
 
 *Text-to-image diffusion models often make implicit assumptions about the world when generating images. While some assumptions are useful (e.g., the sky is blue), they can also be outdated, incorrect, or reflective of social biases present in the training data. Thus, there is a need to control these assumptions without requiring explicit user input or costly re-training. In this work, we aim to edit a given implicit assumption in a pre-trained diffusion model. Our Text-to-Image Model Editing method, TIME for short, receives a pair of inputs: a "source" under-specified prompt for which the model makes an implicit assumption (e.g., "a pack of roses"), and a "destination" prompt that describes the same setting, but with a specified desired attribute (e.g., "a pack of blue roses"). TIME then updates the model's cross-attention layers, as these layers assign visual meaning to textual tokens. We edit the projection matrices in these layers such that the source prompt is projected close to the destination prompt. Our method is highly efficient, as it modifies a mere 2.2% of the model's parameters in under one second. To evaluate model editing approaches, we introduce TIMED (TIME Dataset), containing 147 source and destination prompt pairs from various domains. Our experiments (using Stable Diffusion) show that TIME is successful in model editing, generalizes well for related prompts unseen during editing, and imposes minimal effect on unrelated generations.*
 
-Resources:
-
-* [Project Page](https://time-diffusion.github.io/).
-* [Paper](https://arxiv.org/abs/2303.08084).
-* [Original Code](https://github.com/bahjat-kawar/time-diffusion).
-* [Demo](https://huggingface.co/spaces/bahjat-kawar/time-diffusion).
-
-## Available Pipelines:
-
-| Pipeline | Tasks | Demo
-|---|---|:---:|
-| [StableDiffusionModelEditingPipeline](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_model_editing.py) | *Text-to-Image Model Editing* | [🤗 Space](https://huggingface.co/spaces/bahjat-kawar/time-diffusion)) |
-
-This pipeline enables editing the diffusion model weights, such that its assumptions on a given concept are changed. The resulting change is expected to take effect in all prompt generations pertaining to the edited concept.
-
-## Usage example
-
-```python
-import torch
-from diffusers import StableDiffusionModelEditingPipeline
-
-model_ckpt = "CompVis/stable-diffusion-v1-4"
-pipe = StableDiffusionModelEditingPipeline.from_pretrained(model_ckpt)
-
-pipe = pipe.to("cuda")
-
-source_prompt = "A pack of roses"
-destination_prompt = "A pack of blue roses"
-pipe.edit_model(source_prompt, destination_prompt)
-
-prompt = "A field of roses"
-image = pipe(prompt).images[0]
-image.save("field_of_roses.png")
-```
+You can find additional information about model editing on the [project page](https://time-diffusion.github.io/), [paper](https://arxiv.org/abs/2303.08084), [original codebase](https://github.com/bahjat-kawar/time-diffusion), and try it out in a [demo](https://huggingface.co/spaces/bahjat-kawar/time-diffusion).
 
 ## StableDiffusionModelEditingPipeline
 [[autodoc]] StableDiffusionModelEditingPipeline
 - __call__
 - all
+
+## StableDiffusionPipelineOutput
+[[autodoc]] pipelines.stable_diffusion.StableDiffusionPipelineOutput
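
For reference, the "Usage example" section removed above boils down to the sketch below, reproduced from the deleted lines with editorial comments added; it shows how `edit_model` applies a source/destination prompt pair before generation.

```python
from diffusers import StableDiffusionModelEditingPipeline

# Load a Stable Diffusion checkpoint through the model-editing (TIME) pipeline.
model_ckpt = "CompVis/stable-diffusion-v1-4"
pipe = StableDiffusionModelEditingPipeline.from_pretrained(model_ckpt)
pipe = pipe.to("cuda")

# Edit the implicit assumption: "a pack of roses" should now mean blue roses.
source_prompt = "A pack of roses"
destination_prompt = "A pack of blue roses"
pipe.edit_model(source_prompt, destination_prompt)

# Prompts related to the edited concept are expected to pick up the change.
prompt = "A field of roses"
image = pipe(prompt).images[0]
image.save("field_of_roses.png")
```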

docs/source/en/api/pipelines/self_attention_guidance.mdx

Lines changed: 7 additions & 43 deletions
@@ -10,56 +10,20 @@ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express o
 specific language governing permissions and limitations under the License.
 -->
 
-# Self-Attention Guidance (SAG)
+# Self-Attention Guidance
 
-## Overview
+[Improving Sample Quality of Diffusion Models Using Self-Attention Guidance](https://huggingface.co/papers/2210.00939) is by Susung Hong et al.
 
-[Improving Sample Quality of Diffusion Models Using Self-Attention Guidance](https://arxiv.org/abs/2210.00939) by Susung Hong et al.
-
-The abstract of the paper is the following:
+The abstract from the paper is:
 
 *Denoising diffusion models (DDMs) have attracted attention for their exceptional generation quality and diversity. This success is largely attributed to the use of class- or text-conditional diffusion guidance methods, such as classifier and classifier-free guidance. In this paper, we present a more comprehensive perspective that goes beyond the traditional guidance methods. From this generalized perspective, we introduce novel condition- and training-free strategies to enhance the quality of generated images. As a simple solution, blur guidance improves the suitability of intermediate samples for their fine-scale information and structures, enabling diffusion models to generate higher quality samples with a moderate guidance scale. Improving upon this, Self-Attention Guidance (SAG) uses the intermediate self-attention maps of diffusion models to enhance their stability and efficacy. Specifically, SAG adversarially blurs only the regions that diffusion models attend to at each iteration and guides them accordingly. Our experimental results show that our SAG improves the performance of various diffusion models, including ADM, IDDPM, Stable Diffusion, and DiT. Moreover, combining SAG with conventional guidance methods leads to further improvement.*
 
-Resources:
-
-* [Project Page](https://ku-cvlab.github.io/Self-Attention-Guidance).
-* [Paper](https://arxiv.org/abs/2210.00939).
-* [Original Code](https://github.com/KU-CVLAB/Self-Attention-Guidance).
-* [Hugging Face Demo](https://huggingface.co/spaces/susunghong/Self-Attention-Guidance).
-* [Colab Demo](https://colab.research.google.com/github/SusungHong/Self-Attention-Guidance/blob/main/SAG_Stable.ipynb).
-
-
-## Available Pipelines:
-
-| Pipeline | Tasks | Demo
-|---|---|:---:|
-| [StableDiffusionSAGPipeline](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_sag.py) | *Text-to-Image Generation* | [🤗 Space](https://huggingface.co/spaces/susunghong/Self-Attention-Guidance) |
-
-## Usage example
-
-```python
-import torch
-from diffusers import StableDiffusionSAGPipeline
-from accelerate.utils import set_seed
-
-pipe = StableDiffusionSAGPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16)
-pipe = pipe.to("cuda")
-
-seed = 8978
-prompt = "."
-guidance_scale = 7.5
-num_images_per_prompt = 1
-
-sag_scale = 1.0
-
-set_seed(seed)
-images = pipe(
-    prompt, num_images_per_prompt=num_images_per_prompt, guidance_scale=guidance_scale, sag_scale=sag_scale
-).images
-images[0].save("example.png")
-```
+You can find additional information about Self-Attention Guidance on the [project page](https://ku-cvlab.github.io/Self-Attention-Guidance), [paper](https://arxiv.org/abs/2210.00939), [original codebase](https://github.com/KU-CVLAB/Self-Attention-Guidance), and try it out in a [demo](https://huggingface.co/spaces/susunghong/Self-Attention-Guidance) or [notebook](https://colab.research.google.com/github/SusungHong/Self-Attention-Guidance/blob/main/SAG_Stable.ipynb).
 
 ## StableDiffusionSAGPipeline
 [[autodoc]] StableDiffusionSAGPipeline
 - __call__
 - all
+
+## StableDiffusionOutput
+[[autodoc]] pipelines.stable_diffusion.StableDiffusionPipelineOutput
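
The deleted usage example reduces to the sketch below; the prompt and the seeding via `torch.Generator` are illustrative substitutions (the original used an empty prompt and `accelerate.utils.set_seed`), while `sag_scale` is the argument that switches on self-attention guidance.

```python
import torch
from diffusers import StableDiffusionSAGPipeline

pipe = StableDiffusionSAGPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

# Illustrative prompt and seed; sag_scale > 0 adds self-attention guidance
# on top of the usual classifier-free guidance.
generator = torch.Generator(device="cuda").manual_seed(8978)
images = pipe(
    "a photo of an astronaut riding a horse on mars",
    guidance_scale=7.5,
    sag_scale=1.0,
    num_images_per_prompt=1,
    generator=generator,
).images
images[0].save("sag_example.png")
```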

docs/source/en/api/pipelines/semantic_stable_diffusion.mdx

Lines changed: 7 additions & 57 deletions
@@ -12,68 +12,18 @@ specific language governing permissions and limitations under the License.
 
 # Semantic Guidance
 
-Semantic Guidance for Diffusion Models was proposed in [SEGA: Instructing Diffusion using Semantic Dimensions](https://arxiv.org/abs/2301.12247) and provides strong semantic control over the image generation.
-Small changes to the text prompt usually result in entirely different output images. However, with SEGA a variety of changes to the image are enabled that can be controlled easily and intuitively, and stay true to the original image composition.
+Semantic Guidance for Diffusion Models was proposed in [SEGA: Instructing Diffusion using Semantic Dimensions](https://huggingface.co/papers/2301.12247) and provides strong semantic control over image generation.
+Small changes to the text prompt usually result in entirely different output images. However, with SEGA a variety of changes to the image are enabled that can be controlled easily and intuitively, while staying true to the original image composition.
 
-The abstract of the paper is the following:
+The abstract from the paper is:
 
 *Text-to-image diffusion models have recently received a lot of interest for their astonishing ability to produce high-fidelity images from text only. However, achieving one-shot generation that aligns with the user's intent is nearly impossible, yet small changes to the input prompt often result in very different images. This leaves the user with little semantic control. To put the user in control, we show how to interact with the diffusion process to flexibly steer it along semantic directions. This semantic guidance (SEGA) allows for subtle and extensive edits, changes in composition and style, as well as optimizing the overall artistic conception. We demonstrate SEGA's effectiveness on a variety of tasks and provide evidence for its versatility and flexibility.*
 
-
-*Overview*:
-
-| Pipeline | Tasks | Colab | Demo
-|---|---|:---:|:---:|
-| [pipeline_semantic_stable_diffusion.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/semantic_stable_diffusion/pipeline_semantic_stable_diffusion.py) | *Text-to-Image Generation* | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ml-research/semantic-image-editing/blob/main/examples/SemanticGuidance.ipynb) | [Coming Soon](https://huggingface.co/AIML-TUDA)
-
-## Tips
-
-- The Semantic Guidance pipeline can be used with any [Stable Diffusion](./stable_diffusion/text2img) checkpoint.
-
-### Run Semantic Guidance
-
-The interface of [`SemanticStableDiffusionPipeline`] provides several additional parameters to influence the image generation.
-Exemplary usage may look like this:
-
-```python
-import torch
-from diffusers import SemanticStableDiffusionPipeline
-
-pipe = SemanticStableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)
-pipe = pipe.to("cuda")
-
-out = pipe(
-    prompt="a photo of the face of a woman",
-    num_images_per_prompt=1,
-    guidance_scale=7,
-    editing_prompt=[
-        "smiling, smile",  # Concepts to apply
-        "glasses, wearing glasses",
-        "curls, wavy hair, curly hair",
-        "beard, full beard, mustache",
-    ],
-    reverse_editing_direction=[False, False, False, False],  # Direction of guidance i.e. increase all concepts
-    edit_warmup_steps=[10, 10, 10, 10],  # Warmup period for each concept
-    edit_guidance_scale=[4, 5, 5, 5.4],  # Guidance scale for each concept
-    edit_threshold=[
-        0.99,
-        0.975,
-        0.925,
-        0.96,
-    ],  # Threshold for each concept. Threshold equals the percentile of the latent space that will be discarded. I.e. threshold=0.99 uses 1% of the latent dimensions
-    edit_momentum_scale=0.3,  # Momentum scale that will be added to the latent guidance
-    edit_mom_beta=0.6,  # Momentum beta
-    edit_weights=[1, 1, 1, 1, 1],  # Weights of the individual concepts against each other
-)
-```
-
-For more examples check the Colab notebook.
-
-## StableDiffusionSafePipelineOutput
-[[autodoc]] pipelines.semantic_stable_diffusion.SemanticStableDiffusionPipelineOutput
-- all
-
 ## SemanticStableDiffusionPipeline
 [[autodoc]] SemanticStableDiffusionPipeline
 - all
 - __call__
+
+## StableDiffusionSafePipelineOutput
+[[autodoc]] pipelines.semantic_stable_diffusion.SemanticStableDiffusionPipelineOutput
+- all
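
Because the "Run Semantic Guidance" snippet is dropped above, the editing-specific arguments of [`SemanticStableDiffusionPipeline`] are sketched below; the code follows the deleted example, with indentation restored and the weight list trimmed to one entry per concept.

```python
import torch
from diffusers import SemanticStableDiffusionPipeline

pipe = SemanticStableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

out = pipe(
    prompt="a photo of the face of a woman",
    num_images_per_prompt=1,
    guidance_scale=7,
    editing_prompt=[  # concepts applied on top of the prompt
        "smiling, smile",
        "glasses, wearing glasses",
        "curls, wavy hair, curly hair",
        "beard, full beard, mustache",
    ],
    reverse_editing_direction=[False, False, False, False],  # increase every concept
    edit_warmup_steps=[10, 10, 10, 10],  # warmup steps per concept
    edit_guidance_scale=[4, 5, 5, 5.4],  # guidance scale per concept
    edit_threshold=[0.99, 0.975, 0.925, 0.96],  # percentile of the latent space discarded per concept
    edit_momentum_scale=0.3,  # momentum added to the latent guidance
    edit_mom_beta=0.6,  # momentum beta
    edit_weights=[1, 1, 1, 1],  # relative weight of each concept
)
image = out.images[0]
```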

docs/source/en/api/pipelines/shap_e.mdx

Lines changed: 7 additions & 19 deletions
@@ -9,28 +9,13 @@ specific language governing permissions and limitations under the License.
 
 # Shap-E
 
-## Overview
+The Shap-E model was proposed in [Shap-E: Generating Conditional 3D Implicit Functions](https://huggingface.co/papers/2305.02463) by Alex Nichol and Heewon Jun from [OpenAI](https://github.com/openai).
 
-
-The Shap-E model was proposed in [Shap-E: Generating Conditional 3D Implicit Functions](https://arxiv.org/abs/2305.02463) by Alex Nichol and Heewon Jun from [OpenAI](https://github.com/openai).
-
-The abstract of the paper is the following:
+The abstract from the paper is:
 
 *We present Shap-E, a conditional generative model for 3D assets. Unlike recent work on 3D generative models which produce a single output representation, Shap-E directly generates the parameters of implicit functions that can be rendered as both textured meshes and neural radiance fields. We train Shap-E in two stages: first, we train an encoder that deterministically maps 3D assets into the parameters of an implicit function; second, we train a conditional diffusion model on outputs of the encoder. When trained on a large dataset of paired 3D and text data, our resulting models are capable of generating complex and diverse 3D assets in a matter of seconds. When compared to Point-E, an explicit generative model over point clouds, Shap-E converges faster and reaches comparable or better sample quality despite modeling a higher-dimensional, multi-representation output space.*
 
-The original codebase can be found [here](https://github.com/openai/shap-e).
-
-## Available Pipelines:
-
-| Pipeline | Tasks |
-|---|---|
-| [pipeline_shap_e.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/shap_e/pipeline_shap_e.py) | *Text-to-Image Generation* |
-| [pipeline_shap_e_img2img.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/shap_e/pipeline_shap_e_img2img.py) | *Image-to-Image Generation* |
-
-## Available checkpoints
-
-* [`openai/shap-e`](https://huggingface.co/openai/shap-e)
-* [`openai/shap-e-img2img`](https://huggingface.co/openai/shap-e-img2img)
+The original codebase can be found at [openai/shap-e](https://github.com/openai/shap-e).
 
 ## Usage Examples
 
@@ -136,4 +121,7 @@ gif_path = export_to_gif(images[0], "burger_3d.gif")
 ## ShapEImg2ImgPipeline
 [[autodoc]] ShapEImg2ImgPipeline
 - all
-- __call__
+- __call__
+
+## ShapEPipelineOutput
+[[autodoc]] pipelines.shap_e.pipeline_shap_e.ShapEPipelineOutput
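
The second hunk touches the tail of the page, whose existing "Usage Examples" end with `export_to_gif(images[0], "burger_3d.gif")`. As a rough orientation, the text-to-3D flow looks like the sketch below; the prompt, guidance scale, step count, and frame size here are assumptions rather than values taken from the page.

```python
import torch
from diffusers import ShapEPipeline
from diffusers.utils import export_to_gif

pipe = ShapEPipeline.from_pretrained("openai/shap-e", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

# Each entry of `images` is a list of rendered frames of one generated 3D asset;
# the frames are then stitched into a GIF (values below are illustrative).
images = pipe(
    "a cheeseburger",
    guidance_scale=15.0,
    num_inference_steps=64,
    frame_size=256,
).images
gif_path = export_to_gif(images[0], "burger_3d.gif")
```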
