Commit c8ab1f1

stevhliu and orpatashnik authored and committed
[docs] Clean up pipeline apis (huggingface#3905)
* start with stable diffusion
* fix
* finish stable diffusion pipelines
* fix path to pipeline output
* fix flax paths
* fix copies
* add up to score sde ve
* finish first pass of pipelines
* fix copies
* second review
* align doc titles
* more review fixes
* final review
1 parent 71abe98 commit c8ab1f1

File tree

119 files changed: +3741, -4310 lines


docs/source/en/_toctree.yml

Lines changed: 12 additions & 12 deletions
@@ -181,7 +181,7 @@
   - local: api/pipelines/alt_diffusion
     title: AltDiffusion
   - local: api/pipelines/attend_and_excite
-    title: Attend and Excite
+    title: Attend-and-Excite
   - local: api/pipelines/audio_diffusion
     title: Audio Diffusion
   - local: api/pipelines/audioldm
@@ -211,7 +211,7 @@
   - local: api/pipelines/latent_diffusion
     title: Latent Diffusion
   - local: api/pipelines/panorama
-    title: MultiDiffusion Panorama
+    title: MultiDiffusion
   - local: api/pipelines/paint_by_example
     title: PaintByExample
   - local: api/pipelines/paradigms
@@ -238,25 +238,25 @@
   - local: api/pipelines/stable_diffusion/overview
     title: Overview
   - local: api/pipelines/stable_diffusion/text2img
-    title: Text-to-Image
+    title: Text-to-image
   - local: api/pipelines/stable_diffusion/img2img
-    title: Image-to-Image
+    title: Image-to-image
   - local: api/pipelines/stable_diffusion/inpaint
-    title: Inpaint
+    title: Inpainting
   - local: api/pipelines/stable_diffusion/depth2img
-    title: Depth-to-Image
+    title: Depth-to-image
   - local: api/pipelines/stable_diffusion/image_variation
-    title: Image-Variation
+    title: Image variation
   - local: api/pipelines/stable_diffusion/stable_diffusion_safe
     title: Safe Stable Diffusion
   - local: api/pipelines/stable_diffusion/stable_diffusion_2
     title: Stable Diffusion 2
   - local: api/pipelines/stable_diffusion/stable_diffusion_xl
     title: Stable Diffusion XL
   - local: api/pipelines/stable_diffusion/latent_upscale
-    title: Stable-Diffusion-Latent-Upscaler
+    title: Latent upscaler
   - local: api/pipelines/stable_diffusion/upscale
-    title: Super-Resolution
+    title: Super-resolution
   - local: api/pipelines/stable_diffusion/ldm3d_diffusion
     title: LDM3D Text-to-(RGB, Depth)
   - local: api/pipelines/stable_diffusion/adapter
@@ -267,11 +267,11 @@
   - local: api/pipelines/stochastic_karras_ve
     title: Stochastic Karras VE
   - local: api/pipelines/model_editing
-    title: Text-to-Image Model Editing
+    title: Text-to-image model editing
   - local: api/pipelines/text_to_video
-    title: Text-to-Video
+    title: Text-to-video
   - local: api/pipelines/text_to_video_zero
-    title: Text-to-Video Zero
+    title: Text2Video-Zero
   - local: api/pipelines/unclip
     title: UnCLIP
   - local: api/pipelines/latent_diffusion_uncond

docs/source/en/api/pipelines/alt_diffusion.mdx

Lines changed: 14 additions & 50 deletions
@@ -12,72 +12,36 @@ specific language governing permissions and limitations under the License.

 # AltDiffusion

-AltDiffusion was proposed in [AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities](https://arxiv.org/abs/2211.06679) by Zhongzhi Chen, Guang Liu, Bo-Wen Zhang, Fulong Ye, Qinghong Yang, Ledell Wu.
+AltDiffusion was proposed in [AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities](https://huggingface.co/papers/2211.06679) by Zhongzhi Chen, Guang Liu, Bo-Wen Zhang, Fulong Ye, Qinghong Yang, Ledell Wu.

-The abstract of the paper is the following:
+The abstract from the paper is:

 *In this work, we present a conceptually simple and effective method to train a strong bilingual multimodal representation model. Starting from the pretrained multimodal representation model CLIP released by OpenAI, we switched its text encoder with a pretrained multilingual text encoder XLM-R, and aligned both languages and image representations by a two-stage training schema consisting of teacher learning and contrastive learning. We validate our method through evaluations of a wide range of tasks. We set new state-of-the-art performances on a bunch of tasks including ImageNet-CN, Flicker30k-CN, and COCO-CN. Further, we obtain very close performances with CLIP on almost all tasks, suggesting that one can simply alter the text encoder in CLIP for extended capabilities such as multilingual understanding.*

-
-*Overview*:
-
-| Pipeline | Tasks | Colab | Demo
-|---|---|:---:|:---:|
-| [pipeline_alt_diffusion.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/alt_diffusion/pipeline_alt_diffusion.py) | *Text-to-Image Generation* | - | -
-| [pipeline_alt_diffusion_img2img.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/alt_diffusion/pipeline_alt_diffusion_img2img.py) | *Image-to-Image Text-Guided Generation* | - | -
-
 ## Tips

-- AltDiffusion is conceptually exactly the same as [Stable Diffusion](./stable_diffusion/overview).
-
-- *Run AltDiffusion*
-
-AltDiffusion can be tested very easily with the [`AltDiffusionPipeline`], [`AltDiffusionImg2ImgPipeline`] and the `"BAAI/AltDiffusion-m9"` checkpoint exactly in the same way it is shown in the [Conditional Image Generation Guide](../../using-diffusers/conditional_image_generation) and the [Image-to-Image Generation Guide](../../using-diffusers/img2img).
-
-- *How to load and use different schedulers.*
-
-The alt diffusion pipeline uses [`DDIMScheduler`] scheduler by default. But `diffusers` provides many other schedulers that can be used with the alt diffusion pipeline such as [`PNDMScheduler`], [`LMSDiscreteScheduler`], [`EulerDiscreteScheduler`], [`EulerAncestralDiscreteScheduler`] etc.
-To use a different scheduler, you can either change it via the [`ConfigMixin.from_config`] method or pass the `scheduler` argument to the `from_pretrained` method of the pipeline. For example, to use the [`EulerDiscreteScheduler`], you can do the following:
-
-```python
->>> from diffusers import AltDiffusionPipeline, EulerDiscreteScheduler
-
->>> pipeline = AltDiffusionPipeline.from_pretrained("BAAI/AltDiffusion-m9")
->>> pipeline.scheduler = EulerDiscreteScheduler.from_config(pipeline.scheduler.config)
-
->>> # or
->>> euler_scheduler = EulerDiscreteScheduler.from_pretrained("BAAI/AltDiffusion-m9", subfolder="scheduler")
->>> pipeline = AltDiffusionPipeline.from_pretrained("BAAI/AltDiffusion-m9", scheduler=euler_scheduler)
-```
-
-
-- *How to convert all use cases with multiple or single pipeline*
-
-If you want to use all possible use cases in a single `DiffusionPipeline` we recommend using the `components` functionality to instantiate all components in the most memory-efficient way:
+`AltDiffusion` is conceptually the same as [Stable Diffusion](./stable_diffusion/overview).

-```python
->>> from diffusers import (
-...     AltDiffusionPipeline,
-...     AltDiffusionImg2ImgPipeline,
-... )
+<Tip>

->>> text2img = AltDiffusionPipeline.from_pretrained("BAAI/AltDiffusion-m9")
->>> img2img = AltDiffusionImg2ImgPipeline(**text2img.components)
+Make sure to check out the Schedulers [guide](/using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](/using-diffusers/loading#reuse-components-across-pipelines) section to learn how to efficiently load the same components into multiple pipelines.

->>> # now you can use text2img(...) and img2img(...) just like the call methods of each respective pipeline
->>> ```
-
-## AltDiffusionPipelineOutput
-[[autodoc]] pipelines.alt_diffusion.AltDiffusionPipelineOutput
-- all
-- __call__
+</Tip>

 ## AltDiffusionPipeline
+
 [[autodoc]] AltDiffusionPipeline
 - all
 - __call__

 ## AltDiffusionImg2ImgPipeline
+
 [[autodoc]] AltDiffusionImg2ImgPipeline
 - all
 - __call__
+
+## AltDiffusionPipelineOutput
+
+[[autodoc]] pipelines.alt_diffusion.AltDiffusionPipelineOutput
+- all
+- __call__
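The two snippets removed above (swapping schedulers and reusing pipeline components) are now covered by the linked guides instead. As a minimal sketch of that pattern, condensed from the deleted examples and assuming the same `BAAI/AltDiffusion-m9` checkpoint, the idea looks roughly like this:

```python
from diffusers import AltDiffusionPipeline, AltDiffusionImg2ImgPipeline, EulerDiscreteScheduler

# Load the text-to-image pipeline and swap in a different scheduler
pipe = AltDiffusionPipeline.from_pretrained("BAAI/AltDiffusion-m9")
pipe.scheduler = EulerDiscreteScheduler.from_config(pipe.scheduler.config)

# Reuse the already-loaded components for image-to-image instead of loading them a second time
img2img = AltDiffusionImg2ImgPipeline(**pipe.components)
```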

docs/source/en/api/pipelines/attend_and_excite.mdx

Lines changed: 12 additions & 50 deletions
@@ -10,66 +10,28 @@ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express o
 specific language governing permissions and limitations under the License.
 -->

-# Attend and Excite: Attention-Based Semantic Guidance for Text-to-Image Diffusion Models
+# Attend-and-Excite

-## Overview
+Attend-and-Excite for Stable Diffusion was proposed in [Attend-and-Excite: Attention-Based Semantic Guidance for Text-to-Image Diffusion Models](https://attendandexcite.github.io/Attend-and-Excite/) and provides textual attention control over image generation.

-Attend and Excite for Stable Diffusion was proposed in [Attend-and-Excite: Attention-Based Semantic Guidance for Text-to-Image Diffusion Models](https://attendandexcite.github.io/Attend-and-Excite/) and provides textual attention control over the image generation.
-
-The abstract of the paper is the following:
+The abstract from the paper is:

 *Text-to-image diffusion models have recently received a lot of interest for their astonishing ability to produce high-fidelity images from text only. However, achieving one-shot generation that aligns with the user's intent is nearly impossible, yet small changes to the input prompt often result in very different images. This leaves the user with little semantic control. To put the user in control, we show how to interact with the diffusion process to flexibly steer it along semantic directions. This semantic guidance (SEGA) allows for subtle and extensive edits, changes in composition and style, as well as optimizing the overall artistic conception. We demonstrate SEGA's effectiveness on a variety of tasks and provide evidence for its versatility and flexibility.*

-Resources
-
-* [Project Page](https://attendandexcite.github.io/Attend-and-Excite/)
-* [Paper](https://arxiv.org/abs/2301.13826)
-* [Original Code](https://github.com/AttendAndExcite/Attend-and-Excite)
-* [Demo](https://huggingface.co/spaces/AttendAndExcite/Attend-and-Excite)
-
-
-## Available Pipelines:
-
-| Pipeline | Tasks | Colab | Demo
-|---|---|:---:|:---:|
-| [pipeline_semantic_stable_diffusion_attend_and_excite.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/pipeline_semantic_stable_diffusion_attend_and_excite) | *Text-to-Image Generation* | - | https://huggingface.co/spaces/AttendAndExcite/Attend-and-Excite
-
-
-### Usage example
-
+You can find additional information about Attend-and-Excite on the [project page](https://attendandexcite.github.io/Attend-and-Excite/), the [original codebase](https://github.com/AttendAndExcite/Attend-and-Excite), or try it out in a [demo](https://huggingface.co/spaces/AttendAndExcite/Attend-and-Excite).

-```python
-import torch
-from diffusers import StableDiffusionAttendAndExcitePipeline
+<Tip>

-model_id = "CompVis/stable-diffusion-v1-4"
-pipe = StableDiffusionAttendAndExcitePipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")
-pipe = pipe.to("cuda")
-
-prompt = "a cat and a frog"
-
-# use get_indices function to find out indices of the tokens you want to alter
-pipe.get_indices(prompt)
-
-token_indices = [2, 5]
-seed = 6141
-generator = torch.Generator("cuda").manual_seed(seed)
-
-images = pipe(
-    prompt=prompt,
-    token_indices=token_indices,
-    guidance_scale=7.5,
-    generator=generator,
-    num_inference_steps=50,
-    max_iter_to_alter=25,
-).images
-
-image = images[0]
-image.save(f"../images/{prompt}_{seed}.png")
-```
+Make sure to check out the Schedulers [guide](/using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](/using-diffusers/loading#reuse-components-across-pipelines) section to learn how to efficiently load the same components into multiple pipelines.

+</Tip>

 ## StableDiffusionAttendAndExcitePipeline
+
 [[autodoc]] StableDiffusionAttendAndExcitePipeline
 - all
 - __call__
+
+## StableDiffusionPipelineOutput
+
+[[autodoc]] pipelines.stable_diffusion.StableDiffusionPipelineOutput
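The inline usage example deleted above is replaced by links to the project page and demo. Condensed from that deleted snippet (same `CompVis/stable-diffusion-v1-4` checkpoint and token indices), the calling pattern looks roughly like this:

```python
import torch
from diffusers import StableDiffusionAttendAndExcitePipeline

pipe = StableDiffusionAttendAndExcitePipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

prompt = "a cat and a frog"
# get_indices shows which token indices correspond to the words to attend to and excite
print(pipe.get_indices(prompt))

image = pipe(
    prompt=prompt,
    token_indices=[2, 5],  # tokens for "cat" and "frog"
    guidance_scale=7.5,
    num_inference_steps=50,
    max_iter_to_alter=25,
    generator=torch.Generator("cuda").manual_seed(6141),
).images[0]
image.save("cat_and_frog.png")
```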

docs/source/en/api/pipelines/audio_diffusion.mdx

Lines changed: 11 additions & 72 deletions
@@ -12,87 +12,26 @@ specific language governing permissions and limitations under the License.

 # Audio Diffusion

-## Overview
+[Audio Diffusion](https://github.com/teticio/audio-diffusion) is by Robert Dargavel Smith, and it leverages the recent advances in image generation from diffusion models by converting audio samples to and from Mel spectrogram images.

-[Audio Diffusion](https://github.com/teticio/audio-diffusion) by Robert Dargavel Smith.
+The original codebase, training scripts and example notebooks can be found at [teticio/audio-diffusion](https://github.com/teticio/audio-diffusion).

-Audio Diffusion leverages the recent advances in image generation using diffusion models by converting audio samples to
-and from mel spectrogram images.
+<Tip>

-The original codebase of this implementation can be found [here](https://github.com/teticio/audio-diffusion), including
-training scripts and example notebooks.
+Make sure to check out the Schedulers [guide](/using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](/using-diffusers/loading#reuse-components-across-pipelines) section to learn how to efficiently load the same components into multiple pipelines.

-## Available Pipelines:
-
-| Pipeline | Tasks | Colab
-|---|---|:---:|
-| [pipeline_audio_diffusion.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/audio_diffusion/pipeline_audio_diffusion.py) | *Unconditional Audio Generation* | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/teticio/audio-diffusion/blob/master/notebooks/audio_diffusion_pipeline.ipynb) |
-
-
-## Examples:
-
-### Audio Diffusion
-
-```python
-import torch
-from IPython.display import Audio
-from diffusers import DiffusionPipeline
-
-device = "cuda" if torch.cuda.is_available() else "cpu"
-pipe = DiffusionPipeline.from_pretrained("teticio/audio-diffusion-256").to(device)
-
-output = pipe()
-display(output.images[0])
-display(Audio(output.audios[0], rate=pipe.mel.get_sample_rate()))
-```
-
-### Latent Audio Diffusion
-
-```python
-import torch
-from IPython.display import Audio
-from diffusers import DiffusionPipeline
-
-device = "cuda" if torch.cuda.is_available() else "cpu"
-pipe = DiffusionPipeline.from_pretrained("teticio/latent-audio-diffusion-256").to(device)
-
-output = pipe()
-display(output.images[0])
-display(Audio(output.audios[0], rate=pipe.mel.get_sample_rate()))
-```
-
-### Audio Diffusion with DDIM (faster)
-
-```python
-import torch
-from IPython.display import Audio
-from diffusers import DiffusionPipeline
-
-device = "cuda" if torch.cuda.is_available() else "cpu"
-pipe = DiffusionPipeline.from_pretrained("teticio/audio-diffusion-ddim-256").to(device)
-
-output = pipe()
-display(output.images[0])
-display(Audio(output.audios[0], rate=pipe.mel.get_sample_rate()))
-```
-
-### Variations, in-painting, out-painting etc.
-
-```python
-output = pipe(
-    raw_audio=output.audios[0, 0],
-    start_step=int(pipe.get_default_steps() / 2),
-    mask_start_secs=1,
-    mask_end_secs=1,
-)
-display(output.images[0])
-display(Audio(output.audios[0], rate=pipe.mel.get_sample_rate()))
-```
+</Tip>

 ## AudioDiffusionPipeline
 [[autodoc]] AudioDiffusionPipeline
 - all
 - __call__

+## AudioPipelineOutput
+[[autodoc]] pipelines.AudioPipelineOutput
+
+## ImagePipelineOutput
+[[autodoc]] pipelines.ImagePipelineOutput
+
 ## Mel
 [[autodoc]] Mel
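The deleted examples all follow the same unconditional generation pattern. Condensed from them (using the `teticio/audio-diffusion-256` checkpoint from the removed snippets), a minimal sketch is:

```python
import torch
from diffusers import DiffusionPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"
pipe = DiffusionPipeline.from_pretrained("teticio/audio-diffusion-256").to(device)

# The pipeline returns both the Mel spectrogram image and the decoded audio samples
output = pipe()
spectrogram = output.images[0]
audio = output.audios[0]
sample_rate = pipe.mel.get_sample_rate()
```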
