
Commit a69754b

[docs] Clean up pipeline apis (#3905)
* start with stable diffusion
* fix
* finish stable diffusion pipelines
* fix path to pipeline output
* fix flax paths
* fix copies
* add up to score sde ve
* finish first pass of pipelines
* fix copies
* second review
* align doc titles
* more review fixes
* final review
1 parent bcc570b commit a69754b


120 files changed (+3755 additions, -4407 deletions)


docs/source/en/_toctree.yml

Lines changed: 12 additions & 12 deletions
@@ -181,7 +181,7 @@
 - local: api/pipelines/alt_diffusion
   title: AltDiffusion
 - local: api/pipelines/attend_and_excite
-  title: Attend and Excite
+  title: Attend-and-Excite
 - local: api/pipelines/audio_diffusion
   title: Audio Diffusion
 - local: api/pipelines/audioldm
@@ -211,7 +211,7 @@
 - local: api/pipelines/latent_diffusion
   title: Latent Diffusion
 - local: api/pipelines/panorama
-  title: MultiDiffusion Panorama
+  title: MultiDiffusion
 - local: api/pipelines/paint_by_example
   title: PaintByExample
 - local: api/pipelines/paradigms
@@ -236,25 +236,25 @@
 - local: api/pipelines/stable_diffusion/overview
   title: Overview
 - local: api/pipelines/stable_diffusion/text2img
-  title: Text-to-Image
+  title: Text-to-image
 - local: api/pipelines/stable_diffusion/img2img
-  title: Image-to-Image
+  title: Image-to-image
 - local: api/pipelines/stable_diffusion/inpaint
-  title: Inpaint
+  title: Inpainting
 - local: api/pipelines/stable_diffusion/depth2img
-  title: Depth-to-Image
+  title: Depth-to-image
 - local: api/pipelines/stable_diffusion/image_variation
-  title: Image-Variation
+  title: Image variation
 - local: api/pipelines/stable_diffusion/stable_diffusion_safe
   title: Safe Stable Diffusion
 - local: api/pipelines/stable_diffusion/stable_diffusion_2
   title: Stable Diffusion 2
 - local: api/pipelines/stable_diffusion/stable_diffusion_xl
   title: Stable Diffusion XL
 - local: api/pipelines/stable_diffusion/latent_upscale
-  title: Stable-Diffusion-Latent-Upscaler
+  title: Latent upscaler
 - local: api/pipelines/stable_diffusion/upscale
-  title: Super-Resolution
+  title: Super-resolution
 - local: api/pipelines/stable_diffusion/ldm3d_diffusion
   title: LDM3D Text-to-(RGB, Depth)
 - local: api/pipelines/stable_diffusion/adapter
@@ -265,11 +265,11 @@
 - local: api/pipelines/stochastic_karras_ve
   title: Stochastic Karras VE
 - local: api/pipelines/model_editing
-  title: Text-to-Image Model Editing
+  title: Text-to-image model editing
 - local: api/pipelines/text_to_video
-  title: Text-to-Video
+  title: Text-to-video
 - local: api/pipelines/text_to_video_zero
-  title: Text-to-Video Zero
+  title: Text2Video-Zero
 - local: api/pipelines/unclip
   title: UnCLIP
 - local: api/pipelines/latent_diffusion_uncond

docs/source/en/api/pipelines/alt_diffusion.mdx

Lines changed: 14 additions & 50 deletions
@@ -12,72 +12,36 @@ specific language governing permissions and limitations under the License.
 
 # AltDiffusion
 
-AltDiffusion was proposed in [AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities](https://arxiv.org/abs/2211.06679) by Zhongzhi Chen, Guang Liu, Bo-Wen Zhang, Fulong Ye, Qinghong Yang, Ledell Wu.
+AltDiffusion was proposed in [AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities](https://huggingface.co/papers/2211.06679) by Zhongzhi Chen, Guang Liu, Bo-Wen Zhang, Fulong Ye, Qinghong Yang, Ledell Wu.
 
-The abstract of the paper is the following:
+The abstract from the paper is:
 
 *In this work, we present a conceptually simple and effective method to train a strong bilingual multimodal representation model. Starting from the pretrained multimodal representation model CLIP released by OpenAI, we switched its text encoder with a pretrained multilingual text encoder XLM-R, and aligned both languages and image representations by a two-stage training schema consisting of teacher learning and contrastive learning. We validate our method through evaluations of a wide range of tasks. We set new state-of-the-art performances on a bunch of tasks including ImageNet-CN, Flicker30k- CN, and COCO-CN. Further, we obtain very close performances with CLIP on almost all tasks, suggesting that one can simply alter the text encoder in CLIP for extended capabilities such as multilingual understanding.*
 
-
-*Overview*:
-
-| Pipeline | Tasks | Colab | Demo
-|---|---|:---:|:---:|
-| [pipeline_alt_diffusion.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/alt_diffusion/pipeline_alt_diffusion.py) | *Text-to-Image Generation* | - | -
-| [pipeline_alt_diffusion_img2img.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/alt_diffusion/pipeline_alt_diffusion_img2img.py) | *Image-to-Image Text-Guided Generation* | - |-
-
 ## Tips
 
-- AltDiffusion is conceptually exactly the same as [Stable Diffusion](./stable_diffusion/overview).
-
-- *Run AltDiffusion*
-
-AltDiffusion can be tested very easily with the [`AltDiffusionPipeline`], [`AltDiffusionImg2ImgPipeline`] and the `"BAAI/AltDiffusion-m9"` checkpoint exactly in the same way it is shown in the [Conditional Image Generation Guide](../../using-diffusers/conditional_image_generation) and the [Image-to-Image Generation Guide](../../using-diffusers/img2img).
-
-- *How to load and use different schedulers.*
-
-The alt diffusion pipeline uses [`DDIMScheduler`] scheduler by default. But `diffusers` provides many other schedulers that can be used with the alt diffusion pipeline such as [`PNDMScheduler`], [`LMSDiscreteScheduler`], [`EulerDiscreteScheduler`], [`EulerAncestralDiscreteScheduler`] etc.
-To use a different scheduler, you can either change it via the [`ConfigMixin.from_config`] method or pass the `scheduler` argument to the `from_pretrained` method of the pipeline. For example, to use the [`EulerDiscreteScheduler`], you can do the following:
-
-```python
->>> from diffusers import AltDiffusionPipeline, EulerDiscreteScheduler
-
->>> pipeline = AltDiffusionPipeline.from_pretrained("BAAI/AltDiffusion-m9")
->>> pipeline.scheduler = EulerDiscreteScheduler.from_config(pipeline.scheduler.config)
-
->>> # or
->>> euler_scheduler = EulerDiscreteScheduler.from_pretrained("BAAI/AltDiffusion-m9", subfolder="scheduler")
->>> pipeline = AltDiffusionPipeline.from_pretrained("BAAI/AltDiffusion-m9", scheduler=euler_scheduler)
-```
-
-
-- *How to convert all use cases with multiple or single pipeline*
-
-If you want to use all possible use cases in a single `DiffusionPipeline` we recommend using the `components` functionality to instantiate all components in the most memory-efficient way:
+`AltDiffusion` is conceptually the same as [Stable Diffusion](./stable_diffusion/overview).
 
-```python
->>> from diffusers import (
-... AltDiffusionPipeline,
-... AltDiffusionImg2ImgPipeline,
-... )
+<Tip>
 
->>> text2img = AltDiffusionPipeline.from_pretrained("BAAI/AltDiffusion-m9")
->>> img2img = AltDiffusionImg2ImgPipeline(**text2img.components)
+Make sure to check out the Schedulers [guide](/using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](/using-diffusers/loading#reuse-components-across-pipelines) section to learn how to efficiently load the same components into multiple pipelines.
 
->>> # now you can use text2img(...) and img2img(...) just like the call methods of each respective pipeline
-```
-
-## AltDiffusionPipelineOutput
-[[autodoc]] pipelines.alt_diffusion.AltDiffusionPipelineOutput
-  - all
-  - __call__
+</Tip>
 
 ## AltDiffusionPipeline
+
 [[autodoc]] AltDiffusionPipeline
   - all
   - __call__
 
 ## AltDiffusionImg2ImgPipeline
+
 [[autodoc]] AltDiffusionImg2ImgPipeline
   - all
   - __call__
+
+## AltDiffusionPipelineOutput
+
+[[autodoc]] pipelines.alt_diffusion.AltDiffusionPipelineOutput
+  - all
+  - __call__
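
For reference, a minimal sketch of the two patterns the removed Tips covered (swapping schedulers and reusing components); the classes and the `BAAI/AltDiffusion-m9` checkpoint are taken from the example removed above:

```python
from diffusers import AltDiffusionPipeline, AltDiffusionImg2ImgPipeline, EulerDiscreteScheduler

# Load the text-to-image pipeline and swap in a different scheduler.
text2img = AltDiffusionPipeline.from_pretrained("BAAI/AltDiffusion-m9")
text2img.scheduler = EulerDiscreteScheduler.from_config(text2img.scheduler.config)

# Reuse the already-loaded components for image-to-image instead of loading them twice.
img2img = AltDiffusionImg2ImgPipeline(**text2img.components)
```

Both patterns are now covered by the guides linked from the new `<Tip>` block.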

docs/source/en/api/pipelines/attend_and_excite.mdx

Lines changed: 12 additions & 50 deletions
@@ -10,66 +10,28 @@ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express o
 specific language governing permissions and limitations under the License.
 -->
 
-# Attend and Excite: Attention-Based Semantic Guidance for Text-to-Image Diffusion Models
+# Attend-and-Excite
 
-## Overview
+Attend-and-Excite for Stable Diffusion was proposed in [Attend-and-Excite: Attention-Based Semantic Guidance for Text-to-Image Diffusion Models](https://attendandexcite.github.io/Attend-and-Excite/) and provides textual attention control over image generation.
 
-Attend and Excite for Stable Diffusion was proposed in [Attend-and-Excite: Attention-Based Semantic Guidance for Text-to-Image Diffusion Models](https://attendandexcite.github.io/Attend-and-Excite/) and provides textual attention control over the image generation.
-
-The abstract of the paper is the following:
+The abstract from the paper is:
 
 *Text-to-image diffusion models have recently received a lot of interest for their astonishing ability to produce high-fidelity images from text only. However, achieving one-shot generation that aligns with the user's intent is nearly impossible, yet small changes to the input prompt often result in very different images. This leaves the user with little semantic control. To put the user in control, we show how to interact with the diffusion process to flexibly steer it along semantic directions. This semantic guidance (SEGA) allows for subtle and extensive edits, changes in composition and style, as well as optimizing the overall artistic conception. We demonstrate SEGA's effectiveness on a variety of tasks and provide evidence for its versatility and flexibility.*
 
-Resources
-
-* [Project Page](https://attendandexcite.github.io/Attend-and-Excite/)
-* [Paper](https://arxiv.org/abs/2301.13826)
-* [Original Code](https://github.com/AttendAndExcite/Attend-and-Excite)
-* [Demo](https://huggingface.co/spaces/AttendAndExcite/Attend-and-Excite)
-
-
-## Available Pipelines:
-
-| Pipeline | Tasks | Colab | Demo
-|---|---|:---:|:---:|
-| [pipeline_semantic_stable_diffusion_attend_and_excite.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/pipeline_semantic_stable_diffusion_attend_and_excite) | *Text-to-Image Generation* | - | https://huggingface.co/spaces/AttendAndExcite/Attend-and-Excite
-
-
-### Usage example
-
+You can find additional information about Attend-and-Excite on the [project page](https://attendandexcite.github.io/Attend-and-Excite/), the [original codebase](https://github.com/AttendAndExcite/Attend-and-Excite), or try it out in a [demo](https://huggingface.co/spaces/AttendAndExcite/Attend-and-Excite).
 
-```python
-import torch
-from diffusers import StableDiffusionAttendAndExcitePipeline
+<Tip>
 
-model_id = "CompVis/stable-diffusion-v1-4"
-pipe = StableDiffusionAttendAndExcitePipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")
-pipe = pipe.to("cuda")
-
-prompt = "a cat and a frog"
-
-# use get_indices function to find out indices of the tokens you want to alter
-pipe.get_indices(prompt)
-
-token_indices = [2, 5]
-seed = 6141
-generator = torch.Generator("cuda").manual_seed(seed)
-
-images = pipe(
-    prompt=prompt,
-    token_indices=token_indices,
-    guidance_scale=7.5,
-    generator=generator,
-    num_inference_steps=50,
-    max_iter_to_alter=25,
-).images
-
-image = images[0]
-image.save(f"../images/{prompt}_{seed}.png")
-```
+Make sure to check out the Schedulers [guide](/using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](/using-diffusers/loading#reuse-components-across-pipelines) section to learn how to efficiently load the same components into multiple pipelines.
 
+</Tip>
 
 ## StableDiffusionAttendAndExcitePipeline
+
 [[autodoc]] StableDiffusionAttendAndExcitePipeline
   - all
   - __call__
+
+## StableDiffusionPipelineOutput
+
+[[autodoc]] pipelines.stable_diffusion.StableDiffusionPipelineOutput
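
For reference, a condensed version of the usage example removed above (the checkpoint, token indices, seed, and `max_iter_to_alter` value are copied from the removed code):

```python
import torch
from diffusers import StableDiffusionAttendAndExcitePipeline

pipe = StableDiffusionAttendAndExcitePipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

prompt = "a cat and a frog"
# get_indices shows which token indices correspond to which words in the prompt
print(pipe.get_indices(prompt))

generator = torch.Generator("cuda").manual_seed(6141)
image = pipe(
    prompt=prompt,
    token_indices=[2, 5],  # tokens to attend to and excite
    guidance_scale=7.5,
    generator=generator,
    num_inference_steps=50,
    max_iter_to_alter=25,
).images[0]
image.save("a_cat_and_a_frog.png")
```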

docs/source/en/api/pipelines/audio_diffusion.mdx

Lines changed: 11 additions & 72 deletions
@@ -12,87 +12,26 @@ specific language governing permissions and limitations under the License.
 
 # Audio Diffusion
 
-## Overview
+[Audio Diffusion](https://github.com/teticio/audio-diffusion) is by Robert Dargavel Smith, and it leverages the recent advances in image generation from diffusion models by converting audio samples to and from Mel spectrogram images.
 
-[Audio Diffusion](https://github.com/teticio/audio-diffusion) by Robert Dargavel Smith.
+The original codebase, training scripts and example notebooks can be found at [teticio/audio-diffusion](https://github.com/teticio/audio-diffusion).
 
-Audio Diffusion leverages the recent advances in image generation using diffusion models by converting audio samples to
-and from mel spectrogram images.
+<Tip>
 
-The original codebase of this implementation can be found [here](https://github.com/teticio/audio-diffusion), including
-training scripts and example notebooks.
+Make sure to check out the Schedulers [guide](/using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](/using-diffusers/loading#reuse-components-across-pipelines) section to learn how to efficiently load the same components into multiple pipelines.
 
-## Available Pipelines:
-
-| Pipeline | Tasks | Colab
-|---|---|:---:|
-| [pipeline_audio_diffusion.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/audio_diffusion/pipeline_audio_diffusion.py) | *Unconditional Audio Generation* | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/teticio/audio-diffusion/blob/master/notebooks/audio_diffusion_pipeline.ipynb) |
-
-
-## Examples:
-
-### Audio Diffusion
-
-```python
-import torch
-from IPython.display import Audio
-from diffusers import DiffusionPipeline
-
-device = "cuda" if torch.cuda.is_available() else "cpu"
-pipe = DiffusionPipeline.from_pretrained("teticio/audio-diffusion-256").to(device)
-
-output = pipe()
-display(output.images[0])
-display(Audio(output.audios[0], rate=pipe.mel.get_sample_rate()))
-```
-
-### Latent Audio Diffusion
-
-```python
-import torch
-from IPython.display import Audio
-from diffusers import DiffusionPipeline
-
-device = "cuda" if torch.cuda.is_available() else "cpu"
-pipe = DiffusionPipeline.from_pretrained("teticio/latent-audio-diffusion-256").to(device)
-
-output = pipe()
-display(output.images[0])
-display(Audio(output.audios[0], rate=pipe.mel.get_sample_rate()))
-```
-
-### Audio Diffusion with DDIM (faster)
-
-```python
-import torch
-from IPython.display import Audio
-from diffusers import DiffusionPipeline
-
-device = "cuda" if torch.cuda.is_available() else "cpu"
-pipe = DiffusionPipeline.from_pretrained("teticio/audio-diffusion-ddim-256").to(device)
-
-output = pipe()
-display(output.images[0])
-display(Audio(output.audios[0], rate=pipe.mel.get_sample_rate()))
-```
-
-### Variations, in-painting, out-painting etc.
-
-```python
-output = pipe(
-    raw_audio=output.audios[0, 0],
-    start_step=int(pipe.get_default_steps() / 2),
-    mask_start_secs=1,
-    mask_end_secs=1,
-)
-display(output.images[0])
-display(Audio(output.audios[0], rate=pipe.mel.get_sample_rate()))
-```
+</Tip>
 
 ## AudioDiffusionPipeline
 [[autodoc]] AudioDiffusionPipeline
   - all
   - __call__
 
+## AudioPipelineOutput
+[[autodoc]] pipelines.AudioPipelineOutput
+
+## ImagePipelineOutput
+[[autodoc]] pipelines.ImagePipelineOutput
+
 ## Mel
 [[autodoc]] Mel
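
For reference, a condensed version of the unconditional generation example removed above (the `teticio/audio-diffusion-256` checkpoint and the output/`mel` access pattern come from the removed code):

```python
import torch
from diffusers import DiffusionPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"
pipe = DiffusionPipeline.from_pretrained("teticio/audio-diffusion-256").to(device)

output = pipe()
image = output.images[0]                  # Mel spectrogram rendered as a PIL image
audio = output.audios[0]                  # generated waveform as a NumPy array
sample_rate = pipe.mel.get_sample_rate()  # sample rate to use for playback
```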
