
Commit a40f9af: "add up to score sde ve"
1 parent: ad8709f


48 files changed (+1313, -1577 lines)

docs/source/en/api/pipelines/alt_diffusion.mdx

Lines changed: 10 additions & 56 deletions
@@ -12,72 +12,26 @@ specific language governing permissions and limitations under the License.

 # AltDiffusion

-AltDiffusion was proposed in [AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities](https://arxiv.org/abs/2211.06679) by Zhongzhi Chen, Guang Liu, Bo-Wen Zhang, Fulong Ye, Qinghong Yang, Ledell Wu.
+AltDiffusion was proposed in [AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities](https://huggingface.co/papers/2211.06679) by Zhongzhi Chen, Guang Liu, Bo-Wen Zhang, Fulong Ye, Qinghong Yang, Ledell Wu.

-The abstract of the paper is the following:
+The abstract from the paper is:

 *In this work, we present a conceptually simple and effective method to train a strong bilingual multimodal representation model. Starting from the pretrained multimodal representation model CLIP released by OpenAI, we switched its text encoder with a pretrained multilingual text encoder XLM-R, and aligned both languages and image representations by a two-stage training schema consisting of teacher learning and contrastive learning. We validate our method through evaluations of a wide range of tasks. We set new state-of-the-art performances on a bunch of tasks including ImageNet-CN, Flicker30k- CN, and COCO-CN. Further, we obtain very close performances with CLIP on almost all tasks, suggesting that one can simply alter the text encoder in CLIP for extended capabilities such as multilingual understanding.*

-
-*Overview*:
-
-| Pipeline | Tasks | Colab | Demo
-|---|---|:---:|:---:|
-| [pipeline_alt_diffusion.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/alt_diffusion/pipeline_alt_diffusion.py) | *Text-to-Image Generation* | - | -
-| [pipeline_alt_diffusion_img2img.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/alt_diffusion/pipeline_alt_diffusion_img2img.py) | *Image-to-Image Text-Guided Generation* | - |-
-
-## Tips
-
-- AltDiffusion is conceptually exactly the same as [Stable Diffusion](./stable_diffusion/overview).
-
-- *Run AltDiffusion*
-
-AltDiffusion can be tested very easily with the [`AltDiffusionPipeline`], [`AltDiffusionImg2ImgPipeline`] and the `"BAAI/AltDiffusion-m9"` checkpoint exactly in the same way it is shown in the [Conditional Image Generation Guide](../../using-diffusers/conditional_image_generation) and the [Image-to-Image Generation Guide](../../using-diffusers/img2img).
-
-- *How to load and use different schedulers.*
-
-The alt diffusion pipeline uses [`DDIMScheduler`] scheduler by default. But `diffusers` provides many other schedulers that can be used with the alt diffusion pipeline such as [`PNDMScheduler`], [`LMSDiscreteScheduler`], [`EulerDiscreteScheduler`], [`EulerAncestralDiscreteScheduler`] etc.
-To use a different scheduler, you can either change it via the [`ConfigMixin.from_config`] method or pass the `scheduler` argument to the `from_pretrained` method of the pipeline. For example, to use the [`EulerDiscreteScheduler`], you can do the following:
-
-```python
->>> from diffusers import AltDiffusionPipeline, EulerDiscreteScheduler
-
->>> pipeline = AltDiffusionPipeline.from_pretrained("BAAI/AltDiffusion-m9")
->>> pipeline.scheduler = EulerDiscreteScheduler.from_config(pipeline.scheduler.config)
-
->>> # or
->>> euler_scheduler = EulerDiscreteScheduler.from_pretrained("BAAI/AltDiffusion-m9", subfolder="scheduler")
->>> pipeline = AltDiffusionPipeline.from_pretrained("BAAI/AltDiffusion-m9", scheduler=euler_scheduler)
-```
-
-
-- *How to convert all use cases with multiple or single pipeline*
-
-If you want to use all possible use cases in a single `DiffusionPipeline` we recommend using the `components` functionality to instantiate all components in the most memory-efficient way:
-
-```python
->>> from diffusers import (
-... AltDiffusionPipeline,
-... AltDiffusionImg2ImgPipeline,
-... )
-
->>> text2img = AltDiffusionPipeline.from_pretrained("BAAI/AltDiffusion-m9")
->>> img2img = AltDiffusionImg2ImgPipeline(**text2img.components)
-
->>> # now you can use text2img(...) and img2img(...) just like the call methods of each respective pipeline
-```
-
-## AltDiffusionPipelineOutput
-[[autodoc]] pipelines.alt_diffusion.AltDiffusionPipelineOutput
-- all
-- __call__
-
 ## AltDiffusionPipeline
+
 [[autodoc]] AltDiffusionPipeline
 - all
 - __call__

 ## AltDiffusionImg2ImgPipeline
+
 [[autodoc]] AltDiffusionImg2ImgPipeline
 - all
 - __call__
+
+## AltDiffusionPipelineOutput
+
+[[autodoc]] pipelines.alt_diffusion.AltDiffusionPipelineOutput
+- all
+- __call__
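
For quick reference, a minimal text-to-image sketch with `AltDiffusionPipeline` is shown below. It reuses the `BAAI/AltDiffusion-m9` checkpoint and the scheduler swap from the examples removed above; the prompt, step count, `float16` loading, and CUDA device are illustrative assumptions, not part of this diff.

```python
import torch
from diffusers import AltDiffusionPipeline, EulerDiscreteScheduler

# Load the multilingual AltDiffusion checkpoint referenced in the removed examples.
pipe = AltDiffusionPipeline.from_pretrained("BAAI/AltDiffusion-m9", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

# Optional: swap the default scheduler, as the removed scheduler example did.
pipe.scheduler = EulerDiscreteScheduler.from_config(pipe.scheduler.config)

# AltDiffusion is used like Stable Diffusion: prompt in, PIL image out.
prompt = "a photograph of an astronaut riding a horse"
image = pipe(prompt, num_inference_steps=25).images[0]
image.save("astronaut.png")
```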

docs/source/en/api/pipelines/attend_and_excite.mdx

Lines changed: 9 additions & 53 deletions
@@ -10,66 +10,22 @@ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express o
 specific language governing permissions and limitations under the License.
 -->

-# Attend and Excite: Attention-Based Semantic Guidance for Text-to-Image Diffusion Models
+# Attend and Excite

-## Overview
+Attend and Excite for Stable Diffusion was proposed in [Attend-and-Excite: Attention-Based Semantic Guidance for Text-to-Image Diffusion Models](https://attendandexcite.github.io/Attend-and-Excite/) and provides textual attention control over image generation.

-Attend and Excite for Stable Diffusion was proposed in [Attend-and-Excite: Attention-Based Semantic Guidance for Text-to-Image Diffusion Models](https://attendandexcite.github.io/Attend-and-Excite/) and provides textual attention control over the image generation.
-
-The abstract of the paper is the following:
+The abstract from the paper is:

 *Text-to-image diffusion models have recently received a lot of interest for their astonishing ability to produce high-fidelity images from text only. However, achieving one-shot generation that aligns with the user's intent is nearly impossible, yet small changes to the input prompt often result in very different images. This leaves the user with little semantic control. To put the user in control, we show how to interact with the diffusion process to flexibly steer it along semantic directions. This semantic guidance (SEGA) allows for subtle and extensive edits, changes in composition and style, as well as optimizing the overall artistic conception. We demonstrate SEGA's effectiveness on a variety of tasks and provide evidence for its versatility and flexibility.*

-Resources
-
-* [Project Page](https://attendandexcite.github.io/Attend-and-Excite/)
-* [Paper](https://arxiv.org/abs/2301.13826)
-* [Original Code](https://github.com/AttendAndExcite/Attend-and-Excite)
-* [Demo](https://huggingface.co/spaces/AttendAndExcite/Attend-and-Excite)
-
-
-## Available Pipelines:
-
-| Pipeline | Tasks | Colab | Demo
-|---|---|:---:|:---:|
-| [pipeline_semantic_stable_diffusion_attend_and_excite.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/pipeline_semantic_stable_diffusion_attend_and_excite) | *Text-to-Image Generation* | - | https://huggingface.co/spaces/AttendAndExcite/Attend-and-Excite
-
-
-### Usage example
-
-
-```python
-import torch
-from diffusers import StableDiffusionAttendAndExcitePipeline
-
-model_id = "CompVis/stable-diffusion-v1-4"
-pipe = StableDiffusionAttendAndExcitePipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")
-pipe = pipe.to("cuda")
-
-prompt = "a cat and a frog"
-
-# use get_indices function to find out indices of the tokens you want to alter
-pipe.get_indices(prompt)
-
-token_indices = [2, 5]
-seed = 6141
-generator = torch.Generator("cuda").manual_seed(seed)
-
-images = pipe(
-    prompt=prompt,
-    token_indices=token_indices,
-    guidance_scale=7.5,
-    generator=generator,
-    num_inference_steps=50,
-    max_iter_to_alter=25,
-).images
-
-image = images[0]
-image.save(f"../images/{prompt}_{seed}.png")
-```
-
+You can find additional information about Attend and Excite on the [project page](https://attendandexcite.github.io/Attend-and-Excite/), [paper](https://arxiv.org/abs/2301.13826), the [original codebase](https://github.com/AttendAndExcite/Attend-and-Excite), or try it out in a [demo](https://huggingface.co/spaces/AttendAndExcite/Attend-and-Excite).

 ## StableDiffusionAttendAndExcitePipeline
+
 [[autodoc]] StableDiffusionAttendAndExcitePipeline
 - all
 - __call__
+
+## StableDiffusionPipelineOutput
+
+[[autodoc]] pipelines.stable_diffusion.StableDiffusionPipelineOutput
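
For quick reference, the sketch below condenses the usage example removed in this diff (it also drops the duplicated `.to("cuda")` call): token indices found with `get_indices` are passed to the pipeline so the chosen subjects are "excited" during generation. The checkpoint, prompt, seed, and CUDA device all follow the removed example.

```python
import torch
from diffusers import StableDiffusionAttendAndExcitePipeline

# Checkpoint and generation settings follow the usage example removed in this diff.
pipe = StableDiffusionAttendAndExcitePipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

prompt = "a cat and a frog"

# get_indices shows how the prompt is tokenized so you can pick the tokens to excite.
print(pipe.get_indices(prompt))
token_indices = [2, 5]  # "cat" and "frog"

generator = torch.Generator("cuda").manual_seed(6141)
image = pipe(
    prompt=prompt,
    token_indices=token_indices,
    guidance_scale=7.5,
    generator=generator,
    num_inference_steps=50,
    max_iter_to_alter=25,
).images[0]
image.save("cat_and_frog.png")
```

Here `max_iter_to_alter` bounds how many of the denoising steps apply the attention updates; the remaining steps run as a plain Stable Diffusion denoise.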

docs/source/en/api/pipelines/audio_diffusion.mdx

Lines changed: 8 additions & 75 deletions
@@ -12,87 +12,20 @@ specific language governing permissions and limitations under the License.

 # Audio Diffusion

-## Overview
+[Audio Diffusion](https://github.com/teticio/audio-diffusion) is by Robert Dargavel Smith, and it leverages the recent advances in image generation from diffusion models by converting audio samples to and from Mel spectrogram images.

-[Audio Diffusion](https://github.com/teticio/audio-diffusion) by Robert Dargavel Smith.
-
-Audio Diffusion leverages the recent advances in image generation using diffusion models by converting audio samples to
-and from mel spectrogram images.
-
-The original codebase of this implementation can be found [here](https://github.com/teticio/audio-diffusion), including
-training scripts and example notebooks.
-
-## Available Pipelines:
-
-| Pipeline | Tasks | Colab
-|---|---|:---:|
-| [pipeline_audio_diffusion.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/audio_diffusion/pipeline_audio_diffusion.py) | *Unconditional Audio Generation* | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/teticio/audio-diffusion/blob/master/notebooks/audio_diffusion_pipeline.ipynb) |
-
-
-## Examples:
-
-### Audio Diffusion
-
-```python
-import torch
-from IPython.display import Audio
-from diffusers import DiffusionPipeline
-
-device = "cuda" if torch.cuda.is_available() else "cpu"
-pipe = DiffusionPipeline.from_pretrained("teticio/audio-diffusion-256").to(device)
-
-output = pipe()
-display(output.images[0])
-display(Audio(output.audios[0], rate=pipe.mel.get_sample_rate()))
-```
-
-### Latent Audio Diffusion
-
-```python
-import torch
-from IPython.display import Audio
-from diffusers import DiffusionPipeline
-
-device = "cuda" if torch.cuda.is_available() else "cpu"
-pipe = DiffusionPipeline.from_pretrained("teticio/latent-audio-diffusion-256").to(device)
-
-output = pipe()
-display(output.images[0])
-display(Audio(output.audios[0], rate=pipe.mel.get_sample_rate()))
-```
-
-### Audio Diffusion with DDIM (faster)
-
-```python
-import torch
-from IPython.display import Audio
-from diffusers import DiffusionPipeline
-
-device = "cuda" if torch.cuda.is_available() else "cpu"
-pipe = DiffusionPipeline.from_pretrained("teticio/audio-diffusion-ddim-256").to(device)
-
-output = pipe()
-display(output.images[0])
-display(Audio(output.audios[0], rate=pipe.mel.get_sample_rate()))
-```
-
-### Variations, in-painting, out-painting etc.
-
-```python
-output = pipe(
-    raw_audio=output.audios[0, 0],
-    start_step=int(pipe.get_default_steps() / 2),
-    mask_start_secs=1,
-    mask_end_secs=1,
-)
-display(output.images[0])
-display(Audio(output.audios[0], rate=pipe.mel.get_sample_rate()))
-```
+The original codebase, training scripts and example notebooks can be found at [teticio/audio-diffusion](https://github.com/teticio/audio-diffusion).

 ## AudioDiffusionPipeline
 [[autodoc]] AudioDiffusionPipeline
 - all
 - __call__

+## AudioPipelineOutput
+[[autodoc]] pipelines.AudioPipelineOutput
+
+## ImagePipelineOutput
+[[autodoc]] pipelines.ImagePipelineOutput
+
 ## Mel
 [[autodoc]] Mel
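
For quick reference, a minimal unconditional-generation sketch with the Audio Diffusion pipeline is below, adapted from the notebook-style examples removed in this diff but without the IPython display calls. The `teticio/audio-diffusion-256` checkpoint, the `pipe.mel.get_sample_rate()` call, and the `[0, 0]` indexing of the returned audio follow those examples; writing the waveform out with `scipy` is an illustrative assumption.

```python
import scipy.io.wavfile
import torch
from diffusers import DiffusionPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"
pipe = DiffusionPipeline.from_pretrained("teticio/audio-diffusion-256").to(device)

# The pipeline returns both the Mel spectrogram image and the decoded waveform.
output = pipe()
output.images[0].save("spectrogram.png")

# First channel of the first generated sample, indexed as in the removed in-painting example.
audio = output.audios[0, 0]
scipy.io.wavfile.write("sample.wav", rate=pipe.mel.get_sample_rate(), data=audio)
```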

docs/source/en/api/pipelines/audioldm.mdx

Lines changed: 12 additions & 53 deletions
@@ -12,73 +12,32 @@ specific language governing permissions and limitations under the License.

 # AudioLDM

-## Overview
-
-AudioLDM was proposed in [AudioLDM: Text-to-Audio Generation with Latent Diffusion Models](https://arxiv.org/abs/2301.12503) by Haohe Liu et al.
+AudioLDM was proposed in [AudioLDM: Text-to-Audio Generation with Latent Diffusion Models](https://huggingface.co/papers/2301.12503) by Haohe Liu et al.

 Inspired by [Stable Diffusion](https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/overview), AudioLDM
 is a text-to-audio _latent diffusion model (LDM)_ that learns continuous audio representations from [CLAP](https://huggingface.co/docs/transformers/main/model_doc/clap)
 latents. AudioLDM takes a text prompt as input and predicts the corresponding audio. It can generate text-conditional
 sound effects, human speech and music.

-This pipeline was contributed by [sanchit-gandhi](https://huggingface.co/sanchit-gandhi). The original codebase can be found [here](https://github.com/haoheliu/AudioLDM).
-
-## Text-to-Audio
-
-The [`AudioLDMPipeline`] can be used to load pre-trained weights from [cvssp/audioldm-s-full-v2](https://huggingface.co/cvssp/audioldm-s-full-v2) and generate text-conditional audio outputs:
-
-```python
-from diffusers import AudioLDMPipeline
-import torch
-import scipy
+The original codebase can be found at [haoheliu/AudioLDM](https://github.com/haoheliu/AudioLDM), and the pipeline was contributed by [sanchit-gandhi](https://huggingface.co/sanchit-gandhi).

-
-repo_id = "cvssp/audioldm-s-full-v2"
-pipe = AudioLDMPipeline.from_pretrained(repo_id, torch_dtype=torch.float16)
-pipe = pipe.to("cuda")
+## Tips

-prompt = "Techno music with a strong, upbeat tempo and high melodic riffs"
-audio = pipe(prompt, num_inference_steps=10, audio_length_in_s=5.0).audios[0]
+When constructing a prompt, keep in mind:

-# save the audio sample as a .wav file
-scipy.io.wavfile.write("techno.wav", rate=16000, data=audio)
-```
+* Descriptive prompt inputs work best; you can use adjectives to describe the sound (for example, "high quality" or "clear") and make the prompt context specific (for example, "water stream in a forest" instead of "stream").
+* It's best to use general terms like "cat" or "dog" instead of specific names or abstract objects the model may not be familiar with.

-### Tips
+During inference:

-Prompts:
-* Descriptive prompt inputs work best: you can use adjectives to describe the sound (e.g. "high quality" or "clear") and make the prompt context specific (e.g., "water stream in a forest" instead of "stream").
-* It's best to use general terms like 'cat' or 'dog' instead of specific names or abstract objects that the model may not be familiar with.
-
-Inference:
-* The _quality_ of the predicted audio sample can be controlled by the `num_inference_steps` argument: higher steps give higher quality audio at the expense of slower inference.
+* The _quality_ of the predicted audio sample can be controlled by the `num_inference_steps` argument; higher steps give higher quality audio at the expense of slower inference.
 * The _length_ of the predicted audio sample can be controlled by varying the `audio_length_in_s` argument.

-### How to load and use different schedulers
-
-The AudioLDM pipeline uses [`DDIMScheduler`] scheduler by default. But `diffusers` provides many other schedulers
-that can be used with the AudioLDM pipeline such as [`PNDMScheduler`], [`LMSDiscreteScheduler`], [`EulerDiscreteScheduler`],
-[`EulerAncestralDiscreteScheduler`] etc. We recommend using the [`DPMSolverMultistepScheduler`] as it's currently the fastest
-scheduler there is.
-
-To use a different scheduler, you can either change it via the [`ConfigMixin.from_config`]
-method, or pass the `scheduler` argument to the `from_pretrained` method of the pipeline. For example, to use the
-[`DPMSolverMultistepScheduler`], you can do the following:
-
-```python
->>> from diffusers import AudioLDMPipeline, DPMSolverMultistepScheduler
->>> import torch
-
->>> pipeline = AudioLDMPipeline.from_pretrained("cvssp/audioldm-s-full-v2", torch_dtype=torch.float16)
->>> pipeline.scheduler = DPMSolverMultistepScheduler.from_config(pipeline.scheduler.config)
-
->>> # or
->>> dpm_scheduler = DPMSolverMultistepScheduler.from_pretrained("cvssp/audioldm-s-full-v2", subfolder="scheduler")
->>> pipeline = AudioLDMPipeline.from_pretrained(
-... "cvssp/audioldm-s-full-v2", scheduler=dpm_scheduler, torch_dtype=torch.float16
-... )
-```
-
 ## AudioLDMPipeline
 [[autodoc]] AudioLDMPipeline
 - all
 - __call__
+
+## StableDiffusionPipelineOutput
+
+[[autodoc]] pipelines.stable_diffusion.StableDiffusionPipelineOutput
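
For quick reference, the sketch below restores the core of the text-to-audio example removed in this diff and folds in the scheduler swap from the removed scheduler section. The checkpoint, prompt, step count, clip length, and 16 kHz output rate are taken from that example; a CUDA device is assumed.

```python
import scipy.io.wavfile
import torch
from diffusers import AudioLDMPipeline, DPMSolverMultistepScheduler

pipe = AudioLDMPipeline.from_pretrained("cvssp/audioldm-s-full-v2", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

# Optional: the removed scheduler section recommended DPMSolverMultistepScheduler as the fastest option.
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)

# num_inference_steps trades speed for quality; audio_length_in_s sets the clip length (see the Tips above).
prompt = "Techno music with a strong, upbeat tempo and high melodic riffs"
audio = pipe(prompt, num_inference_steps=10, audio_length_in_s=5.0).audios[0]

# Save the generated sample as a 16 kHz .wav file.
scipy.io.wavfile.write("techno.wav", rate=16000, data=audio)
```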
