
Commit 1f0530f

Merge pull request #6 from Pseudo-Lab/v0.15

Requesting the v0.15 PR.

2 parents 9fb0466 + e753454, commit 1f0530f

File tree: 300 files changed (+15816 / -3166 lines)


.github/workflows/pr_tests.yml

Lines changed: 1 addition & 1 deletion
@@ -40,7 +40,7 @@ jobs:
 framework: pytorch_examples
 runner: docker-cpu
 image: diffusers/diffusers-pytorch-cpu
-report: torch_cpu
+report: torch_example_cpu
 
 name: ${{ matrix.config.name }}

.github/workflows/push_tests_fast.yml

Lines changed: 1 addition & 1 deletion
@@ -38,7 +38,7 @@ jobs:
 framework: pytorch_examples
 runner: docker-cpu
 image: diffusers/diffusers-pytorch-cpu
-report: torch_cpu
+report: torch_example_cpu
 
 name: ${{ matrix.config.name }}

CONTRIBUTING.md

Lines changed: 9 additions & 2 deletions
@@ -394,8 +394,15 @@ passes. You should run the tests impacted by your changes like this:
 ```bash
 $ pytest tests/<TEST_TO_RUN>.py
 ```
+
+Before you run the tests, please make sure you install the dependencies required for testing. You can do so
+with this command:
 
-You can also run the full suite with the following command, but it takes
+```bash
+$ pip install -e ".[test]"
+```
+
+You can run the full test suite with the following command, but it takes
 a beefy machine to produce a result in a decent amount of time now that
 Diffusers has grown a lot. Here is the command for it:

@@ -439,7 +446,7 @@ Push the changes to your account using:
 $ git push -u origin a-descriptive-name-for-my-changes
 ```
 
-6. Once you are satisfied (**and the checklist below is happy too**), go to the
+6. Once you are satisfied, go to the
 webpage of your fork on GitHub. Click on 'Pull request' to send your changes
 to the project maintainers for review.

docs/source/en/_toctree.yml

Lines changed: 18 additions & 6 deletions
@@ -4,7 +4,7 @@
 - local: quicktour
 title: Quicktour
 - local: stable_diffusion
-title: Stable Diffusion
+title: Effective and efficient diffusion
 - local: installation
 title: Installation
 title: Get started

@@ -33,15 +33,15 @@
 - local: using-diffusers/pipeline_overview
 title: Overview
 - local: using-diffusers/unconditional_image_generation
-title: Unconditional Image Generation
+title: Unconditional image generation
 - local: using-diffusers/conditional_image_generation
-title: Text-to-Image Generation
+title: Text-to-image generation
 - local: using-diffusers/img2img
-title: Text-Guided Image-to-Image
+title: Text-guided image-to-image
 - local: using-diffusers/inpaint
-title: Text-Guided Image-Inpainting
+title: Text-guided image-inpainting
 - local: using-diffusers/depth2img
-title: Text-Guided Depth-to-Image
+title: Text-guided depth-to-image
 - local: using-diffusers/reusing_seeds
 title: Improve image quality with deterministic generation
 - local: using-diffusers/reproducibility

@@ -52,6 +52,8 @@
 title: How to contribute a Pipeline
 - local: using-diffusers/using_safetensors
 title: Using safetensors
+- local: using-diffusers/stable_diffusion_jax_how_to
+title: Stable Diffusion in JAX/Flax
 - local: using-diffusers/weighted_prompts
 title: Weighting Prompts
 title: Pipelines for Inference

@@ -95,6 +97,8 @@
 title: ONNX
 - local: optimization/open_vino
 title: OpenVINO
+- local: optimization/coreml
+title: Core ML
 - local: optimization/mps
 title: MPS
 - local: optimization/habana

@@ -134,6 +138,8 @@
 title: AltDiffusion
 - local: api/pipelines/audio_diffusion
 title: Audio Diffusion
+- local: api/pipelines/audioldm
+title: AudioLDM
 - local: api/pipelines/cycle_diffusion
 title: Cycle Diffusion
 - local: api/pipelines/dance_diffusion

@@ -158,6 +164,8 @@
 title: Score SDE VE
 - local: api/pipelines/semantic_stable_diffusion
 title: Semantic Guidance
+- local: api/pipelines/spectrogram_diffusion
+title: "Spectrogram Diffusion"
 - sections:
 - local: api/pipelines/stable_diffusion/overview
 title: Overview

@@ -187,6 +195,8 @@
 title: MultiDiffusion Panorama
 - local: api/pipelines/stable_diffusion/controlnet
 title: Text-to-Image Generation with ControlNet Conditioning
+- local: api/pipelines/stable_diffusion/model_editing
+title: Text-to-Image Model Editing
 title: Stable Diffusion
 - local: api/pipelines/stable_diffusion_2
 title: Stable Diffusion 2

@@ -196,6 +206,8 @@
 title: Stochastic Karras VE
 - local: api/pipelines/text_to_video
 title: Text-to-Video
+- local: api/pipelines/text_to_video_zero
+title: Text-to-Video Zero
 - local: api/pipelines/unclip
 title: UnCLIP
 - local: api/pipelines/latent_diffusion_uncond

docs/source/en/api/loaders.mdx

Lines changed: 8 additions & 0 deletions
@@ -28,3 +28,11 @@ API to load such adapter neural networks via the [`loaders.py` module](https://g
 ### UNet2DConditionLoadersMixin
 
 [[autodoc]] loaders.UNet2DConditionLoadersMixin
+
+### TextualInversionLoaderMixin
+
+[[autodoc]] loaders.TextualInversionLoaderMixin
+
+### LoraLoaderMixin
+
+[[autodoc]] loaders.LoraLoaderMixin
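
Both newly documented mixins are mixed into the Stable Diffusion pipelines, so add-on weights are loaded directly on a pipeline object. A minimal sketch of that usage, assuming `load_textual_inversion` and `load_lora_weights` as the public entry points and using publicly available example checkpoints that are not part of this commit:

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# TextualInversionLoaderMixin: pull a learned <cat-toy> token embedding into the text encoder
pipe.load_textual_inversion("sd-concepts-library/cat-toy")

# LoraLoaderMixin: load LoRA attention weights on top of the frozen base weights
pipe.load_lora_weights("sayakpaul/sd-model-finetuned-lora-t4")

image = pipe("A <cat-toy> figurine in a garden").images[0]
image.save("cat_toy.png")
```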

docs/source/en/api/models.mdx

Lines changed: 6 additions & 0 deletions
@@ -99,3 +99,9 @@ The models are built on the base class ['ModelMixin'] that is a `torch.nn.module
 
 ## FlaxAutoencoderKL
 [[autodoc]] FlaxAutoencoderKL
+
+## FlaxControlNetOutput
+[[autodoc]] models.controlnet_flax.FlaxControlNetOutput
+
+## FlaxControlNetModel
+[[autodoc]] FlaxControlNetModel
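
The Flax classes documented above follow the usual Flax `from_pretrained` pattern, which returns the module together with its parameter dict. A rough sketch of loading a ControlNet and plugging it into the Flax pipeline, assuming the commonly used canny checkpoint and a Flax-weights revision of Stable Diffusion v1-5 (both checkpoint IDs are illustrative, not taken from this commit):

```python
import jax.numpy as jnp
from diffusers import FlaxControlNetModel, FlaxStableDiffusionControlNetPipeline

# Flax from_pretrained returns (module, params); from_pt converts PyTorch weights on the fly
controlnet, controlnet_params = FlaxControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", from_pt=True, dtype=jnp.float32
)

# Attach the ControlNet to the Flax Stable Diffusion ControlNet pipeline
pipe, params = FlaxStableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, revision="flax", dtype=jnp.float32
)
params["controlnet"] = controlnet_params
```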

docs/source/en/api/pipelines/alt_diffusion.mdx

Lines changed: 3 additions & 3 deletions
@@ -12,7 +12,7 @@ specific language governing permissions and limitations under the License.
 
 # AltDiffusion
 
-AltDiffusion was proposed in [AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities](https://arxiv.org/abs/2211.06679) by Zhongzhi Chen, Guang Liu, Bo-Wen Zhang, Fulong Ye, Qinghong Yang, Ledell Wu
+AltDiffusion was proposed in [AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities](https://arxiv.org/abs/2211.06679) by Zhongzhi Chen, Guang Liu, Bo-Wen Zhang, Fulong Ye, Qinghong Yang, Ledell Wu.
 
 The abstract of the paper is the following:
 

@@ -28,11 +28,11 @@ The abstract of the paper is the following:
 
 ## Tips
 
-- AltDiffusion is conceptually exaclty the same as [Stable Diffusion](./api/pipelines/stable_diffusion/overview).
+- AltDiffusion is conceptually exactly the same as [Stable Diffusion](./stable_diffusion/overview).
 
 - *Run AltDiffusion*
 
-AltDiffusion can be tested very easily with the [`AltDiffusionPipeline`], [`AltDiffusionImg2ImgPipeline`] and the `"BAAI/AltDiffusion-m9"` checkpoint exactly in the same way it is shown in the [Conditional Image Generation Guide](./using-diffusers/conditional_image_generation) and the [Image-to-Image Generation Guide](./using-diffusers/img2img).
+AltDiffusion can be tested very easily with the [`AltDiffusionPipeline`], [`AltDiffusionImg2ImgPipeline`] and the `"BAAI/AltDiffusion-m9"` checkpoint exactly in the same way it is shown in the [Conditional Image Generation Guide](../../using-diffusers/conditional_image_generation) and the [Image-to-Image Generation Guide](../../using-diffusers/img2img).
 
 - *How to load and use different schedulers.*
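
For reference, the usage that tip points to comes down to a plain Stable Diffusion-style call. A minimal sketch (the prompt is only an illustration):

```python
import torch
from diffusers import AltDiffusionPipeline

# AltDiffusion is used exactly like Stable Diffusion; m9 is the multilingual checkpoint
pipe = AltDiffusionPipeline.from_pretrained("BAAI/AltDiffusion-m9", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

image = pipe("a photo of an astronaut riding a horse on mars").images[0]
image.save("altdiffusion_astronaut.png")
```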

docs/source/en/api/pipelines/audioldm.mdx

Lines changed: 82 additions & 0 deletions

@@ -0,0 +1,82 @@
+<!--Copyright 2023 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+
+# AudioLDM
+
+## Overview
+
+AudioLDM was proposed in [AudioLDM: Text-to-Audio Generation with Latent Diffusion Models](https://arxiv.org/abs/2301.12503) by Haohe Liu et al.
+
+Inspired by [Stable Diffusion](https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/overview), AudioLDM
+is a text-to-audio _latent diffusion model (LDM)_ that learns continuous audio representations from [CLAP](https://huggingface.co/docs/transformers/main/model_doc/clap)
+latents. AudioLDM takes a text prompt as input and predicts the corresponding audio. It can generate text-conditional
+sound effects, human speech and music.
+
+This pipeline was contributed by [sanchit-gandhi](https://huggingface.co/sanchit-gandhi). The original codebase can be found [here](https://github.com/haoheliu/AudioLDM).
+
+## Text-to-Audio
+
+The [`AudioLDMPipeline`] can be used to load pre-trained weights from [cvssp/audioldm](https://huggingface.co/cvssp/audioldm) and generate text-conditional audio outputs:
+
+```python
+from diffusers import AudioLDMPipeline
+import torch
+import scipy
+
+repo_id = "cvssp/audioldm"
+pipe = AudioLDMPipeline.from_pretrained(repo_id, torch_dtype=torch.float16)
+pipe = pipe.to("cuda")
+
+prompt = "Techno music with a strong, upbeat tempo and high melodic riffs"
+audio = pipe(prompt, num_inference_steps=10, audio_length_in_s=5.0).audios[0]
+
+# save the audio sample as a .wav file
+scipy.io.wavfile.write("techno.wav", rate=16000, data=audio)
+```
+
+### Tips
+
+Prompts:
+* Descriptive prompt inputs work best: you can use adjectives to describe the sound (e.g. "high quality" or "clear") and make the prompt context specific (e.g., "water stream in a forest" instead of "stream").
+* It's best to use general terms like 'cat' or 'dog' instead of specific names or abstract objects that the model may not be familiar with.
+
+Inference:
+* The _quality_ of the predicted audio sample can be controlled by the `num_inference_steps` argument: higher steps give higher quality audio at the expense of slower inference.
+* The _length_ of the predicted audio sample can be controlled by varying the `audio_length_in_s` argument.
+
+### How to load and use different schedulers
+
+The AudioLDM pipeline uses [`DDIMScheduler`] scheduler by default. But `diffusers` provides many other schedulers
+that can be used with the AudioLDM pipeline such as [`PNDMScheduler`], [`LMSDiscreteScheduler`], [`EulerDiscreteScheduler`],
+[`EulerAncestralDiscreteScheduler`] etc. We recommend using the [`DPMSolverMultistepScheduler`] as it's currently the fastest
+scheduler there is.
+
+To use a different scheduler, you can either change it via the [`ConfigMixin.from_config`]
+method, or pass the `scheduler` argument to the `from_pretrained` method of the pipeline. For example, to use the
+[`DPMSolverMultistepScheduler`], you can do the following:
+
+```python
+>>> from diffusers import AudioLDMPipeline, DPMSolverMultistepScheduler
+>>> import torch
+
+>>> pipeline = AudioLDMPipeline.from_pretrained("cvssp/audioldm", torch_dtype=torch.float16)
+>>> pipeline.scheduler = DPMSolverMultistepScheduler.from_config(pipeline.scheduler.config)
+
+>>> # or
+>>> dpm_scheduler = DPMSolverMultistepScheduler.from_pretrained("cvssp/audioldm", subfolder="scheduler")
+>>> pipeline = AudioLDMPipeline.from_pretrained("cvssp/audioldm", scheduler=dpm_scheduler, torch_dtype=torch.float16)
+```
+
+## AudioLDMPipeline
+[[autodoc]] AudioLDMPipeline
+- all
+- __call__

docs/source/en/api/pipelines/overview.mdx

Lines changed: 5 additions & 4 deletions
@@ -19,9 +19,9 @@ components - all of which are needed to have a functioning end-to-end diffusion
 As an example, [Stable Diffusion](https://huggingface.co/blog/stable_diffusion) has three independently trained models:
 - [Autoencoder](./api/models#vae)
 - [Conditional Unet](./api/models#UNet2DConditionModel)
-- [CLIP text encoder](https://huggingface.co/docs/transformers/v4.21.2/en/model_doc/clip#transformers.CLIPTextModel)
+- [CLIP text encoder](https://huggingface.co/docs/transformers/v4.27.1/en/model_doc/clip#transformers.CLIPTextModel)
 - a scheduler component, [scheduler](./api/scheduler#pndm),
-- a [CLIPFeatureExtractor](https://huggingface.co/docs/transformers/v4.21.2/en/model_doc/clip#transformers.CLIPFeatureExtractor),
+- a [CLIPImageProcessor](https://huggingface.co/docs/transformers/v4.27.1/en/model_doc/clip#transformers.CLIPImageProcessor),
 - as well as a [safety checker](./stable_diffusion#safety_checker).
 All of these components are necessary to run stable diffusion in inference even though they were trained
 or created independently from each other.

@@ -83,6 +83,7 @@ available a colab notebook to directly try them out.
 | [versatile_diffusion](./versatile_diffusion) | [Versatile Diffusion: Text, Images and Variations All in One Diffusion Model](https://arxiv.org/abs/2211.08332) | Image Variations Generation |
 | [versatile_diffusion](./versatile_diffusion) | [Versatile Diffusion: Text, Images and Variations All in One Diffusion Model](https://arxiv.org/abs/2211.08332) | Dual Image and Text Guided Generation |
 | [vq_diffusion](./vq_diffusion) | [Vector Quantized Diffusion Model for Text-to-Image Synthesis](https://arxiv.org/abs/2111.14822) | Text-to-Image Generation |
+| [text_to_video_zero](./text_to_video_zero) | [Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Generators](https://arxiv.org/abs/2303.13439) | Text-to-Video Generation |
 
 
 **Note**: Pipelines are simple examples of how to play around with the diffusion systems as described in the corresponding papers.

@@ -108,7 +109,7 @@ from the local path.
 each pipeline, one should look directly into the respective pipeline.
 
 **Note**: All pipelines have PyTorch's autograd disabled by decorating the `__call__` method with a [`torch.no_grad`](https://pytorch.org/docs/stable/generated/torch.no_grad.html) decorator because pipelines should
-not be used for training. If you want to store the gradients during the forward pass, we recommend writing your own pipeline, see also our [community-examples](https://github.com/huggingface/diffusers/tree/main/examples/community)
+not be used for training. If you want to store the gradients during the forward pass, we recommend writing your own pipeline, see also our [community-examples](https://github.com/huggingface/diffusers/tree/main/examples/community).
 
 ## Contribution
 

@@ -173,7 +174,7 @@ You can also run this example on colab [![Open In Colab](https://colab.research.
 
 ### Tweak prompts reusing seeds and latents
 
-You can generate your own latents to reproduce results, or tweak your prompt on a specific result you liked. [This notebook](https://github.com/pcuenca/diffusers-examples/blob/main/notebooks/stable-diffusion-seeds.ipynb) shows how to do it step by step. You can also run it in Google Colab [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/pcuenca/diffusers-examples/blob/main/notebooks/stable-diffusion-seeds.ipynb).
+You can generate your own latents to reproduce results, or tweak your prompt on a specific result you liked. [This notebook](https://github.com/pcuenca/diffusers-examples/blob/main/notebooks/stable-diffusion-seeds.ipynb) shows how to do it step by step. You can also run it in Google Colab [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/pcuenca/diffusers-examples/blob/main/notebooks/stable-diffusion-seeds.ipynb)
 
 
 ### In-painting using Stable Diffusion
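
The workflow behind "Tweak prompts reusing seeds and latents" boils down to passing a seeded `torch.Generator` to the pipeline so a result you liked can be reproduced while only the prompt changes. A minimal sketch, assuming the standard v1-5 checkpoint (not specified on this page):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Fixing the generator seed pins the initial latents, so reruns are reproducible
generator = torch.Generator(device="cuda").manual_seed(0)
image = pipe("Labrador in the style of Vermeer", generator=generator).images[0]

# Reseed with the same value and tweak only the prompt to vary the text conditioning
generator = torch.Generator(device="cuda").manual_seed(0)
tweaked = pipe("Labrador in the style of Van Gogh", generator=generator).images[0]
```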

docs/source/en/api/pipelines/paint_by_example.mdx

Lines changed: 1 addition & 1 deletion
@@ -14,7 +14,7 @@ specific language governing permissions and limitations under the License.
 
 ## Overview
 
-[Paint by Example: Exemplar-based Image Editing with Diffusion Models](https://arxiv.org/abs/2211.13227) by Binxin Yang, Shuyang Gu, Bo Zhang, Ting Zhang, Xuejin Chen, Xiaoyan Sun, Dong Chen, Fang Wen
+[Paint by Example: Exemplar-based Image Editing with Diffusion Models](https://arxiv.org/abs/2211.13227) by Binxin Yang, Shuyang Gu, Bo Zhang, Ting Zhang, Xuejin Chen, Xiaoyan Sun, Dong Chen, Fang Wen.
 
 The abstract of the paper is the following:
