Commit 4e67675

Merge branch 'main' of https://github.com/huggingface/diffusers into stable_diff_opti
2 parents e422eb3 + 0c0c222 commit 4e67675

119 files changed, +7014 -899 lines changed

.github/workflows/stale.yml

Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
+name: Stale Bot
+
+on:
+  schedule:
+    - cron: "0 15 * * *"
+
+jobs:
+  close_stale_issues:
+    name: Close Stale Issues
+    if: github.repository == 'huggingface/diffusers'
+    runs-on: ubuntu-latest
+    env:
+      GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+    steps:
+    - uses: actions/checkout@v2
+
+    - name: Setup Python
+      uses: actions/setup-python@v1
+      with:
+        python-version: 3.7
+
+    - name: Install requirements
+      run: |
+        pip install PyGithub
+    - name: Close stale issues
+      run: |
+        python utils/stale.py

.github/workflows/typos.yml

Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
+name: Check typos
+
+on:
+  workflow_dispatch:
+
+jobs:
+  build:
+    runs-on: ubuntu-latest
+
+    steps:
+      - uses: actions/checkout@v3
+
+      - name: typos-action
+        uses: crate-ci/[email protected]

README.md

Lines changed: 24 additions & 13 deletions
@@ -21,7 +21,7 @@ as a modular toolbox for inference and training of diffusion models.
 More precisely, 🤗 Diffusers offers:
 
 - State-of-the-art diffusion pipelines that can be run in inference with just a couple of lines of code (see [src/diffusers/pipelines](https://github.com/huggingface/diffusers/tree/main/src/diffusers/pipelines)). Check [this overview](https://github.com/huggingface/diffusers/tree/main/src/diffusers/pipelines/README.md#pipelines-summary) to see all supported pipelines and their corresponding official papers.
-- Various noise schedulers that can be used interchangeably for the prefered speed vs. quality trade-off in inference (see [src/diffusers/schedulers](https://github.com/huggingface/diffusers/tree/main/src/diffusers/schedulers)).
+- Various noise schedulers that can be used interchangeably for the preferred speed vs. quality trade-off in inference (see [src/diffusers/schedulers](https://github.com/huggingface/diffusers/tree/main/src/diffusers/schedulers)).
 - Multiple types of models, such as UNet, can be used as building blocks in an end-to-end diffusion system (see [src/diffusers/models](https://github.com/huggingface/diffusers/tree/main/src/diffusers/models)).
 - Training examples to show how to train the most popular diffusion model tasks (see [examples](https://github.com/huggingface/diffusers/tree/main/examples), *e.g.* [unconditional-image-generation](https://github.com/huggingface/diffusers/tree/main/examples/unconditional_image_generation)).
 
@@ -30,7 +30,7 @@ More precisely, 🤗 Diffusers offers:
 **With `pip`**
 
 ```bash
-pip install --upgrade diffusers # should install diffusers 0.2.4
+pip install --upgrade diffusers
 ```
 
 **With `conda`**
@@ -39,6 +39,10 @@ pip install --upgrade diffusers # should install diffusers 0.2.4
 conda install -c conda-forge diffusers
 ```
 
+**Apple Silicon (M1/M2) support**
+
+Please, refer to [the documentation](https://huggingface.co/docs/diffusers/optimization/mps).
+
 ## Contributing
 
 We ❤️ contributions from the open-source community!
@@ -191,7 +195,7 @@ with autocast("cuda"):
 
 images[0].save("fantasy_landscape.png")
 ```
-You can also run this example on colab [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/patil-suraj/Notebooks/blob/master/image_2_image_using_diffusers.ipynb)
+You can also run this example on colab [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/image_2_image_using_diffusers.ipynb)
 
 ### In-painting using Stable Diffusion
 
@@ -254,42 +258,49 @@ If you want to run the code yourself 💻, you can try out:
 - [Text-to-Image Latent Diffusion](https://huggingface.co/CompVis/ldm-text2im-large-256)
 ```python
 # !pip install diffusers transformers
+from torch import autocast
 from diffusers import DiffusionPipeline
 
+device = "cuda"
 model_id = "CompVis/ldm-text2im-large-256"
 
 # load model and scheduler
 ldm = DiffusionPipeline.from_pretrained(model_id)
+ldm = ldm.to(device)
 
 # run pipeline in inference (sample random noise and denoise)
 prompt = "A painting of a squirrel eating a burger"
-images = ldm([prompt], num_inference_steps=50, eta=0.3, guidance_scale=6).images
+with autocast(device):
+    image = ldm([prompt], num_inference_steps=50, eta=0.3, guidance_scale=6).images[0]
 
-# save images
-for idx, image in enumerate(images):
-    image.save(f"squirrel-{idx}.png")
+# save image
+image.save("squirrel.png")
 ```
 - [Unconditional Diffusion with discrete scheduler](https://huggingface.co/google/ddpm-celebahq-256)
 ```python
 # !pip install diffusers
+from torch import autocast
 from diffusers import DDPMPipeline, DDIMPipeline, PNDMPipeline
 
 model_id = "google/ddpm-celebahq-256"
+device = "cuda"
 
 # load model and scheduler
 ddpm = DDPMPipeline.from_pretrained(model_id) # you can replace DDPMPipeline with DDIMPipeline or PNDMPipeline for faster inference
+ddpm.to(device)
 
 # run pipeline in inference (sample random noise and denoise)
-image = ddpm().images
+with autocast("cuda"):
+    image = ddpm().images[0]
 
 # save image
-image[0].save("ddpm_generated_image.png")
+image.save("ddpm_generated_image.png")
 ```
 - [Unconditional Latent Diffusion](https://huggingface.co/CompVis/ldm-celebahq-256)
-- [Unconditional Diffusion with continous scheduler](https://huggingface.co/google/ncsnpp-ffhq-1024)
+- [Unconditional Diffusion with continuous scheduler](https://huggingface.co/google/ncsnpp-ffhq-1024)
 
 **Other Notebooks**:
-* [image-to-image generation with Stable Diffusion](https://colab.research.google.com/github/patil-suraj/Notebooks/blob/master/image_2_image_using_diffusers.ipynb) ![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg),
+* [image-to-image generation with Stable Diffusion](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/image_2_image_using_diffusers.ipynb) ![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg),
 * [tweak images via repeated Stable Diffusion seeds](https://colab.research.google.com/github/pcuenca/diffusers-examples/blob/main/notebooks/stable-diffusion-seeds.ipynb) ![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg),
 
 ### Web Demos
@@ -335,8 +346,8 @@ The class provides functionality to compute previous image according to alpha, b
 
 ## Philosophy
 
-- Readability and clarity is prefered over highly optimized code. A strong importance is put on providing readable, intuitive and elementary code design. *E.g.*, the provided [schedulers](https://github.com/huggingface/diffusers/tree/main/src/diffusers/schedulers) are separated from the provided [models](https://github.com/huggingface/diffusers/tree/main/src/diffusers/models) and provide well-commented code that can be read alongside the original paper.
-- Diffusers is **modality independent** and focuses on providing pretrained models and tools to build systems that generate **continous outputs**, *e.g.* vision and audio.
+- Readability and clarity is preferred over highly optimized code. A strong importance is put on providing readable, intuitive and elementary code design. *E.g.*, the provided [schedulers](https://github.com/huggingface/diffusers/tree/main/src/diffusers/schedulers) are separated from the provided [models](https://github.com/huggingface/diffusers/tree/main/src/diffusers/models) and provide well-commented code that can be read alongside the original paper.
+- Diffusers is **modality independent** and focuses on providing pretrained models and tools to build systems that generate **continuous outputs**, *e.g.* vision and audio.
 - Diffusion models and schedulers are provided as concise, elementary building blocks. In contrast, diffusion pipelines are a collection of end-to-end diffusion systems that can be used out-of-the-box, should stay as close as possible to their original implementation and can include components of another library, such as text-encoders. Examples for diffusion pipelines are [Glide](https://github.com/openai/glide-text2im) and [Latent Diffusion](https://github.com/CompVis/latent-diffusion).
 
 ## In the works

_typos.toml

Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
+# Files for typos
+# Instruction: https://github.com/marketplace/actions/typos-action#getting-started
+
+[default.extend-identifiers]
+
+[default.extend-words]
+NIN_="NIN" # NIN is used in scripts/convert_ncsnpp_original_checkpoint_to_diffusers.py
+nd="np" # nd may be np (numpy)
+
+
+[files]
+extend-exclude = ["_typos.toml"]

docs/source/_toctree.yml

Lines changed: 0 additions & 2 deletions
@@ -35,8 +35,6 @@
     title: "Open Vino"
   - local: optimization/mps
     title: "MPS"
-  - local: optimization/other
-    title: "Other"
   title: "Optimization/Special Hardware"
 - sections:
   - local: training/overview

docs/source/api/configuration.mdx

Lines changed: 8 additions & 13 deletions
@@ -10,19 +10,14 @@ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express o
 specific language governing permissions and limitations under the License.
 -->
 
-# Models
+# Configuration
 
-Diffusers contains pretrained models for popular algorithms and modules for creating the next set of diffusion models.
-The primary function of these models is to denoise an input sample, by modeling the distribution $p_\theta(\mathbf{x}_{t-1}|\mathbf{x}_t)$.
-The models are built on the base class ['ModelMixin'] that is a `torch.nn.module` with basic functionality for saving and loading models both locally and from the HuggingFace hub.
+In Diffusers, schedulers of type [`schedulers.scheduling_utils.SchedulerMixin`], and models of type [`ModelMixin`] inherit from [`ConfigMixin`] which conveniently takes care of storing all parameters that are
+passed to the respective `__init__` methods in a JSON-configuration file.
 
-## API
+TODO(PVP) - add example and better info here
 
-Models should provide the `def forward` function and initialization of the model.
-All saving, loading, and utilities should be in the base ['ModelMixin'] class.
-
-## Examples
-
-- The ['UNetModel'] was proposed in [TODO](https://arxiv.org/) and has been used in paper1, paper2, paper3.
-- Extensions of the ['UNetModel'] include the ['UNetGlideModel'] that uses attention and timestep embeddings for the [GLIDE](https://arxiv.org/abs/2112.10741) paper, the ['UNetGradTTS'] model from this [paper](https://arxiv.org/abs/2105.06337) for text-to-speech, ['UNetLDMModel'] for latent-diffusion models in this [paper](https://arxiv.org/abs/2112.10752), and the ['TemporalUNet'] used for time-series prediciton in this reinforcement learning [paper](https://arxiv.org/abs/2205.09991).
-- TODO: mention VAE / SDE score estimation
+## ConfigMixin
+[[autodoc]] ConfigMixin
+    - from_config
+    - save_config
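
Reviewer note on `docs/source/api/configuration.mdx`: the new text says `ConfigMixin` stores the `__init__` parameters in a JSON configuration file, and the page still carries a TODO for an example. The idea can be sketched roughly as below. This is a self-contained illustration, not the diffusers implementation: `SketchConfigMixin`, `ToyScheduler`, `round_trip`, the `config.json` file name, and the `_class_name` key are all assumptions made for the sketch; only the method names `from_config` and `save_config` come from the diff.

```python
import inspect
import json
import os
import tempfile


class SketchConfigMixin:
    """Hypothetical stand-in for ConfigMixin; not the diffusers implementation."""

    config_name = "config.json"  # assumed file name, for illustration only

    def register_to_config(self, **kwargs):
        # Remember the keyword arguments the subclass was constructed with.
        self._config = dict(kwargs)

    @property
    def config(self):
        # Return a copy so callers cannot mutate the stored parameters.
        return dict(self._config)

    def save_config(self, save_directory):
        # Write the stored __init__ parameters to a JSON file.
        os.makedirs(save_directory, exist_ok=True)
        path = os.path.join(save_directory, self.config_name)
        payload = {"_class_name": type(self).__name__, **self._config}
        with open(path, "w", encoding="utf-8") as f:
            json.dump(payload, f, indent=2, sort_keys=True)
        return path

    @classmethod
    def from_config(cls, save_directory):
        # Read the JSON file back and rebuild the object from it.
        path = os.path.join(save_directory, cls.config_name)
        with open(path, encoding="utf-8") as f:
            stored = json.load(f)
        # Pass on only the keys that __init__ actually accepts.
        accepted = set(inspect.signature(cls.__init__).parameters) - {"self"}
        return cls(**{k: v for k, v in stored.items() if k in accepted})


class ToyScheduler(SketchConfigMixin):
    """Hypothetical scheduler used only to exercise the mixin."""

    def __init__(self, num_train_timesteps=1000, beta_start=0.0001):
        self.register_to_config(
            num_train_timesteps=num_train_timesteps, beta_start=beta_start
        )
        self.num_train_timesteps = num_train_timesteps
        self.beta_start = beta_start


def round_trip(scheduler):
    # Save to a temporary directory and load a fresh instance from it.
    with tempfile.TemporaryDirectory() as tmp:
        scheduler.save_config(tmp)
        return type(scheduler).from_config(tmp)
```

Calling `round_trip(ToyScheduler(num_train_timesteps=50))` should give back a new `ToyScheduler` whose `config` matches the original, which is the property the documentation paragraph describes.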

docs/source/api/diffusion_pipeline.mdx

Lines changed: 22 additions & 11 deletions
@@ -10,19 +10,30 @@ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express o
 specific language governing permissions and limitations under the License.
 -->
 
-# Models
+# Pipelines
 
-Diffusers contains pretrained models for popular algorithms and modules for creating the next set of diffusion models.
-The primary function of these models is to denoise an input sample, by modeling the distribution $p_\theta(\mathbf{x}_{t-1}|\mathbf{x}_t)$.
-The models are built on the base class ['ModelMixin'] that is a `torch.nn.module` with basic functionality for saving and loading models both locally and from the HuggingFace hub.
+The [`DiffusionPipeline`] is the easiest way to load any pretrained diffusion pipeline from the [Hub](https://huggingface.co/models?library=diffusers) and to use it in inference.
 
-## API
+<Tip>
+
+	One should not use the Diffusion Pipeline class for training or fine-tuning a diffusion model. Individual
+	components of diffusion pipelines are usually trained individually, so we suggest to directly work
+	with [`UNetModel`] and [`UNetConditionModel`].
 
-Models should provide the `def forward` function and initialization of the model.
-All saving, loading, and utilities should be in the base ['ModelMixin'] class.
+</Tip>
 
-## Examples
+Any diffusion pipeline that is loaded with [`~DiffusionPipeline.from_pretrained`] will automatically
+detect the pipeline type, *e.g.* [`StableDiffusionPipeline`] and consequently load each component of the
+pipeline and pass them into the `__init__` function of the pipeline, *e.g.* [`~StableDiffusionPipeline.__init__`].
 
-- The ['UNetModel'] was proposed in [TODO](https://arxiv.org/) and has been used in paper1, paper2, paper3.
-- Extensions of the ['UNetModel'] include the ['UNetGlideModel'] that uses attention and timestep embeddings for the [GLIDE](https://arxiv.org/abs/2112.10741) paper, the ['UNetGradTTS'] model from this [paper](https://arxiv.org/abs/2105.06337) for text-to-speech, ['UNetLDMModel'] for latent-diffusion models in this [paper](https://arxiv.org/abs/2112.10752), and the ['TemporalUNet'] used for time-series prediciton in this reinforcement learning [paper](https://arxiv.org/abs/2205.09991).
-- TODO: mention VAE / SDE score estimation
+Any pipeline object can be saved locally with [`~DiffusionPipeline.save_pretrained`].
+
+## DiffusionPipeline
+[[autodoc]] DiffusionPipeline
+    - from_pretrained
+    - save_pretrained
+
+## ImagePipelineOutput
+By default diffusion pipelines return an object of class
+
+[[autodoc]] pipeline_utils.ImagePipelineOutput
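
Reviewer note on `docs/source/api/diffusion_pipeline.mdx`: the added paragraph says `from_pretrained` detects the pipeline type and passes each loaded component into that pipeline's `__init__`. One way such dispatch could work is sketched below. Everything here is hypothetical and chosen for illustration: the registry, `SketchPipeline`, `ToyTextToImagePipeline`, the `model_index.json` file name, and the `_class_name` key are assumptions, and components are plain strings rather than real models. It is not a description of how diffusers resolves classes.

```python
import json
import os
import tempfile

# Hypothetical registry mapping a stored class name to a pipeline class.
PIPELINE_REGISTRY = {}


def register(cls):
    # Class decorator that makes a pipeline discoverable by name.
    PIPELINE_REGISTRY[cls.__name__] = cls
    return cls


class SketchPipeline:
    """Hypothetical base class; components are plain strings in this sketch."""

    index_name = "model_index.json"  # assumed file name

    def __init__(self, **components):
        self.components = components

    def save_pretrained(self, save_directory):
        # Record which concrete class was saved, next to its components.
        os.makedirs(save_directory, exist_ok=True)
        index = {"_class_name": type(self).__name__, **self.components}
        path = os.path.join(save_directory, self.index_name)
        with open(path, "w", encoding="utf-8") as f:
            json.dump(index, f, indent=2)

    @classmethod
    def from_pretrained(cls, save_directory):
        path = os.path.join(save_directory, cls.index_name)
        with open(path, encoding="utf-8") as f:
            index = json.load(f)
        # Detect the concrete pipeline type, then hand the remaining
        # entries to its __init__ as keyword arguments.
        pipeline_cls = PIPELINE_REGISTRY[index.pop("_class_name")]
        return pipeline_cls(**index)


@register
class ToyTextToImagePipeline(SketchPipeline):
    def __init__(self, unet, scheduler):
        super().__init__(unet=unet, scheduler=scheduler)
```

With this sketch, saving a `ToyTextToImagePipeline` and then calling `SketchPipeline.from_pretrained` on the same directory returns a `ToyTextToImagePipeline`, even though the call was made on the base class.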

docs/source/api/models.mdx

Lines changed: 26 additions & 7 deletions
@@ -16,13 +16,32 @@ Diffusers contains pretrained models for popular algorithms and modules for crea
 The primary function of these models is to denoise an input sample, by modeling the distribution $p_\theta(\mathbf{x}_{t-1}|\mathbf{x}_t)$.
 The models are built on the base class ['ModelMixin'] that is a `torch.nn.module` with basic functionality for saving and loading models both locally and from the HuggingFace hub.
 
-## API
+## ModelMixin
+[[autodoc]] ModelMixin
 
-Models should provide the `def forward` function and initialization of the model.
-All saving, loading, and utilities should be in the base ['ModelMixin'] class.
+## UNet2DOutput
+[[autodoc]] models.unet_2d.UNet2DOutput
 
-## Examples
+## UNet2DModel
+[[autodoc]] UNet2DModel
 
-- The ['UNetModel'] was proposed in [TODO](https://arxiv.org/) and has been used in paper1, paper2, paper3.
-- Extensions of the ['UNetModel'] include the ['UNetGlideModel'] that uses attention and timestep embeddings for the [GLIDE](https://arxiv.org/abs/2112.10741) paper, the ['UNetGradTTS'] model from this [paper](https://arxiv.org/abs/2105.06337) for text-to-speech, ['UNetLDMModel'] for latent-diffusion models in this [paper](https://arxiv.org/abs/2112.10752), and the ['TemporalUNet'] used for time-series prediciton in this reinforcement learning [paper](https://arxiv.org/abs/2205.09991).
-- TODO: mention VAE / SDE score estimation
+## UNet2DConditionOutput
+[[autodoc]] models.unet_2d_condition.UNet2DConditionOutput
+
+## UNet2DConditionModel
+[[autodoc]] UNet2DConditionModel
+
+## DecoderOutput
+[[autodoc]] models.vae.DecoderOutput
+
+## VQEncoderOutput
+[[autodoc]] models.vae.VQEncoderOutput
+
+## VQModel
+[[autodoc]] VQModel
+
+## AutoencoderKLOutput
+[[autodoc]] models.vae.AutoencoderKLOutput
+
+## AutoencoderKL
+[[autodoc]] AutoencoderKL

docs/source/api/outputs.mdx

Lines changed: 38 additions & 11 deletions
@@ -10,19 +10,46 @@ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express o
 specific language governing permissions and limitations under the License.
 -->
 
-# Models
+# BaseOutputs
 
-Diffusers contains pretrained models for popular algorithms and modules for creating the next set of diffusion models.
-The primary function of these models is to denoise an input sample, by modeling the distribution $p_\theta(\mathbf{x}_{t-1}|\mathbf{x}_t)$.
-The models are built on the base class ['ModelMixin'] that is a `torch.nn.module` with basic functionality for saving and loading models both locally and from the HuggingFace hub.
+All models have outputs that are instances of subclasses of [`~utils.BaseOutput`]. Those are
+data structures containing all the information returned by the model, but that can also be used as tuples or
+dictionaries.
 
-## API
+Let's see how this looks in an example:
 
-Models should provide the `def forward` function and initialization of the model.
-All saving, loading, and utilities should be in the base ['ModelMixin'] class.
+```python
+from diffusers import DDIMPipeline
 
-## Examples
+pipeline = DDIMPipeline.from_pretrained("google/ddpm-cifar10-32")
+outputs = pipeline()
+```
 
-- The ['UNetModel'] was proposed in [TODO](https://arxiv.org/) and has been used in paper1, paper2, paper3.
-- Extensions of the ['UNetModel'] include the ['UNetGlideModel'] that uses attention and timestep embeddings for the [GLIDE](https://arxiv.org/abs/2112.10741) paper, the ['UNetGradTTS'] model from this [paper](https://arxiv.org/abs/2105.06337) for text-to-speech, ['UNetLDMModel'] for latent-diffusion models in this [paper](https://arxiv.org/abs/2112.10752), and the ['TemporalUNet'] used for time-series prediciton in this reinforcement learning [paper](https://arxiv.org/abs/2205.09991).
-- TODO: mention VAE / SDE score estimation
+The `outputs` object is a [`~pipeline_utils.ImagePipelineOutput`], as we can see in the
+documentation of that class below, it means it has an image attribute.
+
+You can access each attribute as you would usually do, and if that attribute has not been returned by the model, you will get `None`:
+
+```python
+outputs.images
+```
+
+or via keyword lookup
+
+```python
+outputs["images"]
+```
+
+When considering our `outputs` object as tuple, it only considers the attributes that don't have `None` values.
+Here for instance, we could retrieve images via indexing:
+
+```python
+outputs[:1]
+```
+
+which will return the tuple `(outputs.images)` for instance.
+
+## BaseOutput
+
+[[autodoc]] utils.BaseOutput
+    - to_tuple
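
Reviewer note on `docs/source/api/outputs.mdx`: the three access styles described above (attribute, keyword lookup, tuple indexing that skips `None` fields) can be reproduced without downloading a model. The sketch below is an assumption-laden illustration built on the standard library `dataclasses` module, not the `BaseOutput` source: `SketchOutput`, `ToyImageOutput`, and the `nsfw_flags` field are hypothetical names; only `to_tuple` and `images` appear in the diff.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class SketchOutput:
    """Hypothetical stand-in for BaseOutput; not the diffusers implementation."""

    def to_tuple(self):
        # Keep only the fields that were actually returned (not None).
        values = (getattr(self, f.name) for f in fields(self))
        return tuple(v for v in values if v is not None)

    def __getitem__(self, key):
        # String keys behave like dictionary lookup ...
        if isinstance(key, str):
            return getattr(self, key)
        # ... while integers and slices index the tuple view.
        return self.to_tuple()[key]


@dataclass
class ToyImageOutput(SketchOutput):
    images: Optional[List[str]] = None
    nsfw_flags: Optional[List[bool]] = None  # hypothetical second field


out = ToyImageOutput(images=["img0"])
```

In this sketch `out.images`, `out["images"]`, and `out[0]` all return the same list, `out.nsfw_flags` is `None`, and `out[:1]` is the one-element tuple `(out.images,)` (note the trailing comma, which is what makes it a tuple in Python).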

docs/source/api/pipelines/ddim.mdx

Lines changed: 2 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -17,7 +17,6 @@ The original codebase of this paper can be found [here](https://github.com/ermon
1717
| [pipeline_ddim.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/ddim/pipeline_ddim.py) | *Unconditional Image Generation* | - |
1818

1919

20-
## API
21-
22-
[[autodoc]] pipelines.ddim.pipeline_ddim.DDIMPipeline
20+
## DDIMPipeline
21+
[[autodoc]] DDIMPipeline
2322
- __call__
