
Commit d471f11

Merge remote-tracking branch 'upstream/main' into diffedit-inpainting-pipeline
2 parents 7224532 + 91a2a80

File tree: 22 files changed, +404 and -68 lines


docs/source/en/optimization/habana.mdx

Lines changed: 13 additions & 4 deletions
@@ -16,8 +16,8 @@ specific language governing permissions and limitations under the License.
 
 ## Requirements
 
-- Optimum Habana 1.4 or later, [here](https://huggingface.co/docs/optimum/habana/installation) is how to install it.
-- SynapseAI 1.8.
+- Optimum Habana 1.5 or later, [here](https://huggingface.co/docs/optimum/habana/installation) is how to install it.
+- SynapseAI 1.9.
 
 
 ## Inference Pipeline

@@ -64,7 +64,16 @@ For more information, check out Optimum Habana's [documentation](https://hugging
 
 Here are the latencies for Habana first-generation Gaudi and Gaudi2 with the [Habana/stable-diffusion](https://huggingface.co/Habana/stable-diffusion) Gaudi configuration (mixed precision bf16/fp32):
 
+- [Stable Diffusion v1.5](https://huggingface.co/runwayml/stable-diffusion-v1-5) (512x512 resolution):
+
 | | Latency (batch size = 1) | Throughput (batch size = 8) |
 | ---------------------- |:------------------------:|:---------------------------:|
-| first-generation Gaudi | 4.29s | 0.283 images/s |
-| Gaudi2 | 1.54s | 0.904 images/s |
+| first-generation Gaudi | 4.22s | 0.29 images/s |
+| Gaudi2 | 1.70s | 0.925 images/s |
+
+- [Stable Diffusion v2.1](https://huggingface.co/stabilityai/stable-diffusion-2-1) (768x768 resolution):
+
+| | Latency (batch size = 1) | Throughput |
+| ---------------------- |:------------------------:|:-------------------------------:|
+| first-generation Gaudi | 23.3s | 0.045 images/s (batch size = 2) |
+| Gaudi2 | 7.75s | 0.14 images/s (batch size = 5) |

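For reference, here is a minimal sketch of the Gaudi inference pipeline these latency numbers refer to. It is not part of this commit and assumes the `GaudiStableDiffusionPipeline` and `GaudiDDIMScheduler` classes provided by Optimum Habana:

```py
# Sketch only; assumes optimum-habana >= 1.5 is installed and exposes these classes.
from optimum.habana.diffusers import GaudiDDIMScheduler, GaudiStableDiffusionPipeline

model_name = "runwayml/stable-diffusion-v1-5"

# Gaudi-optimized scheduler plus the Habana/stable-diffusion Gaudi configuration
# referenced by the benchmark table above (mixed precision bf16/fp32).
scheduler = GaudiDDIMScheduler.from_pretrained(model_name, subfolder="scheduler")
pipeline = GaudiStableDiffusionPipeline.from_pretrained(
    model_name,
    scheduler=scheduler,
    use_habana=True,
    use_hpu_graphs=True,
    gaudi_config="Habana/stable-diffusion",
)

outputs = pipeline(prompt="An image of a squirrel in Picasso style", num_images_per_prompt=1)
```
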
docs/source/en/training/controlnet.mdx

Lines changed: 1 addition & 0 deletions
@@ -74,6 +74,7 @@ wget https://huggingface.co/datasets/huggingface/documentation-images/resolve/ma
 wget https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/controlnet_training/conditioning_image_2.png
 ```
 
+Specify the `MODEL_NAME` environment variable (either a Hub model repository id or a path to the directory containing the model weights) and pass it to the [`~diffusers.DiffusionPipeline.from_pretrained.pretrained_model_name_or_path`] argument.
 
 ```bash
 export MODEL_DIR="runwayml/stable-diffusion-v1-5"

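As a usage sketch (not part of this diff), the base checkpoint set in `MODEL_DIR` above is later paired with the trained ControlNet at inference time. The output path `./controlnet-output` below is a placeholder, not a name from the repository:

```py
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# Load the trained conditioning network; "./controlnet-output" is a placeholder path.
controlnet = ControlNetModel.from_pretrained("./controlnet-output", torch_dtype=torch.float16)

# Pair it with the same base model that was used for training (the MODEL_DIR value above).
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

# One of the conditioning images downloaded with wget above.
conditioning_image = Image.open("conditioning_image_2.png")
image = pipe("a circle on a plain background", image=conditioning_image).images[0]
image.save("output.png")
```
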
docs/source/en/training/custom_diffusion.mdx

Lines changed: 2 additions & 0 deletions
@@ -15,6 +15,8 @@ specific language governing permissions and limitations under the License.
 [Custom Diffusion](https://arxiv.org/abs/2212.04488) is a method to customize text-to-image models like Stable Diffusion given just a few (4~5) images of a subject.
 The `train_custom_diffusion.py` script shows how to implement the training procedure and adapt it for stable diffusion.
 
+This training example was contributed by [Nupur Kumari](https://nupurkmr9.github.io/) (one of the authors of Custom Diffusion).
+
 ## Running locally with PyTorch
 
 ### Installing the dependencies

docs/source/en/training/dreambooth.mdx

Lines changed: 27 additions & 20 deletions
@@ -50,6 +50,20 @@ from accelerate.utils import write_basic_config
 write_basic_config()
 ```
 
+Finally, download a [few images of a dog](https://huggingface.co/datasets/diffusers/dog-example) to DreamBooth with:
+
+```py
+from huggingface_hub import snapshot_download
+
+local_dir = "./dog"
+snapshot_download(
+    "diffusers/dog-example",
+    local_dir=local_dir,
+    repo_type="dataset",
+    ignore_patterns=".gitattributes",
+)
+```
+
 ## Finetuning
 
 <Tip warning={true}>

@@ -60,22 +74,13 @@ DreamBooth finetuning is very sensitive to hyperparameters and easy to overfit.
 
 <frameworkcontent>
 <pt>
-Let's try DreamBooth with a
-[few images of a dog](https://huggingface.co/datasets/diffusers/dog-example);
-download and save them to a directory and then set the `INSTANCE_DIR` environment variable to that path:
+Set the `INSTANCE_DIR` environment variable to the path of the directory containing the dog images.
 
-```python
-local_dir = "./path_to_training_images"
-snapshot_download(
-    "diffusers/dog-example",
-    local_dir=local_dir, repo_type="dataset",
-    ignore_patterns=".gitattributes",
-)
-```
+Specify the `MODEL_NAME` environment variable (either a Hub model repository id or a path to the directory containing the model weights) and pass it to the [`~diffusers.DiffusionPipeline.from_pretrained.pretrained_model_name_or_path`] argument.
 
 ```bash
 export MODEL_NAME="CompVis/stable-diffusion-v1-4"
-export INSTANCE_DIR="path_to_training_images"
+export INSTANCE_DIR="./dog"
 export OUTPUT_DIR="path_to_saved_model"
 ```
 

@@ -105,11 +110,13 @@ Before running the script, make sure you have the requirements installed:
 pip install -U -r requirements.txt
 ```
 
+Specify the `MODEL_NAME` environment variable (either a Hub model repository id or a path to the directory containing the model weights) and pass it to the [`~diffusers.DiffusionPipeline.from_pretrained.pretrained_model_name_or_path`] argument.
+
 Now you can launch the training script with the following command:
 
 ```bash
 export MODEL_NAME="duongna/stable-diffusion-v1-4-flax"
-export INSTANCE_DIR="path-to-instance-images"
+export INSTANCE_DIR="./dog"
 export OUTPUT_DIR="path-to-save-model"
 
 python train_dreambooth_flax.py \

@@ -135,7 +142,7 @@ The authors recommend generating `num_epochs * num_samples` images for prior pre
 <pt>
 ```bash
 export MODEL_NAME="CompVis/stable-diffusion-v1-4"
-export INSTANCE_DIR="path_to_training_images"
+export INSTANCE_DIR="./dog"
 export CLASS_DIR="path_to_class_images"
 export OUTPUT_DIR="path_to_saved_model"
 

@@ -160,7 +167,7 @@ accelerate launch train_dreambooth.py \
 <jax>
 ```bash
 export MODEL_NAME="duongna/stable-diffusion-v1-4-flax"
-export INSTANCE_DIR="path-to-instance-images"
+export INSTANCE_DIR="./dog"
 export CLASS_DIR="path-to-class-images"
 export OUTPUT_DIR="path-to-save-model"
 

@@ -197,7 +204,7 @@ Pass the `--train_text_encoder` argument to the training script to enable finetu
 <pt>
 ```bash
 export MODEL_NAME="CompVis/stable-diffusion-v1-4"
-export INSTANCE_DIR="path_to_training_images"
+export INSTANCE_DIR="./dog"
 export CLASS_DIR="path_to_class_images"
 export OUTPUT_DIR="path_to_saved_model"
 

@@ -224,7 +231,7 @@ accelerate launch train_dreambooth.py \
 <jax>
 ```bash
 export MODEL_NAME="duongna/stable-diffusion-v1-4-flax"
-export INSTANCE_DIR="path-to-instance-images"
+export INSTANCE_DIR="./dog"
 export CLASS_DIR="path-to-class-images"
 export OUTPUT_DIR="path-to-save-model"
 

@@ -360,7 +367,7 @@ Then pass the `--use_8bit_adam` option to the training script:
 
 ```bash
 export MODEL_NAME="CompVis/stable-diffusion-v1-4"
-export INSTANCE_DIR="path_to_training_images"
+export INSTANCE_DIR="./dog"
 export CLASS_DIR="path_to_class_images"
 export OUTPUT_DIR="path_to_saved_model"
 

@@ -389,7 +396,7 @@ To run DreamBooth on a 12GB GPU, you'll need to enable gradient checkpointing, t
 
 ```bash
 export MODEL_NAME="CompVis/stable-diffusion-v1-4"
-export INSTANCE_DIR="path-to-instance-images"
+export INSTANCE_DIR="./dog"
 export CLASS_DIR="path-to-class-images"
 export OUTPUT_DIR="path-to-save-model"
 

@@ -436,7 +443,7 @@ Launch training with the following command:
 
 ```bash
 export MODEL_NAME="CompVis/stable-diffusion-v1-4"
-export INSTANCE_DIR="path_to_training_images"
+export INSTANCE_DIR="./dog"
 export CLASS_DIR="path_to_class_images"
 export OUTPUT_DIR="path_to_saved_model"

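As a usage sketch (not part of this commit), once training on the `./dog` images finishes, the pipeline saved in `OUTPUT_DIR` loads like any other Stable Diffusion checkpoint. The `sks` identifier below assumes the default instance prompt used in the DreamBooth examples:

```py
import torch
from diffusers import StableDiffusionPipeline

# OUTPUT_DIR from the commands above; the training script writes a full pipeline there.
pipe = StableDiffusionPipeline.from_pretrained("path_to_saved_model", torch_dtype=torch.float16).to("cuda")

# "sks dog" assumes --instance_prompt="a photo of sks dog" was used during training.
image = pipe("A photo of sks dog in a bucket", num_inference_steps=50, guidance_scale=7.5).images[0]
image.save("dog-bucket.png")
```
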
docs/source/en/training/instructpix2pix.mdx

Lines changed: 1 addition & 2 deletions
@@ -74,8 +74,7 @@ write_basic_config()
 As mentioned before, we'll use a [small toy dataset](https://huggingface.co/datasets/fusing/instructpix2pix-1000-samples) for training. The dataset
 is a smaller version of the [original dataset](https://huggingface.co/datasets/timbrooks/instructpix2pix-clip-filtered) used in the InstructPix2Pix paper.
 
-Configure environment variables such as the dataset identifier and the Stable Diffusion
-checkpoint:
+Specify the `MODEL_NAME` environment variable (either a Hub model repository id or a path to the directory containing the model weights) and pass it to the [`~diffusers.DiffusionPipeline.from_pretrained.pretrained_model_name_or_path`] argument. You'll also need to specify the dataset name in `DATASET_ID`:
 
 ```bash
 export MODEL_NAME="runwayml/stable-diffusion-v1-5"

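For context (not in this diff), a minimal sketch of running the finetuned model, assuming the training output directory is named `instruct-pix2pix-model` (a placeholder):

```py
import torch
from PIL import Image
from diffusers import StableDiffusionInstructPix2PixPipeline

# "instruct-pix2pix-model" is a placeholder for the training output directory.
pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "instruct-pix2pix-model", torch_dtype=torch.float16
).to("cuda")

image = Image.open("my_photo.png")  # any RGB input image to edit
edited = pipe(
    "make the sky stormy",
    image=image,
    num_inference_steps=20,
    image_guidance_scale=1.5,
    guidance_scale=7,
).images[0]
edited.save("edited.png")
```
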
docs/source/en/training/lora.mdx

Lines changed: 6 additions & 2 deletions
@@ -52,7 +52,9 @@ Finetuning a model like Stable Diffusion, which has billions of parameters, can
 
 Let's finetune [`stable-diffusion-v1-5`](https://huggingface.co/runwayml/stable-diffusion-v1-5) on the [Pokémon BLIP captions](https://huggingface.co/datasets/lambdalabs/pokemon-blip-captions) dataset to generate your own Pokémon.
 
-To start, make sure you have the `MODEL_NAME` and `DATASET_NAME` environment variables set. The `OUTPUT_DIR` and `HUB_MODEL_ID` variables are optional and specify where to save the model to on the Hub:
+Specify the `MODEL_NAME` environment variable (either a Hub model repository id or a path to the directory containing the model weights) and pass it to the [`~diffusers.DiffusionPipeline.from_pretrained.pretrained_model_name_or_path`] argument. You'll also need to set the `DATASET_NAME` environment variable to the name of the dataset you want to train on.
+
+The `OUTPUT_DIR` and `HUB_MODEL_ID` variables are optional and specify where to save the model to on the Hub:
 
 ```bash
 export MODEL_NAME="runwayml/stable-diffusion-v1-5"

@@ -140,7 +142,9 @@ Load the LoRA weights from your finetuned model *on top of the base model weight
 
 Let's finetune [`stable-diffusion-v1-5`](https://huggingface.co/runwayml/stable-diffusion-v1-5) with DreamBooth and LoRA with some 🐶 [dog images](https://drive.google.com/drive/folders/1BO_dyz-p65qhBRRMRA4TbZ8qW4rB99JZ). Download and save these images to a directory.
 
-To start, make sure you have the `MODEL_NAME` and `INSTANCE_DIR` (path to directory containing images) environment variables set. The `OUTPUT_DIR` variables is optional and specifies where to save the model to on the Hub:
+To start, specify the `MODEL_NAME` environment variable (either a Hub model repository id or a path to the directory containing the model weights) and pass it to the [`~diffusers.DiffusionPipeline.from_pretrained.pretrained_model_name_or_path`] argument. You'll also need to set `INSTANCE_DIR` to the path of the directory containing the images.
+
+The `OUTPUT_DIR` variables is optional and specifies where to save the model to on the Hub:
 
 ```bash
 export MODEL_NAME="runwayml/stable-diffusion-v1-5"

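As a usage sketch (not part of this commit), LoRA training saves only the small attention-processor weight file, which is loaded on top of the base model roughly like this; `sd-pokemon-lora` is a placeholder output directory:

```py
import torch
from diffusers import StableDiffusionPipeline

# Load the frozen base model first (the MODEL_NAME value above).
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Apply the LoRA weights produced by training; "sd-pokemon-lora" is a placeholder path.
pipe.unet.load_attn_procs("sd-pokemon-lora")

image = pipe("A pokemon with blue eyes", num_inference_steps=25).images[0]
image.save("pokemon.png")
```
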
docs/source/en/training/text2image.mdx

Lines changed: 5 additions & 1 deletion
@@ -72,7 +72,9 @@ To load a checkpoint to resume training, pass the argument `--resume_from_checkp
 
 <frameworkcontent>
 <pt>
-Launch the [PyTorch training script](https://github.com/huggingface/diffusers/blob/main/examples/text_to_image/train_text_to_image.py) for a fine-tuning run on the [Pokémon BLIP captions](https://huggingface.co/datasets/lambdalabs/pokemon-blip-captions) dataset like this:
+Launch the [PyTorch training script](https://github.com/huggingface/diffusers/blob/main/examples/text_to_image/train_text_to_image.py) for a fine-tuning run on the [Pokémon BLIP captions](https://huggingface.co/datasets/lambdalabs/pokemon-blip-captions) dataset like this.
+
+Specify the `MODEL_NAME` environment variable (either a Hub model repository id or a path to the directory containing the model weights) and pass it to the [`~diffusers.DiffusionPipeline.from_pretrained.pretrained_model_name_or_path`] argument.
 
 <literalinclude>
 {"path": "../../../../examples/text_to_image/README.md",

@@ -141,6 +143,8 @@ Before running the script, make sure you have the requirements installed:
 pip install -U -r requirements_flax.txt
 ```
 
+Specify the `MODEL_NAME` environment variable (either a Hub model repository id or a path to the directory containing the model weights) and pass it to the [`~diffusers.DiffusionPipeline.from_pretrained.pretrained_model_name_or_path`] argument.
+
 Now you can launch the [Flax training script](https://github.com/huggingface/diffusers/blob/main/examples/text_to_image/train_text_to_image_flax.py) like this:
 
 ```bash

docs/source/en/training/text_inversion.mdx

Lines changed: 30 additions & 5 deletions
@@ -1,4 +1,4 @@
-<!--Copyright 2023 The HuggingFace Team. All rights reserved.
+<!--Copyright 2023 The HuggingFace Team. All rights reserved.
 
 Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
 the License. You may obtain a copy of the License at

@@ -81,9 +81,20 @@ To resume training from a saved checkpoint, pass the following argument to the t
 
 ## Finetuning
 
-For your training dataset, download these [images of a cat statue](https://drive.google.com/drive/folders/1fmJMs25nxS_rSNqS5hTcRdLem_YQXbq5) and store them in a directory.
+For your training dataset, download these [images of a cat toy](https://huggingface.co/datasets/diffusers/cat_toy_example) and store them in a directory:
 
-Set the `MODEL_NAME` environment variable to the model repository id, and the `DATA_DIR` environment variable to the path of the directory containing the images. Now you can launch the [training script](https://github.com/huggingface/diffusers/blob/main/examples/textual_inversion/textual_inversion.py):
+```py
+from huggingface_hub import snapshot_download
+
+local_dir = "./cat"
+snapshot_download(
+    "diffusers/cat_toy_example", local_dir=local_dir, repo_type="dataset", ignore_patterns=".gitattributes"
+)
+```
+
+Specify the `MODEL_NAME` environment variable (either a Hub model repository id or a path to the directory containing the model weights) and pass it to the [`~diffusers.DiffusionPipeline.from_pretrained.pretrained_model_name_or_path`] argument, and the `DATA_DIR` environment variable to the path of the directory containing the images.
+
+Now you can launch the [training script](https://github.com/huggingface/diffusers/blob/main/examples/textual_inversion/textual_inversion.py):
 
 <Tip>
 

@@ -95,7 +106,7 @@ Set the `MODEL_NAME` environment variable to the model repository id, and the `D
 <pt>
 ```bash
 export MODEL_NAME="runwayml/stable-diffusion-v1-5"
-export DATA_DIR="path-to-dir-containing-images"
+export DATA_DIR="./cat"
 
 accelerate launch textual_inversion.py \
   --pretrained_model_name_or_path=$MODEL_NAME \

@@ -111,6 +122,18 @@ accelerate launch textual_inversion.py \
   --lr_warmup_steps=0 \
   --output_dir="textual_inversion_cat"
 ```
+
+<Tip>
+
+💡 If you want to increase the trainable capacity, you can associate your placeholder token, *e.g.* `<cat-toy>` to
+multiple embedding vectors. This can help the model to better capture the style of more (complex) images.
+To enable training multiple embedding vectors, simply pass:
+
+```bash
+--num_vectors=5
+```
+
+</Tip>
 </pt>
 <jax>
 If you have access to TPUs, try out the [Flax training script](https://github.com/huggingface/diffusers/blob/main/examples/textual_inversion/textual_inversion_flax.py) to train even faster (this'll also work for GPUs). With the same configuration settings, the Flax training script should be at least 70% faster than the PyTorch training script! ⚡️

@@ -121,11 +144,13 @@ Before you begin, make sure you install the Flax specific dependencies:
 pip install -U -r requirements_flax.txt
 ```
 
+Specify the `MODEL_NAME` environment variable (either a Hub model repository id or a path to the directory containing the model weights) and pass it to the [`~diffusers.DiffusionPipeline.from_pretrained.pretrained_model_name_or_path`] argument.
+
 Then you can launch the [training script](https://github.com/huggingface/diffusers/blob/main/examples/textual_inversion/textual_inversion_flax.py):
 
 ```bash
 export MODEL_NAME="duongna/stable-diffusion-v1-4-flax"
-export DATA_DIR="path-to-dir-containing-images"
+export DATA_DIR="./cat"
 
 python textual_inversion_flax.py \
   --pretrained_model_name_or_path=$MODEL_NAME \

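As a usage sketch (not from this commit), the learned `<cat-toy>` embedding can later be loaded into a pipeline. The `load_textual_inversion` call assumes a diffusers version that ships the textual-inversion loader:

```py
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Load the embedding saved by the training run ("textual_inversion_cat" is the --output_dir above).
pipe.load_textual_inversion("textual_inversion_cat")

image = pipe("A <cat-toy> backpack", num_inference_steps=50).images[0]
image.save("cat-backpack.png")
```
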
docs/source/en/using-diffusers/reproducibility.mdx

Lines changed: 44 additions & 3 deletions
@@ -14,7 +14,7 @@ specific language governing permissions and limitations under the License.
 
 Reproducibility is important for testing, replicating results, and can even be used to [improve image quality](reusing_seeds). However, the randomness in diffusion models is a desired property because it allows the pipeline to generate different images every time it is run. While you can't expect to get the exact same results across platforms, you can expect results to be reproducible across releases and platforms within a certain tolerance range. Even then, tolerance varies depending on the diffusion pipeline and checkpoint.
 
-This is why it's important to understand how to control sources of randomness in diffusion models.
+This is why it's important to understand how to control sources of randomness in diffusion models or use deterministic algorithms.
 
 <Tip>
 

@@ -24,7 +24,7 @@ This is why it's important to understand how to control sources of randomness in
 
 
 </Tip>
-## Inference
+## Control randomness
 
 During inference, pipelines rely heavily on random sampling operations which include creating the
 Gaussian noise tensors to denoise and adding noise to the scheduling step.

@@ -147,5 +147,46 @@ susceptible to precision error propagation. Don't expect similar results across
 different GPU hardware or PyTorch versions. In this case, you'll need to run
 exactly the same hardware and PyTorch version for full reproducibility.
 
-## randn_tensor
+### randn_tensor
 [[autodoc]] diffusers.utils.randn_tensor
+
+## Deterministic algorithms
+
+You can also configure PyTorch to use deterministic algorithms to create a reproducible pipeline. However, you should be aware that deterministic algorithms may be slower than nondeterministic ones and you may observe a decrease in performance. But if reproducibility is important to you, then this is the way to go!
+
+Nondeterministic behavior occurs when operations are launched in more than one CUDA stream. To avoid this, set the environment varibale [`CUBLAS_WORKSPACE_CONFIG`](https://docs.nvidia.com/cuda/cublas/index.html#results-reproducibility) to `:16:8` to only use one buffer size during runtime.
+
+PyTorch typically benchmarks multiple algorithms to select the fastest one, but if you want reproducibility, you should disable this feature because the benchmark may select different algorithms each time. Lastly, pass `True` to [`torch.use_deterministic_algorithms`](https://pytorch.org/docs/stable/generated/torch.use_deterministic_algorithms.html) to enable deterministic algorithms.
+
+```py
+import os
+
+os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":16:8"
+
+torch.backends.cudnn.benchmark = False
+torch.use_deterministic_algorithms(True)
+```
+
+Now when you run the same pipeline twice, you'll get identical results.
+
+```py
+import torch
+from diffusers import DDIMScheduler, StableDiffusionPipeline
+import numpy as np
+
+model_id = "runwayml/stable-diffusion-v1-5"
+pipe = StableDiffusionPipeline.from_pretrained(model_id).to("cuda")
+pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)
+g = torch.Generator(device="cuda")
+
+prompt = "A bear is playing a guitar on Times Square"
+
+g.manual_seed(0)
+result1 = pipe(prompt=prompt, num_inference_steps=50, generator=g, output_type="latent").images
+
+g.manual_seed(0)
+result2 = pipe(prompt=prompt, num_inference_steps=50, generator=g, output_type="latent").images
+
+print("L_inf dist = ", abs(result1 - result2).max())
+"L_inf dist = tensor(0., device='cuda:0')"
+```

examples/community/stable_diffusion_tensorrt_txt2img.py

Lines changed: 1 addition & 1 deletion
@@ -703,7 +703,7 @@ def set_cached_folder(cls, pretrained_model_name_or_path: Optional[Union[str, os
         )
 
     def to(self, torch_device: Optional[Union[str, torch.device]] = None, silence_dtype_warnings: bool = False):
-        super().to(torch_device, silence_dtype_warnings)
+        super().to(torch_device, silence_dtype_warnings=silence_dtype_warnings)
 
         self.onnx_dir = os.path.join(self.cached_folder, self.onnx_dir)
         self.engine_dir = os.path.join(self.cached_folder, self.engine_dir)
self.engine_dir = os.path.join(self.cached_folder, self.engine_dir)
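A toy sketch (not from the repository) of why the keyword form is safer here: if the parent `to()` takes additional positional parameters between the device and `silence_dtype_warnings` (for example a dtype), a positional boolean silently binds to the wrong parameter. The signature below is illustrative only, not the actual diffusers signature:

```py
# Illustrative stand-in for a parent method with an extra positional parameter
# between the device and the warnings flag (an assumption for demonstration).
def to(torch_device=None, torch_dtype=None, silence_dtype_warnings=False):
    return torch_device, torch_dtype, silence_dtype_warnings

print(to("cuda", True))                          # ('cuda', True, False) -- bool lands in torch_dtype
print(to("cuda", silence_dtype_warnings=True))   # ('cuda', None, True)  -- bool reaches the intended flag
```
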
