
Commit baafe02

Merge remote-tracking branch 'upstream/main' into diffedit-inpainting-pipeline

2 parents: c28a3f6 + 3045fb2


49 files changed: +5417, -154 lines

docs/source/en/_toctree.yml

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -74,6 +74,8 @@
7474
title: ControlNet
7575
- local: training/instructpix2pix
7676
title: InstructPix2Pix Training
77+
- local: training/custom_diffusion
78+
title: Custom Diffusion
7779
title: Training
7880
- sections:
7981
- local: using-diffusers/rl

docs/source/en/api/loaders.mdx

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -36,3 +36,7 @@ API to load such adapter neural networks via the [`loaders.py` module](https://g
3636
### LoraLoaderMixin
3737

3838
[[autodoc]] loaders.LoraLoaderMixin
39+
40+
### FromCkptMixin
41+
42+
[[autodoc]] loaders.FromCkptMixin

docs/source/en/api/pipelines/stable_diffusion/controlnet.mdx

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -308,6 +308,7 @@ All checkpoints can be found under the authors' namespace [lllyasviel](https://h
308308
- disable_vae_slicing
309309
- enable_xformers_memory_efficient_attention
310310
- disable_xformers_memory_efficient_attention
311+
- load_textual_inversion
311312

312313
## FlaxStableDiffusionControlNetPipeline
313314
[[autodoc]] FlaxStableDiffusionControlNetPipeline

docs/source/en/api/pipelines/stable_diffusion/depth2img.mdx

Lines changed: 4 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -30,4 +30,7 @@ Available Checkpoints are:
3030
- enable_attention_slicing
3131
- disable_attention_slicing
3232
- enable_xformers_memory_efficient_attention
33-
- disable_xformers_memory_efficient_attention
33+
- disable_xformers_memory_efficient_attention
34+
- load_textual_inversion
35+
- load_lora_weights
36+
- save_lora_weights

docs/source/en/api/pipelines/stable_diffusion/img2img.mdx

Lines changed: 5 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -30,7 +30,11 @@ proposed by Chenlin Meng, Yutong He, Yang Song, Jiaming Song, Jiajun Wu, Jun-Yan
3030
- disable_attention_slicing
3131
- enable_xformers_memory_efficient_attention
3232
- disable_xformers_memory_efficient_attention
33+
- load_textual_inversion
34+
- from_ckpt
35+
- load_lora_weights
36+
- save_lora_weights
3337

3438
[[autodoc]] FlaxStableDiffusionImg2ImgPipeline
3539
- all
36-
- __call__
40+
- __call__

docs/source/en/api/pipelines/stable_diffusion/inpaint.mdx

Lines changed: 4 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -31,7 +31,10 @@ Available checkpoints are:
3131
- disable_attention_slicing
3232
- enable_xformers_memory_efficient_attention
3333
- disable_xformers_memory_efficient_attention
34+
- load_textual_inversion
35+
- load_lora_weights
36+
- save_lora_weights
3437

3538
[[autodoc]] FlaxStableDiffusionInpaintPipeline
3639
- all
37-
- __call__
40+
- __call__

docs/source/en/api/pipelines/stable_diffusion/pix2pix.mdx

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -68,3 +68,6 @@ images[0].save("snowy_mountains.png")
6868
[[autodoc]] StableDiffusionInstructPix2PixPipeline
6969
- __call__
7070
- all
71+
- load_textual_inversion
72+
- load_lora_weights
73+
- save_lora_weights

docs/source/en/api/pipelines/stable_diffusion/text2img.mdx

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -39,6 +39,10 @@ Available Checkpoints are:
3939
- disable_xformers_memory_efficient_attention
4040
- enable_vae_tiling
4141
- disable_vae_tiling
42+
- load_textual_inversion
43+
- from_ckpt
44+
- load_lora_weights
45+
- save_lora_weights
4246

4347
[[autodoc]] FlaxStableDiffusionPipeline
4448
- all
Lines changed: 289 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,289 @@
<!--Copyright 2023 Custom Diffusion authors The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Custom Diffusion training example

[Custom Diffusion](https://arxiv.org/abs/2212.04488) is a method to customize text-to-image models like Stable Diffusion given just a few (4~5) images of a subject.
The `train_custom_diffusion.py` script shows how to implement the training procedure and adapt it for Stable Diffusion.

## Running locally with PyTorch

### Installing the dependencies

Before running the scripts, make sure to install the library's training dependencies:

**Important**

To make sure you can successfully run the latest versions of the example scripts, we highly recommend **installing from source** and keeping the install up to date, as we update the example scripts frequently and install some example-specific requirements. To do this, execute the following steps in a new virtual environment:

```bash
git clone https://github.com/huggingface/diffusers
cd diffusers
pip install -e .
```

Then cd into the example folder and run:

```bash
pip install -r requirements.txt
pip install clip-retrieval
```

And initialize an [🤗Accelerate](https://github.com/huggingface/accelerate/) environment with:

```bash
accelerate config
```

Or, for a default accelerate configuration without answering questions about your environment:

```bash
accelerate config default
```

Or, if your environment doesn't support an interactive shell (e.g. a notebook):

```python
from accelerate.utils import write_basic_config

write_basic_config()
```

### Cat example 😺

Now let's get our dataset. Download the dataset from [here](https://www.cs.cmu.edu/~custom-diffusion/assets/data.zip) and unzip it.
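If you'd rather fetch the data programmatically, here is a minimal sketch that downloads and unpacks the archive linked above; it assumes the zip extracts to a `data/` folder containing `cat/`, which is what the training command below expects:

```python
import urllib.request
import zipfile

# Download the example dataset linked above and unpack it in the current directory.
url = "https://www.cs.cmu.edu/~custom-diffusion/assets/data.zip"
urllib.request.urlretrieve(url, "data.zip")

with zipfile.ZipFile("data.zip") as archive:
    archive.extractall(".")  # the cat images are assumed to land in ./data/cat
```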
We also collect 200 real images using `clip-retrieval`, which are combined with the target images in the training dataset as a regularization. This prevents overfitting to the given target image. The following flags enable the regularization: `with_prior_preservation` and `real_prior` with `prior_loss_weight=1.`.
The `class_prompt` should be the same category name as the target image. The collected real images have text captions similar to the `class_prompt`. The retrieved images are saved in `class_data_dir`. You can disable `real_prior` to use generated images as regularization instead. To collect the real images, run this command first before training.

```bash
pip install clip-retrieval
python retrieve.py --class_prompt cat --class_data_dir real_reg/samples_cat --num_class_images 200
```

**___Note: Change the `resolution` to 768 if you are using the [stable-diffusion-2](https://huggingface.co/stabilityai/stable-diffusion-2) 768x768 model.___**

```bash
export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export OUTPUT_DIR="path-to-save-model"
export INSTANCE_DIR="./data/cat"

accelerate launch train_custom_diffusion.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --instance_data_dir=$INSTANCE_DIR \
  --output_dir=$OUTPUT_DIR \
  --class_data_dir=./real_reg/samples_cat/ \
  --with_prior_preservation --real_prior --prior_loss_weight=1.0 \
  --class_prompt="cat" --num_class_images=200 \
  --instance_prompt="photo of a <new1> cat" \
  --resolution=512 \
  --train_batch_size=2 \
  --learning_rate=1e-5 \
  --lr_warmup_steps=0 \
  --max_train_steps=250 \
  --scale_lr --hflip \
  --modifier_token "<new1>"
```

**Use `--enable_xformers_memory_efficient_attention` for faster training with a lower VRAM requirement (16GB per GPU). Follow [this guide](https://github.com/facebookresearch/xformers) for installation instructions.**

To track your experiments using Weights and Biases (`wandb`) and to save intermediate results (which we HIGHLY recommend), follow these steps:

* Install `wandb`: `pip install wandb`.
* Authorize: `wandb login`.
* Then specify a `validation_prompt` and set `report_to` to `wandb` while launching training. You can also configure the following related arguments:
  * `num_validation_images`
  * `validation_steps`

Here is an example command:

```bash
accelerate launch train_custom_diffusion.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --instance_data_dir=$INSTANCE_DIR \
  --output_dir=$OUTPUT_DIR \
  --class_data_dir=./real_reg/samples_cat/ \
  --with_prior_preservation --real_prior --prior_loss_weight=1.0 \
  --class_prompt="cat" --num_class_images=200 \
  --instance_prompt="photo of a <new1> cat" \
  --resolution=512 \
  --train_batch_size=2 \
  --learning_rate=1e-5 \
  --lr_warmup_steps=0 \
  --max_train_steps=250 \
  --scale_lr --hflip \
  --modifier_token "<new1>" \
  --validation_prompt="<new1> cat sitting in a bucket" \
  --report_to="wandb"
```

Here is an example [Weights and Biases page](https://wandb.ai/sayakpaul/custom-diffusion/runs/26ghrcau) where you can check out the intermediate results along with other training details.

If you specify `--push_to_hub`, the learned parameters will be pushed to a repository on the Hugging Face Hub. Here is an [example repository](https://huggingface.co/sayakpaul/custom-diffusion-cat).

### Training on multiple concepts 🐱🪵

Provide a [json](https://github.com/adobe-research/custom-diffusion/blob/main/assets/concept_list.json) file with the info about each concept, similar to [this](https://github.com/ShivamShrirao/diffusers/blob/main/examples/dreambooth/train_dreambooth.py); a sketch of such a file is shown below.
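For illustration, a `concept_list.json` for a cat + wooden pot pair might look like the following sketch. The field names mirror the linked example file, while the prompts and directories are placeholders you should adapt to your own data:

```python
import json

# Hypothetical two-concept list; adjust prompts and directories to your data.
concepts_list = [
    {
        "instance_prompt": "photo of a <new1> cat",
        "class_prompt": "cat",
        "instance_data_dir": "./data/cat",
        "class_data_dir": "./real_reg/samples_cat/",
    },
    {
        "instance_prompt": "photo of a <new2> wooden pot",
        "class_prompt": "wooden pot",
        "instance_data_dir": "./data/wooden_pot",
        "class_data_dir": "./real_reg/samples_wooden_pot/",
    },
]

with open("concept_list.json", "w") as f:
    json.dump(concepts_list, f, indent=4)
```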
To collect the real images, run this command for each concept in the json file.

```bash
pip install clip-retrieval
python retrieve.py --class_prompt {} --class_data_dir {} --num_class_images 200
```

And then we're ready to start training!

```bash
export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export OUTPUT_DIR="path-to-save-model"

accelerate launch train_custom_diffusion.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --output_dir=$OUTPUT_DIR \
  --concepts_list=./concept_list.json \
  --with_prior_preservation --real_prior --prior_loss_weight=1.0 \
  --resolution=512 \
  --train_batch_size=2 \
  --learning_rate=1e-5 \
  --lr_warmup_steps=0 \
  --max_train_steps=500 \
  --num_class_images=200 \
  --scale_lr --hflip \
  --modifier_token "<new1>+<new2>"
```

Here is an example [Weights and Biases page](https://wandb.ai/sayakpaul/custom-diffusion/runs/3990tzkg) where you can check out the intermediate results along with other training details.

### Training on human faces

For fine-tuning on human faces, we found the following configuration to work better: `learning_rate=5e-6`, `max_train_steps=1000 to 2000`, and `freeze_model=crossattn`, with at least 15-20 images.

To collect the real images, use this command first before training.

```bash
pip install clip-retrieval
python retrieve.py --class_prompt person --class_data_dir real_reg/samples_person --num_class_images 200
```

Then start training!

```bash
export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export OUTPUT_DIR="path-to-save-model"
export INSTANCE_DIR="path-to-images"

accelerate launch train_custom_diffusion.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --instance_data_dir=$INSTANCE_DIR \
  --output_dir=$OUTPUT_DIR \
  --class_data_dir=./real_reg/samples_person/ \
  --with_prior_preservation --real_prior --prior_loss_weight=1.0 \
  --class_prompt="person" --num_class_images=200 \
  --instance_prompt="photo of a <new1> person" \
  --resolution=512 \
  --train_batch_size=2 \
  --learning_rate=5e-6 \
  --lr_warmup_steps=0 \
  --max_train_steps=1000 \
  --scale_lr --hflip --noaug \
  --freeze_model crossattn \
  --modifier_token "<new1>" \
  --enable_xformers_memory_efficient_attention
```

## Inference

Once you have trained a model using the above command, you can run inference using the command below. Make sure to include the `modifier token` (e.g. \<new1\> in the above example) in your prompt.

```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16).to("cuda")
pipe.unet.load_attn_procs("path-to-save-model", weight_name="pytorch_custom_diffusion_weights.bin")
pipe.load_textual_inversion("path-to-save-model", weight_name="<new1>.bin")

image = pipe(
    "<new1> cat sitting in a bucket",
    num_inference_steps=100,
    guidance_scale=6.0,
    eta=1.0,
).images[0]
image.save("cat.png")
```

It's possible to directly load these parameters from a Hub repository:

```python
import torch
from huggingface_hub.repocard import RepoCard
from diffusers import DiffusionPipeline

model_id = "sayakpaul/custom-diffusion-cat"
card = RepoCard.load(model_id)
base_model_id = card.data.to_dict()["base_model"]

pipe = DiffusionPipeline.from_pretrained(base_model_id, torch_dtype=torch.float16).to("cuda")
pipe.unet.load_attn_procs(model_id, weight_name="pytorch_custom_diffusion_weights.bin")
pipe.load_textual_inversion(model_id, weight_name="<new1>.bin")

image = pipe(
    "<new1> cat sitting in a bucket",
    num_inference_steps=100,
    guidance_scale=6.0,
    eta=1.0,
).images[0]
image.save("cat.png")
```

Here is an example of performing inference with multiple concepts:

```python
import torch
from huggingface_hub.repocard import RepoCard
from diffusers import DiffusionPipeline

model_id = "sayakpaul/custom-diffusion-cat-wooden-pot"
card = RepoCard.load(model_id)
base_model_id = card.data.to_dict()["base_model"]

pipe = DiffusionPipeline.from_pretrained(base_model_id, torch_dtype=torch.float16).to("cuda")
pipe.unet.load_attn_procs(model_id, weight_name="pytorch_custom_diffusion_weights.bin")
pipe.load_textual_inversion(model_id, weight_name="<new1>.bin")
pipe.load_textual_inversion(model_id, weight_name="<new2>.bin")

image = pipe(
    "the <new1> cat sculpture in the style of a <new2> wooden pot",
    num_inference_steps=100,
    guidance_scale=6.0,
    eta=1.0,
).images[0]
image.save("multi-subject.png")
```

Here, `cat` and `wooden pot` refer to the multiple concepts.

### Inference from a training checkpoint

You can also perform inference from one of the complete checkpoints saved during the training process, if you used the `--checkpointing_steps` argument.

TODO.

## Set grads to none

To save even more memory, pass the `--set_grads_to_none` argument to the script. This will set grads to None instead of zero. However, be aware that it changes certain behaviors, so if you start experiencing any problems, remove this argument.

More info: https://pytorch.org/docs/stable/generated/torch.optim.Optimizer.zero_grad.html
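For intuition, the flag corresponds to PyTorch's `Optimizer.zero_grad(set_to_none=True)` behavior described at the link above; the toy snippet below (independent of the training script) illustrates the difference it introduces:

```python
import torch

# Toy model and optimizer purely to illustrate what set_to_none changes.
model = torch.nn.Linear(4, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

loss = model(torch.randn(8, 4)).sum()
loss.backward()
optimizer.step()

# With set_to_none=True the gradient tensors are freed rather than zero-filled,
# saving memory but leaving .grad as None until the next backward pass.
optimizer.zero_grad(set_to_none=True)
print(model.weight.grad)  # None
```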
## Experimental results

You can refer to [our webpage](https://www.cs.cmu.edu/~custom-diffusion/) that discusses our experiments in detail.

docs/source/en/training/dreambooth.mdx

Lines changed: 12 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -60,7 +60,18 @@ DreamBooth finetuning is very sensitive to hyperparameters and easy to overfit.
6060

6161
<frameworkcontent>
6262
<pt>
63-
Let's try DreamBooth with a [few images of a dog](https://drive.google.com/drive/folders/1BO_dyz-p65qhBRRMRA4TbZ8qW4rB99JZ); download and save them to a directory and then set the `INSTANCE_DIR` environment variable to that path:
63+
Let's try DreamBooth with a
64+
[few images of a dog](https://huggingface.co/datasets/diffusers/dog-example);
65+
download and save them to a directory and then set the `INSTANCE_DIR` environment variable to that path:
66+
67+
```python
68+
local_dir = "./path_to_training_images"
69+
snapshot_download(
70+
"diffusers/dog-example",
71+
local_dir=local_dir, repo_type="dataset",
72+
ignore_patterns=".gitattributes",
73+
)
74+
```
6475

6576
```bash
6677
export MODEL_NAME="CompVis/stable-diffusion-v1-4"
