
Commit 73bdad0

sayakpaul and stevhliu authored
add: controlnet entry to training section in the docs. (#2677)

* add: controlnet entry to training section in the docs.
* formatting.
* Apply suggestions from code review
* wrap in a tip block.

Co-authored-by: Steven Liu <[email protected]>

1 parent ba87c16 commit 73bdad0

File tree: 3 files changed, +295 −0 lines changed

docs/source/en/_toctree.yml

Lines changed: 2 additions & 0 deletions

```diff
@@ -68,6 +68,8 @@
     title: Text-to-image
   - local: training/lora
     title: Low-Rank Adaptation of Large Language Models (LoRA)
+  - local: training/controlnet
+    title: ControlNet
   title: Training
 - sections:
   - local: using-diffusers/rl
```

docs/source/en/training/controlnet.mdx

Lines changed: 290 additions & 0 deletions

@@ -0,0 +1,290 @@

<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# ControlNet

ControlNet was introduced in [Adding Conditional Control to Text-to-Image Diffusion Models](https://arxiv.org/abs/2302.05543) by Lvmin Zhang and Maneesh Agrawala.

This example is based on the [training example in the original ControlNet repository](https://github.com/lllyasviel/ControlNet/blob/main/docs/train.md). It trains a ControlNet to fill circles using a [small synthetic dataset](https://huggingface.co/datasets/fusing/fill50k).

## Installing the dependencies

Before running the scripts, make sure to install the library's training dependencies.

<Tip warning={true}>

To successfully run the latest versions of the example scripts, we highly recommend **installing from source** and keeping the installation up to date. We update the example scripts frequently and install example-specific requirements.

</Tip>

To do this, execute the following steps in a new virtual environment:

```bash
git clone https://github.com/huggingface/diffusers
cd diffusers
pip install -e .
```

Then navigate into the example folder (`examples/controlnet`) and run:

```bash
pip install -r requirements.txt
```

And initialize an [🤗 Accelerate](https://github.com/huggingface/accelerate/) environment with:

```bash
accelerate config
```

Or for a default 🤗 Accelerate configuration without answering questions about your environment:

```bash
accelerate config default
```

Or if your environment doesn't support an interactive shell, like a notebook:

```python
from accelerate.utils import write_basic_config

write_basic_config()
```

## Circle filling dataset

The original dataset is hosted in the ControlNet [repo](https://huggingface.co/lllyasviel/ControlNet/blob/main/training/fill50k.zip), but we re-uploaded it [here](https://huggingface.co/datasets/fusing/fill50k) to be compatible with 🤗 Datasets so that it can handle the data loading within the training script.
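
To get a quick feel for the data before training, you can load a few samples with 🤗 Datasets. This is only an illustrative sketch; the column names used below (`image`, `conditioning_image`, `text`) are assumed to match the defaults expected by the training script.

```python
# a quick look at the dataset (illustrative only; column names assumed to match
# the training script defaults: "image", "conditioning_image", and "text")
from datasets import load_dataset

dataset = load_dataset("fusing/fill50k", split="train")
print(dataset)  # number of rows and the column names

sample = dataset[0]
print(sample["text"])  # the caption, e.g. a description of the circle and background colors
sample["image"].save("./example_target.png")  # the image the model should produce
sample["conditioning_image"].save("./example_conditioning.png")  # the circle outline used as control
```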

Our training examples use [`runwayml/stable-diffusion-v1-5`](https://huggingface.co/runwayml/stable-diffusion-v1-5) because that is what the original set of ControlNet models was trained on. However, ControlNet can be trained to augment any compatible Stable Diffusion model, such as [`CompVis/stable-diffusion-v1-4`](https://huggingface.co/CompVis/stable-diffusion-v1-4) or [`stabilityai/stable-diffusion-2-1`](https://huggingface.co/stabilityai/stable-diffusion-2-1).

## Training

Download the following images to condition our training with:

```sh
wget https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/controlnet_training/conditioning_image_1.png

wget https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/controlnet_training/conditioning_image_2.png
```
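
If you prefer to stay in Python (for example, on a system without `wget`), the same conditioning images can be fetched with `diffusers.utils.load_image`, which accepts URLs. This is just a convenience sketch; the file names are chosen to match the `--validation_image` paths used below.

```python
# download the two conditioning images used for validation during training
# (equivalent to the wget commands above)
from diffusers.utils import load_image

base_url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/controlnet_training"
for name in ["conditioning_image_1.png", "conditioning_image_2.png"]:
    load_image(f"{base_url}/{name}").save(f"./{name}")
```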

Then launch the training script:

```bash
export MODEL_DIR="runwayml/stable-diffusion-v1-5"
export OUTPUT_DIR="path to save model"

accelerate launch train_controlnet.py \
 --pretrained_model_name_or_path=$MODEL_DIR \
 --output_dir=$OUTPUT_DIR \
 --dataset_name=fusing/fill50k \
 --resolution=512 \
 --learning_rate=1e-5 \
 --validation_image "./conditioning_image_1.png" "./conditioning_image_2.png" \
 --validation_prompt "red circle with blue background" "cyan circle with brown floral background" \
 --train_batch_size=4
```

This default configuration requires ~38 GB of VRAM.

By default, the training script logs outputs to TensorBoard. Pass `--report_to wandb` to use Weights & Biases instead.

Gradient accumulation with a smaller batch size can be used to reduce the training requirements to ~20 GB of VRAM:

```bash
export MODEL_DIR="runwayml/stable-diffusion-v1-5"
export OUTPUT_DIR="path to save model"

accelerate launch train_controlnet.py \
 --pretrained_model_name_or_path=$MODEL_DIR \
 --output_dir=$OUTPUT_DIR \
 --dataset_name=fusing/fill50k \
 --resolution=512 \
 --learning_rate=1e-5 \
 --validation_image "./conditioning_image_1.png" "./conditioning_image_2.png" \
 --validation_prompt "red circle with blue background" "cyan circle with brown floral background" \
 --train_batch_size=1 \
 --gradient_accumulation_steps=4
```
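
Under the hood, `--gradient_accumulation_steps` relies on 🤗 Accelerate's `accumulate` context manager: several small forward/backward passes are run before each optimizer step, so a batch size of 1 with 4 accumulation steps behaves like an effective batch size of 4. The snippet below is a minimal, self-contained sketch of that pattern with a toy model, not the actual training loop.

```python
import torch
from accelerate import Accelerator

# toy stand-ins for the real ControlNet, optimizer, and data
model = torch.nn.Linear(8, 1)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
dataset = torch.utils.data.TensorDataset(torch.randn(16, 8), torch.randn(16, 1))
dataloader = torch.utils.data.DataLoader(dataset, batch_size=1)

accelerator = Accelerator(gradient_accumulation_steps=4)
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

for x, y in dataloader:
    # gradients are accumulated across 4 micro-batches before each optimizer step
    with accelerator.accumulate(model):
        loss = torch.nn.functional.mse_loss(model(x), y)
        accelerator.backward(loss)
        optimizer.step()
        optimizer.zero_grad()
```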

## Example results

#### After 300 steps with batch size 8

| | |
|-------------------|:-------------------------:|
| | red circle with blue background |
| ![conditioning image](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/controlnet_training/conditioning_image_1.png) | ![red circle with blue background](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/controlnet_training/red_circle_with_blue_background_300_steps.png) |
| | cyan circle with brown floral background |
| ![conditioning image](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/controlnet_training/conditioning_image_2.png) | ![cyan circle with brown floral background](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/controlnet_training/cyan_circle_with_brown_floral_background_300_steps.png) |

#### After 6000 steps with batch size 8

| | |
|-------------------|:-------------------------:|
| | red circle with blue background |
| ![conditioning image](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/controlnet_training/conditioning_image_1.png) | ![red circle with blue background](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/controlnet_training/red_circle_with_blue_background_6000_steps.png) |
| | cyan circle with brown floral background |
| ![conditioning image](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/controlnet_training/conditioning_image_2.png) | ![cyan circle with brown floral background](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/controlnet_training/cyan_circle_with_brown_floral_background_6000_steps.png) |

## Training on a 16 GB GPU

Enable the following optimizations to train on a 16 GB GPU:

- Gradient checkpointing
- bitsandbytes' 8-bit optimizer (take a look at the [installation](https://github.com/TimDettmers/bitsandbytes#requirements--installation) instructions if you don't already have it installed)

Now you can launch the training script:

```bash
export MODEL_DIR="runwayml/stable-diffusion-v1-5"
export OUTPUT_DIR="path to save model"

accelerate launch train_controlnet.py \
 --pretrained_model_name_or_path=$MODEL_DIR \
 --output_dir=$OUTPUT_DIR \
 --dataset_name=fusing/fill50k \
 --resolution=512 \
 --learning_rate=1e-5 \
 --validation_image "./conditioning_image_1.png" "./conditioning_image_2.png" \
 --validation_prompt "red circle with blue background" "cyan circle with brown floral background" \
 --train_batch_size=1 \
 --gradient_accumulation_steps=4 \
 --gradient_checkpointing \
 --use_8bit_adam
```
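
For reference, `--gradient_checkpointing` and `--use_8bit_adam` roughly correspond to the calls below. This is a simplified sketch, and the pretrained ControlNet checkpoint used here is purely a hypothetical placeholder; the training script normally initializes the ControlNet from the Stable Diffusion UNet instead.

```python
import bitsandbytes as bnb
from diffusers import ControlNetModel

# hypothetical pretrained ControlNet, loaded only for illustration
controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-canny")

# recompute activations during the backward pass instead of storing them
# (saves memory at the cost of extra compute)
controlnet.enable_gradient_checkpointing()

# bitsandbytes' 8-bit AdamW keeps optimizer state in 8 bits, sharply reducing optimizer memory
optimizer = bnb.optim.AdamW8bit(controlnet.parameters(), lr=1e-5)
```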

## Training on a 12 GB GPU

Enable the following optimizations to train on a 12 GB GPU:

- Gradient checkpointing
- bitsandbytes' 8-bit optimizer (take a look at the [installation](https://github.com/TimDettmers/bitsandbytes#requirements--installation) instructions if you don't already have it installed)
- xFormers (take a look at the [installation](https://huggingface.co/docs/diffusers/training/optimization/xformers) instructions if you don't already have it installed)
- setting gradients to `None`

```bash
export MODEL_DIR="runwayml/stable-diffusion-v1-5"
export OUTPUT_DIR="path to save model"

accelerate launch train_controlnet.py \
 --pretrained_model_name_or_path=$MODEL_DIR \
 --output_dir=$OUTPUT_DIR \
 --dataset_name=fusing/fill50k \
 --resolution=512 \
 --learning_rate=1e-5 \
 --validation_image "./conditioning_image_1.png" "./conditioning_image_2.png" \
 --validation_prompt "red circle with blue background" "cyan circle with brown floral background" \
 --train_batch_size=1 \
 --gradient_accumulation_steps=4 \
 --gradient_checkpointing \
 --use_8bit_adam \
 --enable_xformers_memory_efficient_attention \
 --set_grads_to_none
```

When using `enable_xformers_memory_efficient_attention`, make sure to install `xformers` with `pip install xformers`.
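
The `--set_grads_to_none` flag saves a little more memory by clearing gradients to `None` rather than zeroed tensors after each step. It corresponds to the standard PyTorch pattern below (a toy sketch, not the real training loop):

```python
import torch

# toy stand-in for the ControlNet being trained
model = torch.nn.Linear(4, 4)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

loss = model(torch.randn(2, 4)).sum()
loss.backward()
optimizer.step()

# set_to_none=True frees the gradient tensors entirely instead of filling them
# with zeros, which slightly lowers memory use and can speed things up
optimizer.zero_grad(set_to_none=True)
```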

## Training on an 8 GB GPU

We have not exhaustively tested DeepSpeed support for ControlNet. While the configuration does
save memory, we have not confirmed whether it trains successfully. You will very likely
have to make changes to the config to have a successful training run.

Enable the following optimizations to train on an 8 GB GPU:

- Gradient checkpointing
- bitsandbytes' 8-bit optimizer (take a look at the [installation](https://github.com/TimDettmers/bitsandbytes#requirements--installation) instructions if you don't already have it installed)
- xFormers (take a look at the [installation](https://huggingface.co/docs/diffusers/training/optimization/xformers) instructions if you don't already have it installed)
- setting gradients to `None`
- DeepSpeed stage 2 with parameter and optimizer offloading
- fp16 mixed precision

[DeepSpeed](https://www.deepspeed.ai/) can offload tensors from VRAM to either
CPU or NVMe. This requires significantly more RAM (about 25 GB).

You'll have to configure your environment with `accelerate config` to enable DeepSpeed stage 2.

The configuration file should look like this:

```yaml
compute_environment: LOCAL_MACHINE
deepspeed_config:
  gradient_accumulation_steps: 4
  offload_optimizer_device: cpu
  offload_param_device: cpu
  zero3_init_flag: false
  zero_stage: 2
distributed_type: DEEPSPEED
```

<Tip>

See [documentation](https://huggingface.co/docs/accelerate/usage_guides/deepspeed) for more DeepSpeed configuration options.

</Tip>

Changing the default Adam optimizer to DeepSpeed's Adam
`deepspeed.ops.adam.DeepSpeedCPUAdam` gives a substantial speedup, but
it requires a CUDA toolchain with the same version as PyTorch. The 8-bit optimizer
does not seem to be compatible with DeepSpeed at the moment.

```bash
export MODEL_DIR="runwayml/stable-diffusion-v1-5"
export OUTPUT_DIR="path to save model"

accelerate launch train_controlnet.py \
 --pretrained_model_name_or_path=$MODEL_DIR \
 --output_dir=$OUTPUT_DIR \
 --dataset_name=fusing/fill50k \
 --resolution=512 \
 --validation_image "./conditioning_image_1.png" "./conditioning_image_2.png" \
 --validation_prompt "red circle with blue background" "cyan circle with brown floral background" \
 --train_batch_size=1 \
 --gradient_accumulation_steps=4 \
 --gradient_checkpointing \
 --enable_xformers_memory_efficient_attention \
 --set_grads_to_none \
 --mixed_precision fp16
```

## Inference

The trained model can be run with the [`StableDiffusionControlNetPipeline`].
Set `base_model_path` and `controlnet_path` to the values that `--pretrained_model_name_or_path` and
`--output_dir` were set to, respectively, in the training script.

```py
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel, UniPCMultistepScheduler
from diffusers.utils import load_image
import torch

base_model_path = "path to model"
controlnet_path = "path to controlnet"

controlnet = ControlNetModel.from_pretrained(controlnet_path, torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    base_model_path, controlnet=controlnet, torch_dtype=torch.float16
)

# speed up the diffusion process with a faster scheduler and memory optimizations
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
# remove the following line if xformers is not installed
pipe.enable_xformers_memory_efficient_attention()

pipe.enable_model_cpu_offload()

control_image = load_image("./conditioning_image_1.png")
prompt = "pale golden rod circle with old lace background"

# generate the image conditioned on the prompt and the conditioning image
generator = torch.manual_seed(0)
image = pipe(prompt, num_inference_steps=20, generator=generator, image=control_image).images[0]

image.save("./output.png")
```
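
Continuing with the `pipe` object from the snippet above, you can condition on the second validation image in the same way; the prompt here is just an arbitrary example in the dataset's color-circle style:

```py
# reuses `pipe` and the imports from the previous snippet with the other conditioning image
control_image = load_image("./conditioning_image_2.png")
prompt = "violet circle with light green background"  # hypothetical prompt in the dataset's caption style

generator = torch.manual_seed(0)
image = pipe(prompt, num_inference_steps=20, generator=generator, image=control_image).images[0]
image.save("./output_2.png")
```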

docs/source/en/training/overview.mdx

Lines changed: 3 additions & 0 deletions

```diff
@@ -38,6 +38,7 @@ Training examples show how to pretrain or fine-tune diffusion models for a varie
 - [Text Inversion](./text_inversion)
 - [Dreambooth](./dreambooth)
 - [LoRA Support](./lora)
+- [ControlNet](./controlnet)
 
 If possible, please [install xFormers](../optimization/xformers) for memory efficient attention. This could help make your training faster and less memory intensive.
 
@@ -47,6 +48,8 @@ If possible, please [install xFormers](../optimization/xformers) for memory effi
 | [**Text-to-Image fine-tuning**](./text2image) | | |
 | [**Textual Inversion**](./text_inversion) | | - | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/sd_textual_inversion_training.ipynb)
 | [**Dreambooth**](./dreambooth) | | - | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/sd_dreambooth_training.ipynb)
+| [**Training with LoRA**](./lora) | | - | - |
+| [**ControlNet**](./controlnet) | | | - |
 
 ## Community
```