<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# ControlNet

[Adding Conditional Control to Text-to-Image Diffusion Models](https://arxiv.org/abs/2302.05543) (ControlNet) by Lvmin Zhang and Maneesh Agrawala.

This example is based on the [training example in the original ControlNet repository](https://github.com/lllyasviel/ControlNet/blob/main/docs/train.md). It trains a ControlNet to fill circles using a [small synthetic dataset](https://huggingface.co/datasets/fusing/fill50k).

## Installing the dependencies

Before running the scripts, make sure to install the library's training dependencies.

<Tip warning={true}>

To successfully run the latest versions of the example scripts, we highly recommend **installing from source** and keeping the installation up to date, because we update the example scripts frequently and they require example-specific dependencies.

</Tip>

To do this, execute the following steps in a new virtual environment:
```bash
git clone https://github.com/huggingface/diffusers
cd diffusers
pip install -e .
```

Then navigate into the example folder and install the example-specific requirements:
```bash
cd examples/controlnet
pip install -r requirements.txt
```

And initialize an [🤗 Accelerate](https://github.com/huggingface/accelerate/) environment with:

```bash
accelerate config
```

Or for a default 🤗 Accelerate configuration without answering questions about your environment:

```bash
accelerate config default
```

Or if your environment doesn't support an interactive shell like a notebook:

```python
from accelerate.utils import write_basic_config

write_basic_config()
```

## Circle filling dataset

The original dataset is hosted in the ControlNet [repo](https://huggingface.co/lllyasviel/ControlNet/blob/main/training/fill50k.zip), but we re-uploaded it [here](https://huggingface.co/datasets/fusing/fill50k) to be compatible with 🤗 Datasets so that it can handle the data loading within the training script.
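
If you'd like to take a quick look at the data before training, the snippet below is a minimal sketch (not part of the training script) that loads the dataset with 🤗 Datasets and inspects one sample. The column names `image`, `conditioning_image`, and `text` are assumed to match the training script's defaults; adjust them if your copy of the dataset differs.

```python
from datasets import load_dataset

# Load the training split of the re-uploaded circle-filling dataset.
dataset = load_dataset("fusing/fill50k", split="train")
print(dataset)  # number of rows and column names

# Inspect one example: a caption, a target image, and its conditioning image.
sample = dataset[0]
print(sample["text"])  # assumed caption column
sample["image"].save("sample_target.png")  # assumed target image column
sample["conditioning_image"].save("sample_conditioning.png")  # assumed conditioning image column
```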

Our training examples use [`runwayml/stable-diffusion-v1-5`](https://huggingface.co/runwayml/stable-diffusion-v1-5) because that is what the original set of ControlNet models was trained on. However, ControlNet can be trained to augment any compatible Stable Diffusion model, such as [`CompVis/stable-diffusion-v1-4`](https://huggingface.co/CompVis/stable-diffusion-v1-4) or [`stabilityai/stable-diffusion-2-1`](https://huggingface.co/stabilityai/stable-diffusion-2-1).

## Training

Download the following conditioning images, which we'll use to run validation during training:

```sh
wget https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/controlnet_training/conditioning_image_1.png

wget https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/controlnet_training/conditioning_image_2.png
```

```bash
export MODEL_DIR="runwayml/stable-diffusion-v1-5"
export OUTPUT_DIR="path to save model"

accelerate launch train_controlnet.py \
 --pretrained_model_name_or_path=$MODEL_DIR \
 --output_dir=$OUTPUT_DIR \
 --dataset_name=fusing/fill50k \
 --resolution=512 \
 --learning_rate=1e-5 \
 --validation_image "./conditioning_image_1.png" "./conditioning_image_2.png" \
 --validation_prompt "red circle with blue background" "cyan circle with brown floral background" \
 --train_batch_size=4
```

This default configuration requires ~38 GB of VRAM.

By default, the training script logs outputs to TensorBoard. Pass `--report_to wandb` to use Weights & Biases instead.

Gradient accumulation with a smaller batch size can be used to reduce the memory requirement to ~20 GB of VRAM.

```bash
export MODEL_DIR="runwayml/stable-diffusion-v1-5"
export OUTPUT_DIR="path to save model"

accelerate launch train_controlnet.py \
 --pretrained_model_name_or_path=$MODEL_DIR \
 --output_dir=$OUTPUT_DIR \
 --dataset_name=fusing/fill50k \
 --resolution=512 \
 --learning_rate=1e-5 \
 --validation_image "./conditioning_image_1.png" "./conditioning_image_2.png" \
 --validation_prompt "red circle with blue background" "cyan circle with brown floral background" \
 --train_batch_size=1 \
 --gradient_accumulation_steps=4
```

## Example results

#### After 300 steps with batch size 8

| prompt | output |
|:-------------------:|:-------------------------:|
| red circle with blue background |  |
| cyan circle with brown floral background |  |

#### After 6000 steps with batch size 8

| prompt | output |
|:-------------------:|:-------------------------:|
| red circle with blue background |  |
| cyan circle with brown floral background |  |

## Training on a 16 GB GPU

Enable the following optimizations to train on a 16 GB GPU:

- Gradient checkpointing
- bitsandbytes' 8-bit optimizer (take a look at the [installation](https://github.com/TimDettmers/bitsandbytes#requirements--installation) instructions if you don't already have it installed)

Now you can launch the training script:

```bash
export MODEL_DIR="runwayml/stable-diffusion-v1-5"
export OUTPUT_DIR="path to save model"

accelerate launch train_controlnet.py \
 --pretrained_model_name_or_path=$MODEL_DIR \
 --output_dir=$OUTPUT_DIR \
 --dataset_name=fusing/fill50k \
 --resolution=512 \
 --learning_rate=1e-5 \
 --validation_image "./conditioning_image_1.png" "./conditioning_image_2.png" \
 --validation_prompt "red circle with blue background" "cyan circle with brown floral background" \
 --train_batch_size=1 \
 --gradient_accumulation_steps=4 \
 --gradient_checkpointing \
 --use_8bit_adam
```

## Training on a 12 GB GPU

Enable the following optimizations to train on a 12 GB GPU:

- Gradient checkpointing
- bitsandbytes' 8-bit optimizer (take a look at the [installation](https://github.com/TimDettmers/bitsandbytes#requirements--installation) instructions if you don't already have it installed)
- xFormers (take a look at the [installation](https://huggingface.co/docs/diffusers/training/optimization/xformers) instructions if you don't already have it installed)
- set gradients to `None`

```bash
export MODEL_DIR="runwayml/stable-diffusion-v1-5"
export OUTPUT_DIR="path to save model"

accelerate launch train_controlnet.py \
 --pretrained_model_name_or_path=$MODEL_DIR \
 --output_dir=$OUTPUT_DIR \
 --dataset_name=fusing/fill50k \
 --resolution=512 \
 --learning_rate=1e-5 \
 --validation_image "./conditioning_image_1.png" "./conditioning_image_2.png" \
 --validation_prompt "red circle with blue background" "cyan circle with brown floral background" \
 --train_batch_size=1 \
 --gradient_accumulation_steps=4 \
 --gradient_checkpointing \
 --use_8bit_adam \
 --enable_xformers_memory_efficient_attention \
 --set_grads_to_none
```

When using `--enable_xformers_memory_efficient_attention`, make sure `xformers` is installed (`pip install xformers`).

## Training on an 8 GB GPU

We have not exhaustively tested DeepSpeed support for ControlNet. While the configuration does
save memory, we have not confirmed whether it trains successfully. You will very likely
have to make changes to the config to have a successful training run.

Enable the following optimizations to train on an 8 GB GPU:

- Gradient checkpointing
- bitsandbytes' 8-bit optimizer (take a look at the [installation](https://github.com/TimDettmers/bitsandbytes#requirements--installation) instructions if you don't already have it installed)
- xFormers (take a look at the [installation](https://huggingface.co/docs/diffusers/training/optimization/xformers) instructions if you don't already have it installed)
- set gradients to `None`
- DeepSpeed stage 2 with parameter and optimizer offloading
- fp16 mixed precision

[DeepSpeed](https://www.deepspeed.ai/) can offload tensors from VRAM to either
CPU or NVMe. This requires significantly more RAM (about 25 GB).

You'll have to configure your environment with `accelerate config` to enable DeepSpeed stage 2.

The configuration file should look like this:

```yaml
compute_environment: LOCAL_MACHINE
deepspeed_config:
 gradient_accumulation_steps: 4
 offload_optimizer_device: cpu
 offload_param_device: cpu
 zero3_init_flag: false
 zero_stage: 2
distributed_type: DEEPSPEED
```

<Tip>

See the [documentation](https://huggingface.co/docs/accelerate/usage_guides/deepspeed) for more DeepSpeed configuration options.

</Tip>

Changing the default Adam optimizer to DeepSpeed's Adam
`deepspeed.ops.adam.DeepSpeedCPUAdam` gives a substantial speedup, but
it requires a CUDA toolchain with the same version as PyTorch. The 8-bit optimizer
does not seem to be compatible with DeepSpeed at the moment.
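
If you want to try `DeepSpeedCPUAdam`, the sketch below shows how the optimizer could be swapped inside `train_controlnet.py`. It is a minimal illustration, not the script's actual code: it assumes DeepSpeed is installed, and the names `params_to_optimize` and the `args.adam_*` hyperparameters stand in for whatever variables your version of the script uses for the ControlNet parameters and the Adam settings.

```python
# Hypothetical swap inside train_controlnet.py: replace the default torch.optim.AdamW
# (or bitsandbytes 8-bit AdamW) with DeepSpeed's CPU-offload-friendly Adam.
from deepspeed.ops.adam import DeepSpeedCPUAdam

optimizer = DeepSpeedCPUAdam(
    params_to_optimize,              # assumed: the ControlNet parameters gathered by the script
    lr=args.learning_rate,           # assumed: the script's learning-rate argument
    betas=(args.adam_beta1, args.adam_beta2),
    weight_decay=args.adam_weight_decay,
    eps=args.adam_epsilon,
)
```

DeepSpeed JIT-compiles the CPU Adam op on first use, which is why it needs a CUDA toolchain whose version matches the one PyTorch was built with.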

```bash
export MODEL_DIR="runwayml/stable-diffusion-v1-5"
export OUTPUT_DIR="path to save model"

accelerate launch train_controlnet.py \
 --pretrained_model_name_or_path=$MODEL_DIR \
 --output_dir=$OUTPUT_DIR \
 --dataset_name=fusing/fill50k \
 --resolution=512 \
 --validation_image "./conditioning_image_1.png" "./conditioning_image_2.png" \
 --validation_prompt "red circle with blue background" "cyan circle with brown floral background" \
 --train_batch_size=1 \
 --gradient_accumulation_steps=4 \
 --gradient_checkpointing \
 --enable_xformers_memory_efficient_attention \
 --set_grads_to_none \
 --mixed_precision fp16
```

## Inference

The trained model can be run with the [`StableDiffusionControlNetPipeline`].
Set `base_model_path` and `controlnet_path` to the values you passed to
`--pretrained_model_name_or_path` and `--output_dir` in the training script.

```py
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel, UniPCMultistepScheduler
from diffusers.utils import load_image
import torch

base_model_path = "path to model"
controlnet_path = "path to controlnet"

controlnet = ControlNetModel.from_pretrained(controlnet_path, torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    base_model_path, controlnet=controlnet, torch_dtype=torch.float16
)

# speed up diffusion process with faster scheduler and memory optimization
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
# remove following line if xformers is not installed
pipe.enable_xformers_memory_efficient_attention()

pipe.enable_model_cpu_offload()

control_image = load_image("./conditioning_image_1.png")
prompt = "pale golden rod circle with old lace background"

# generate image
generator = torch.manual_seed(0)
image = pipe(prompt, num_inference_steps=20, generator=generator, image=control_image).images[0]

image.save("./output.png")
```
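
The same pipeline can be reused for the second conditioning image downloaded earlier; only the control image and prompt change. The prompt below is just an illustrative color/background combination in the style of the fill50k captions, not one taken from the dataset.

```py
# reuse `pipe`, `load_image`, and `torch` from the snippet above
control_image = load_image("./conditioning_image_2.png")
prompt = "light coral circle with white background"

generator = torch.manual_seed(0)
image = pipe(prompt, num_inference_steps=20, generator=generator, image=control_image).images[0]

image.save("./output_2.png")
```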