
Commit c681ad1

add: section on multiple controlnets. (#2762)
* add: section on multiple controlnets.
* fix: docs.

Co-authored-by: William Berman <[email protected]>
1 parent e0d8c9e commit c681ad1

File tree

1 file changed: +107 -0 lines changed

docs/source/en/api/pipelines/stable_diffusion/controlnet.mdx

Lines changed: 107 additions & 0 deletions
@@ -135,6 +135,113 @@ This should take only around 3-4 seconds on GPU (depending on hardware). The out
<!-- TODO: add space -->

## Combining multiple conditionings

Multiple ControlNet conditionings can be combined for a single image generation. Pass a list of ControlNets to the pipeline's constructor and a corresponding list of conditionings to `__call__`.

When combining conditionings, it is helpful to mask the conditionings so that they do not overlap. In the example below, we mask the middle of the canny map, where the pose conditioning is located.

It can also be helpful to vary the `controlnet_conditioning_scale` values to emphasize one conditioning over the other.
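Concretely, the call pattern is sketched below. This is only a minimal sketch: the blank placeholder images stand in for the real conditioning maps built in the subsections that follow, and the complete runnable example closes this section.

```python
from PIL import Image
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel

# blank placeholders; the subsections below build real pose and canny maps
pose_image = Image.new("RGB", (512, 512))
canny_image = Image.new("RGB", (512, 512))

# one ControlNet per conditioning
controlnets = [
    ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-openpose"),
    ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-canny"),
]
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnets
)

# the conditioning images (and optional per-ControlNet scales) follow the
# same order as the `controlnets` list
image = pipe(
    "a prompt",
    [pose_image, canny_image],
    controlnet_conditioning_scale=[1.0, 0.8],
).images[0]
```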
### Canny conditioning

The original image:

<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/landscape.png"/>

Prepare the conditioning:

```python
from diffusers.utils import load_image
from PIL import Image
import cv2
import numpy as np

canny_image = load_image(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/landscape.png"
)
canny_image = np.array(canny_image)

# hysteresis thresholds for Canny edge detection
low_threshold = 100
high_threshold = 200

canny_image = cv2.Canny(canny_image, low_threshold, high_threshold)

# zero out middle columns of image where pose will be overlaid
zero_start = canny_image.shape[1] // 4
zero_end = zero_start + canny_image.shape[1] // 2
canny_image[:, zero_start:zero_end] = 0

# stack the single-channel edge map into a 3-channel PIL image
canny_image = canny_image[:, :, None]
canny_image = np.concatenate([canny_image, canny_image, canny_image], axis=2)
canny_image = Image.fromarray(canny_image)
```

<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/controlnet/landscape_canny_masked.png"/>

### Openpose conditioning

The original image:

<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/person.png" width=600/>

Prepare the conditioning:

```python
from controlnet_aux import OpenposeDetector
from diffusers.utils import load_image

openpose = OpenposeDetector.from_pretrained("lllyasviel/ControlNet")

openpose_image = load_image(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/person.png"
)
# extract the pose as an image of the detected skeleton
openpose_image = openpose(openpose_image)
```

<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/controlnet/person_pose.png" width=600/>
### Running ControlNet with multiple conditionings

```python
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel, UniPCMultistepScheduler
import torch

# one ControlNet per conditioning; order must match the images passed below
controlnet = [
    ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-openpose", torch_dtype=torch.float16),
    ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16),
]

pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
)
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)

pipe.enable_xformers_memory_efficient_attention()
pipe.enable_model_cpu_offload()

prompt = "a giant standing in a fantasy landscape, best quality"
negative_prompt = "monochrome, lowres, bad anatomy, worst quality, low quality"

generator = torch.Generator(device="cpu").manual_seed(1)

# conditioning images in the same order as the ControlNets above
images = [openpose_image, canny_image]

image = pipe(
    prompt,
    images,
    num_inference_steps=20,
    generator=generator,
    negative_prompt=negative_prompt,
    controlnet_conditioning_scale=[1.0, 0.8],
).images[0]

image.save("./multi_controlnet_output.png")
```

<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/controlnet/multi_controlnet_output.png" width=600/>
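Because the generator seed is fixed, rerunning the example while changing only `controlnet_conditioning_scale` (for example, `[0.8, 1.0]` to weight the canny map more heavily) is a simple way to see how the relative scales trade the two conditionings off against each other.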
## Available checkpoints

ControlNet requires a *control image* in addition to the text-to-image *prompt*.
