
Commit e192ae0

Authored by UmerHA, patrickvonplaten, stevhliu, and DN6
Add ControlNet-XS support (#5827)
* Check in 23-10-05
* check-in 23-10-06
* check-in 23-10-07 2pm
* check-in 23-10-08
* check-in 231009T1200
* check-in 230109
* checkin 231010
* init + forward run
* checkin
* checkin
* ControlNetXSModel is now saveable+loadable
* Forward works
* checkin
* Pipeline works with `no_control=True`
* checkin
* debug: save intermediate outputs of resnet
* checkin
* Understood time error + fixed connection error
* checkin
* checkin 231106T1600
* turned off detailled debug prints
* time debug logs
* small fix
* Separated control_scale for connections/time
* simplified debug logging
* Full denoising works with control scale = 0
* aligned logs
* Added control_attention_head_dim param
* Passing n_heads instead of dim_head into ctrl unet
* Fixed ctrl midblock bug
* Cleanup
* Fixed time dtype bug
* checkin
* 1. from_unet, 2. base passed, 3. all unet params
* checkin
* Finished docstrings
* cleanup
* make style
* checkin
* more tests pass
* Fixed tests
* removed debug logs
* make style + quality
* make fix-copies
* fixed documentation
* added cnxs to doc toc
* added control start/end param
* Update controlnetxs_sdxl.md
* tried to fix copies..
* Fixed norm_num_groups in from_unet
* added sdxl-depth test
* created SD2.1 controlnet-xs pipeline
* re-added debug logs
* Adjusting group norm ; readded logs
* Added debug log statements
* removed debug logs ; started tests for sd2.1
* updated sd21 tests
* fixed tests
* fixed tests
* slightly increased error tolerance for 1 test
* make style & quality
* Added docs for CNXS-SD
* make fix-copies
* Fixed sd compile test ; fixed gradient ckpointing
* vae downs = cnxs conditioning downs; removed guess
* make style & quality
* Fixed tests
* fixed test
* Incorporated review feedback
* simplified control model surgery
* fixed tests & make style / quality
* Updated docs; deleted pip & cursor files
* Rolled back minimal change to resnet
* Update resnet.py
* Update resnet.py
* Update src/diffusers/models/controlnetxs.py (Co-authored-by: Patrick von Platen <[email protected]>)
* Update src/diffusers/models/controlnetxs.py (Co-authored-by: Patrick von Platen <[email protected]>)
* Incorporated review feedback
* Update docs/source/en/api/pipelines/controlnetxs_sdxl.md (Co-authored-by: Steven Liu <[email protected]>)
* Update docs/source/en/api/pipelines/controlnetxs.md (Co-authored-by: Steven Liu <[email protected]>)
* Update docs/source/en/api/pipelines/controlnetxs.md (Co-authored-by: Steven Liu <[email protected]>)
* Update docs/source/en/api/pipelines/controlnetxs.md (Co-authored-by: Steven Liu <[email protected]>)
* Update src/diffusers/models/controlnetxs.py (Co-authored-by: Steven Liu <[email protected]>)
* Update src/diffusers/models/controlnetxs.py (Co-authored-by: Steven Liu <[email protected]>)
* Update src/diffusers/pipelines/controlnet_xs/pipeline_controlnet_xs.py (Co-authored-by: Steven Liu <[email protected]>)
* Update docs/source/en/api/pipelines/controlnetxs.md (Co-authored-by: Steven Liu <[email protected]>)
* Update src/diffusers/pipelines/controlnet_xs/pipeline_controlnet_xs_sd_xl.py (Co-authored-by: Steven Liu <[email protected]>)
* Incorporated doc feedback

Co-authored-by: Patrick von Platen <[email protected]>
Co-authored-by: Steven Liu <[email protected]>
Co-authored-by: Dhruv Nair <[email protected]>
1 parent 87a09d6 commit e192ae0

File tree: 16 files changed, +3929 −0 lines

docs/source/en/_toctree.yml (4 additions, 0 deletions)

@@ -264,6 +264,10 @@
       title: ControlNet
     - local: api/pipelines/controlnet_sdxl
       title: ControlNet with Stable Diffusion XL
+    - local: api/pipelines/controlnetxs
+      title: ControlNet-XS
+    - local: api/pipelines/controlnetxs_sdxl
+      title: ControlNet-XS with Stable Diffusion XL
     - local: api/pipelines/cycle_diffusion
       title: Cycle Diffusion
     - local: api/pipelines/dance_diffusion
docs/source/en/api/pipelines/controlnetxs.md (new file, 39 additions, 0 deletions)

@@ -0,0 +1,39 @@

<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# ControlNet-XS

ControlNet-XS was introduced in [ControlNet-XS](https://vislearn.github.io/ControlNet-XS/) by Denis Zavadski and Carsten Rother. It is based on the observation that the control model in the [original ControlNet](https://huggingface.co/papers/2302.05543) can be made much smaller and still produce good results.

Like the original ControlNet model, you can provide an additional control image to condition and control Stable Diffusion generation. For example, if you provide a depth map, the ControlNet model generates an image that'll preserve the spatial information from the depth map. It is a more flexible and accurate way to control the image generation process.

ControlNet-XS generates images with comparable quality to a regular ControlNet, but it is 20-25% faster ([see benchmark](https://github.com/UmerHA/controlnet-xs-benchmark/blob/main/Speed%20Benchmark.ipynb) with StableDiffusion-XL) and uses ~45% less memory.

Here's the overview from the [project page](https://vislearn.github.io/ControlNet-XS/):

*With increasing computing capabilities, current model architectures appear to follow the trend of simply upscaling all components without validating the necessity for doing so. In this project we investigate the size and architectural design of ControlNet [Zhang et al., 2023] for controlling the image generation process with stable diffusion-based models. We show that a new architecture with as little as 1% of the parameters of the base model achieves state-of-the art results, considerably better than ControlNet in terms of FID score. Hence we call it ControlNet-XS. We provide the code for controlling StableDiffusion-XL [Podell et al., 2023] (Model B, 48M Parameters) and StableDiffusion 2.1 [Rombach et al. 2022] (Model B, 14M Parameters), all under openrail license.*

This model was contributed by [UmerHA](https://twitter.com/UmerHAdil). ❤️

<Tip>

Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-components-across-pipelines) section to learn how to efficiently load the same components into multiple pipelines.

</Tip>

## StableDiffusionControlNetXSPipeline
[[autodoc]] StableDiffusionControlNetXSPipeline
	- all
	- __call__

## StableDiffusionPipelineOutput
[[autodoc]] pipelines.stable_diffusion.StableDiffusionPipelineOutput
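The new doc page above conditions generation on a control image such as a depth or edge map. As a framework-free illustration of what such a conditioning image is (this is not the preprocessing diffusers itself ships; `edge_map` and the threshold are assumptions for the sketch), a binary edge map can be derived from gradient magnitude with NumPy:

```python
import numpy as np

def edge_map(gray: np.ndarray, threshold: float = 0.25) -> np.ndarray:
    """Binary edge map from gradient magnitude; a stand-in for a canny-style control image.

    gray: 2D float array in [0, 1]. Returns a uint8 array of 0s and 255s,
    the format control images are usually supplied in.
    """
    gy, gx = np.gradient(gray.astype(np.float64))
    magnitude = np.hypot(gx, gy)
    if magnitude.max() > 0:
        magnitude /= magnitude.max()  # normalize so the threshold is scale-free
    return np.where(magnitude > threshold, 255, 0).astype(np.uint8)

# Synthetic image: dark left half, bright right half -> one vertical edge.
img = np.zeros((8, 8))
img[:, 4:] = 1.0
edges = edge_map(img)
```

In practice the resulting array would be converted to a PIL image and passed as the control image to the pipeline.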
docs/source/en/api/pipelines/controlnetxs_sdxl.md (new file, 45 additions, 0 deletions)

@@ -0,0 +1,45 @@

<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# ControlNet-XS with Stable Diffusion XL

ControlNet-XS was introduced in [ControlNet-XS](https://vislearn.github.io/ControlNet-XS/) by Denis Zavadski and Carsten Rother. It is based on the observation that the control model in the [original ControlNet](https://huggingface.co/papers/2302.05543) can be made much smaller and still produce good results.

Like the original ControlNet model, you can provide an additional control image to condition and control Stable Diffusion generation. For example, if you provide a depth map, the ControlNet model generates an image that'll preserve the spatial information from the depth map. It is a more flexible and accurate way to control the image generation process.

ControlNet-XS generates images with comparable quality to a regular ControlNet, but it is 20-25% faster ([see benchmark](https://github.com/UmerHA/controlnet-xs-benchmark/blob/main/Speed%20Benchmark.ipynb)) and uses ~45% less memory.

Here's the overview from the [project page](https://vislearn.github.io/ControlNet-XS/):

*With increasing computing capabilities, current model architectures appear to follow the trend of simply upscaling all components without validating the necessity for doing so. In this project we investigate the size and architectural design of ControlNet [Zhang et al., 2023] for controlling the image generation process with stable diffusion-based models. We show that a new architecture with as little as 1% of the parameters of the base model achieves state-of-the art results, considerably better than ControlNet in terms of FID score. Hence we call it ControlNet-XS. We provide the code for controlling StableDiffusion-XL [Podell et al., 2023] (Model B, 48M Parameters) and StableDiffusion 2.1 [Rombach et al. 2022] (Model B, 14M Parameters), all under openrail license.*

This model was contributed by [UmerHA](https://twitter.com/UmerHAdil). ❤️

<Tip warning={true}>

🧪 Many of the SDXL ControlNet checkpoints are experimental, and there is a lot of room for improvement. Feel free to open an [Issue](https://github.com/huggingface/diffusers/issues/new/choose) and leave us feedback on how we can improve!

</Tip>

<Tip>

Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-components-across-pipelines) section to learn how to efficiently load the same components into multiple pipelines.

</Tip>

## StableDiffusionXLControlNetXSPipeline
[[autodoc]] StableDiffusionXLControlNetXSPipeline
	- all
	- __call__

## StableDiffusionPipelineOutput
[[autodoc]] pipelines.stable_diffusion.StableDiffusionPipelineOutput
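The commit history above mentions an added control start/end parameter. As a hedged sketch of the common idea behind such parameters in ControlNet-style pipelines (an illustration, not this pipeline's exact code; `control_window` is a hypothetical helper name), control is applied only within a fraction of the denoising schedule:

```python
def control_window(num_steps: int, start: float = 0.0, end: float = 1.0) -> list[bool]:
    """Per-step flags: apply control only while denoising progress is in [start, end)."""
    flags = []
    for i in range(num_steps):
        progress = i / num_steps  # fraction of denoising completed before step i
        flags.append(start <= progress < end)
    return flags

# Apply control only during the first 60% of a 10-step schedule.
flags = control_window(10, start=0.0, end=0.6)
```

Restricting control to early steps fixes the coarse layout from the conditioning image while leaving fine details to the unconstrained base model.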

docs/source/en/api/pipelines/overview.md (2 additions, 0 deletions)

@@ -40,6 +40,8 @@ The table below lists all the pipelines currently available in 🤗 Diffusers an
 | [Consistency Models](consistency_models) | unconditional image generation |
 | [ControlNet](controlnet) | text2image, image2image, inpainting |
 | [ControlNet with Stable Diffusion XL](controlnet_sdxl) | text2image |
+| [ControlNet-XS](controlnetxs) | text2image |
+| [ControlNet-XS with Stable Diffusion XL](controlnetxs_sdxl) | text2image |
 | [Cycle Diffusion](cycle_diffusion) | image2image |
 | [Dance Diffusion](dance_diffusion) | unconditional audio generation |
 | [DDIM](ddim) | unconditional image generation |

src/diffusers/__init__.py (6 additions, 0 deletions)

@@ -80,6 +80,7 @@
     "AutoencoderTiny",
     "ConsistencyDecoderVAE",
     "ControlNetModel",
+    "ControlNetXSModel",
     "Kandinsky3UNet",
     "ModelMixin",
     "MotionAdapter",
@@ -250,6 +251,7 @@
     "StableDiffusionControlNetImg2ImgPipeline",
     "StableDiffusionControlNetInpaintPipeline",
     "StableDiffusionControlNetPipeline",
+    "StableDiffusionControlNetXSPipeline",
     "StableDiffusionDepth2ImgPipeline",
     "StableDiffusionDiffEditPipeline",
     "StableDiffusionGLIGENPipeline",
@@ -273,6 +275,7 @@
     "StableDiffusionXLControlNetImg2ImgPipeline",
     "StableDiffusionXLControlNetInpaintPipeline",
     "StableDiffusionXLControlNetPipeline",
+    "StableDiffusionXLControlNetXSPipeline",
     "StableDiffusionXLImg2ImgPipeline",
     "StableDiffusionXLInpaintPipeline",
     "StableDiffusionXLInstructPix2PixPipeline",
@@ -454,6 +457,7 @@
     AutoencoderTiny,
     ConsistencyDecoderVAE,
     ControlNetModel,
+    ControlNetXSModel,
     Kandinsky3UNet,
     ModelMixin,
     MotionAdapter,
@@ -603,6 +607,7 @@
     StableDiffusionControlNetImg2ImgPipeline,
     StableDiffusionControlNetInpaintPipeline,
     StableDiffusionControlNetPipeline,
+    StableDiffusionControlNetXSPipeline,
     StableDiffusionDepth2ImgPipeline,
     StableDiffusionDiffEditPipeline,
     StableDiffusionGLIGENPipeline,
@@ -626,6 +631,7 @@
     StableDiffusionXLControlNetImg2ImgPipeline,
     StableDiffusionXLControlNetInpaintPipeline,
     StableDiffusionXLControlNetPipeline,
+    StableDiffusionXLControlNetXSPipeline,
     StableDiffusionXLImg2ImgPipeline,
     StableDiffusionXLInpaintPipeline,
     StableDiffusionXLInstructPix2PixPipeline,
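The additions above register the new names in diffusers' lazy-import machinery: an `_import_structure` dict maps submodules to exported names, and heavy submodules are only imported when a name is first accessed. A minimal, self-contained sketch of that mechanism (not diffusers' actual `_LazyModule` implementation, just the pattern it follows):

```python
import importlib
import types

class LazyModule(types.ModuleType):
    """Resolve attributes from submodules on first access, _LazyModule-style."""

    def __init__(self, name, import_structure):
        super().__init__(name)
        # Map each exported name back to the submodule that defines it.
        self._name_to_module = {
            attr: mod for mod, attrs in import_structure.items() for attr in attrs
        }

    def __getattr__(self, name):
        if name in self._name_to_module:
            module = importlib.import_module(self._name_to_module[name])
            value = getattr(module, name)
            setattr(self, name, value)  # cache so later lookups skip __getattr__
            return value
        raise AttributeError(f"module {self.__name__!r} has no attribute {name!r}")

# Demo with stdlib json standing in for a heavy diffusers submodule:
# "json" is only imported when lazy.dumps is first touched.
lazy = LazyModule("demo", {"json": ["dumps", "loads"]})
```

This is why adding `"ControlNetXSModel"` to `_import_structure` and to the `TYPE_CHECKING` import block is enough to expose it at the package root without slowing down `import diffusers`.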

src/diffusers/models/__init__.py (2 additions, 0 deletions)

@@ -32,6 +32,7 @@
     _import_structure["autoencoder_tiny"] = ["AutoencoderTiny"]
     _import_structure["consistency_decoder_vae"] = ["ConsistencyDecoderVAE"]
     _import_structure["controlnet"] = ["ControlNetModel"]
+    _import_structure["controlnetxs"] = ["ControlNetXSModel"]
     _import_structure["dual_transformer_2d"] = ["DualTransformer2DModel"]
     _import_structure["embeddings"] = ["ImageProjection"]
     _import_structure["modeling_utils"] = ["ModelMixin"]
@@ -63,6 +64,7 @@
     from .autoencoder_tiny import AutoencoderTiny
     from .consistency_decoder_vae import ConsistencyDecoderVAE
     from .controlnet import ControlNetModel
+    from .controlnetxs import ControlNetXSModel
     from .dual_transformer_2d import DualTransformer2DModel
     from .embeddings import ImageProjection
     from .modeling_utils import ModelMixin
