Merged
62 commits
ed8c82c
up
patrickvonplaten Nov 14, 2022
d1e8a50
convert dual unet
anton-l Nov 15, 2022
e00a9cf
revert dual attn
anton-l Nov 15, 2022
833cd1d
adapt for vd-official
anton-l Nov 15, 2022
e455921
test the full pipeline
anton-l Nov 15, 2022
53f080f
mixed inference
anton-l Nov 16, 2022
b5778e0
mixed inference for text2img
anton-l Nov 16, 2022
ee84175
merge main
anton-l Nov 16, 2022
9a8114a
add image prompting
anton-l Nov 16, 2022
b17475e
fix clip norm
anton-l Nov 16, 2022
a758804
Merge branch 'main' of https://github.com/huggingface/diffusers into …
patrickvonplaten Nov 21, 2022
74fde82
split text2img and img2img
anton-l Nov 21, 2022
5785e27
Merge remote-tracking branch 'origin/add_versatile_diffusers' into ad…
anton-l Nov 21, 2022
22e6b54
fix format
anton-l Nov 21, 2022
d36cf41
refactor text2img
anton-l Nov 21, 2022
303052d
mega pipeline
anton-l Nov 21, 2022
f2bc526
add optimus
patrickvonplaten Nov 21, 2022
2a50c84
add gpt2
patrickvonplaten Nov 21, 2022
bc509b2
refactor image var
anton-l Nov 21, 2022
4d9ec98
Merge remote-tracking branch 'origin/add_versatile_diffusers' into ad…
anton-l Nov 21, 2022
8c989eb
wip text_unet
anton-l Nov 21, 2022
f706729
text unet end to end
anton-l Nov 21, 2022
bf8f2fb
update tests
anton-l Nov 22, 2022
e4728c2
reshape
anton-l Nov 22, 2022
2b7cd87
fix image to text
patrickvonplaten Nov 22, 2022
7c999fe
add some first docs
patrickvonplaten Nov 22, 2022
02254cb
dual guided pipeline
anton-l Nov 22, 2022
efe41ff
Merge remote-tracking branch 'origin/add_versatile_diffusers' into ad…
anton-l Nov 22, 2022
95e3711
fix token ratio
anton-l Nov 22, 2022
22c6b32
propose change
patrickvonplaten Nov 23, 2022
8f5f372
dual transformer as a native module
anton-l Nov 23, 2022
f5e8ec6
DualTransformer(nn.Module)
anton-l Nov 23, 2022
914942f
DualTransformer(nn.Module)
anton-l Nov 23, 2022
8d4207d
correct unconditional image
patrickvonplaten Nov 23, 2022
9d37751
Merge branch 'add_versatile_diffusers' of https://github.com/huggingf…
patrickvonplaten Nov 23, 2022
5ab90f6
save-load with mega pipeline
anton-l Nov 23, 2022
696dd6f
Merge remote-tracking branch 'origin/add_versatile_diffusers' into ad…
anton-l Nov 23, 2022
008af3a
remove image to text
patrickvonplaten Nov 23, 2022
c91d0a4
Merge branch 'add_versatile_diffusers' of https://github.com/huggingf…
patrickvonplaten Nov 23, 2022
ff8188a
up
patrickvonplaten Nov 23, 2022
1bded5a
uP
patrickvonplaten Nov 23, 2022
af8a378
fix
patrickvonplaten Nov 23, 2022
7bf2d4d
up
patrickvonplaten Nov 23, 2022
a32c942
final fix
patrickvonplaten Nov 23, 2022
447780d
remove_unused_weights
anton-l Nov 23, 2022
1b85e34
test updates
anton-l Nov 23, 2022
e950199
save progress
patrickvonplaten Nov 23, 2022
1599b14
Merge branch 'add_versatile_diffusers' of https://github.com/huggingf…
patrickvonplaten Nov 23, 2022
2e2df18
uP
patrickvonplaten Nov 23, 2022
dd9dce5
fix dual prompts
anton-l Nov 23, 2022
6cbee51
some fixes
patrickvonplaten Nov 23, 2022
e9843fa
finish
patrickvonplaten Nov 23, 2022
cea10a0
style
anton-l Nov 23, 2022
59c2fef
finish renaming
patrickvonplaten Nov 23, 2022
eb02e1d
merge main
anton-l Nov 23, 2022
5669d93
finish
patrickvonplaten Nov 23, 2022
5fc757e
Merge branch 'add_versatile_diffusers' of https://github.com/huggingf…
patrickvonplaten Nov 23, 2022
2e5128d
up
patrickvonplaten Nov 23, 2022
9f31d8a
fix
patrickvonplaten Nov 23, 2022
e742f16
fix
patrickvonplaten Nov 23, 2022
ace7123
fix
patrickvonplaten Nov 23, 2022
8a6f0c9
finish
patrickvonplaten Nov 23, 2022
2 changes: 2 additions & 0 deletions docs/source/_toctree.yml
@@ -112,6 +112,8 @@
title: "Stochastic Karras VE"
- local: api/pipelines/dance_diffusion
title: "Dance Diffusion"
- local: api/pipelines/versatile_diffusion
title: "Versatile Diffusion"
- local: api/pipelines/vq_diffusion
title: "VQ Diffusion"
- local: api/pipelines/repaint
3 changes: 3 additions & 0 deletions docs/source/api/pipelines/overview.mdx
@@ -60,6 +60,9 @@ available a colab notebook to directly try them out.
| [stable_diffusion](./api/pipelines/stable_diffusion) | [**Stable Diffusion**](https://stability.ai/blog/stable-diffusion-public-release) | Text-Guided Image Inpainting | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/in_painting_with_stable_diffusion_using_diffusers.ipynb)
| [stable_diffusion_safe](./api/pipelines/stable_diffusion_safe) | [**Safe Stable Diffusion**](https://arxiv.org/abs/2211.05105) | Text-Guided Generation | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ml-research/safe-latent-diffusion/blob/main/examples/Safe%20Latent%20Diffusion.ipynb)
| [stochastic_karras_ve](./api/pipelines/stochastic_karras_ve) | [**Elucidating the Design Space of Diffusion-Based Generative Models**](https://arxiv.org/abs/2206.00364) | Unconditional Image Generation |
| [versatile_diffusion](./api/pipelines/versatile_diffusion) | [Versatile Diffusion: Text, Images and Variations All in One Diffusion Model](https://arxiv.org/abs/2211.08332) | Text-to-Image Generation |
| [versatile_diffusion](./api/pipelines/versatile_diffusion) | [Versatile Diffusion: Text, Images and Variations All in One Diffusion Model](https://arxiv.org/abs/2211.08332) | Image Variations Generation |
| [versatile_diffusion](./api/pipelines/versatile_diffusion) | [Versatile Diffusion: Text, Images and Variations All in One Diffusion Model](https://arxiv.org/abs/2211.08332) | Dual Image and Text Guided Generation |
| [vq_diffusion](./api/pipelines/vq_diffusion) | [Vector Quantized Diffusion Model for Text-to-Image Synthesis](https://arxiv.org/abs/2111.14822) | Text-to-Image Generation |


73 changes: 73 additions & 0 deletions docs/source/api/pipelines/versatile_diffusion.mdx
@@ -0,0 +1,73 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# VersatileDiffusion

VersatileDiffusion was proposed in [Versatile Diffusion: Text, Images and Variations All in One Diffusion Model](https://arxiv.org/abs/2211.08332) by Xingqian Xu, Zhangyang Wang, Eric Zhang, Kai Wang, and Humphrey Shi.

The abstract of the paper is the following:

*The recent advances in diffusion models have set an impressive milestone in many generation tasks. Trending works such as DALL-E2, Imagen, and Stable Diffusion have attracted great interest in academia and industry. Despite the rapid landscape changes, recent new approaches focus on extensions and performance rather than capacity, thus requiring separate models for separate tasks. In this work, we expand the existing single-flow diffusion pipeline into a multi-flow network, dubbed Versatile Diffusion (VD), that handles text-to-image, image-to-text, image-variation, and text-variation in one unified model. Moreover, we generalize VD to a unified multi-flow multimodal diffusion framework with grouped layers, swappable streams, and other propositions that can process modalities beyond images and text. Through our experiments, we demonstrate that VD and its underlying framework have the following merits: a) VD handles all subtasks with competitive quality; b) VD initiates novel extensions and applications such as disentanglement of style and semantic, image-text dual-guided generation, etc.; c) Through these experiments and applications, VD provides more semantic insights of the generated outputs.*

## Tips

- VersatileDiffusion is conceptually very similar to [Stable Diffusion](./api/pipelines/stable_diffusion), but instead of providing just an image data stream conditioned on text, VersatileDiffusion provides both an image and a text data stream and can be conditioned on both text and image.

### *Run VersatileDiffusion*

You can either load the memory-intensive "all-in-one" [`VersatileDiffusionPipeline`], which can run all tasks
with the same class, as shown in [`VersatileDiffusionPipeline.text_to_image`], [`VersatileDiffusionPipeline.image_variation`], and [`VersatileDiffusionPipeline.dual_guided`]

**or**

You can run the individual pipelines which are much more memory efficient:

- *Text-to-Image*: [`VersatileDiffusionTextToImagePipeline.__call__`]
- *Image Variation*: [`VersatileDiffusionImageVariationPipeline.__call__`]
- *Dual Text and Image Guided Generation*: [`VersatileDiffusionDualGuidedPipeline.__call__`]
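
As a rough sketch of how one of the individual pipelines could be selected by task, the helper below maps a task label to one of the three classes listed above and imports `diffusers` lazily. The task labels, the helper names `pipeline_class_name` and `load_pipeline`, and the `float16`/`cuda` settings are illustrative assumptions, not part of the library; the checkpoint name is the one used elsewhere on this page.

```python
# Hypothetical task labels (our own naming) mapped to the pipeline classes listed above.
TASK_TO_PIPELINE = {
    "text-to-image": "VersatileDiffusionTextToImagePipeline",
    "image-variation": "VersatileDiffusionImageVariationPipeline",
    "dual-guided": "VersatileDiffusionDualGuidedPipeline",
}


def pipeline_class_name(task):
    """Return the name of the single-task pipeline class for a task label."""
    try:
        return TASK_TO_PIPELINE[task]
    except KeyError:
        expected = sorted(TASK_TO_PIPELINE)
        raise ValueError(f"unknown task {task!r}; expected one of {expected}") from None


def load_pipeline(task, checkpoint="shi-labs/versatile-diffusion", device="cuda"):
    """Load only the pipeline needed for `task` (downloads weights on first use)."""
    # Imported lazily so the mapping above is usable without torch/diffusers installed.
    import torch
    import diffusers

    pipeline_cls = getattr(diffusers, pipeline_class_name(task))
    # Assumption: half precision on a CUDA device; drop torch_dtype to run in float32.
    pipe = pipeline_cls.from_pretrained(checkpoint, torch_dtype=torch.float16)
    return pipe.to(device)
```

With a GPU available, `load_pipeline("text-to-image")("an astronaut riding a horse").images[0]` would be expected to yield a PIL image, following the usual `diffusers` pipeline output convention.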

### *How to load and use different schedulers.*

The Versatile Diffusion pipelines use the [`DDIMScheduler`] by default, but `diffusers` provides many other schedulers that can be used with them, such as [`PNDMScheduler`], [`LMSDiscreteScheduler`], [`EulerDiscreteScheduler`], and [`EulerAncestralDiscreteScheduler`].
To use a different scheduler, you can either change it via the [`ConfigMixin.from_config`] method or pass the `scheduler` argument to the `from_pretrained` method of the pipeline. For example, to use the [`EulerDiscreteScheduler`], you can do the following:

```python
>>> from diffusers import VersatileDiffusionPipeline, EulerDiscreteScheduler

>>> pipeline = VersatileDiffusionPipeline.from_pretrained("shi-labs/versatile-diffusion")
>>> pipeline.scheduler = EulerDiscreteScheduler.from_config(pipeline.scheduler.config)

>>> # or
>>> euler_scheduler = EulerDiscreteScheduler.from_pretrained("shi-labs/versatile-diffusion", subfolder="scheduler")
>>> pipeline = VersatileDiffusionPipeline.from_pretrained("shi-labs/versatile-diffusion", scheduler=euler_scheduler)
```
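
The first variant in the snippet above relies on one pattern: the new scheduler is built from the config of the scheduler it replaces, so shared settings carry over. A minimal sketch of that pattern as a reusable helper follows. `swap_scheduler` is a hypothetical name, not a `diffusers` function; it only assumes that the pipeline exposes `.scheduler.config` and that the scheduler class exposes `from_config`, as in the snippet above.

```python
def swap_scheduler(pipeline, scheduler_cls):
    """Replace `pipeline.scheduler` with `scheduler_cls`, reusing the current config.

    Hypothetical helper: relies only on `pipeline.scheduler.config` and
    `scheduler_cls.from_config(...)`, the same calls used in the example above.
    """
    previous = pipeline.scheduler
    pipeline.scheduler = scheduler_cls.from_config(previous.config)
    # Return the old scheduler so the caller can restore it later.
    return previous
```

For example, `old = swap_scheduler(pipeline, EulerDiscreteScheduler)` switches the sampler, and `pipeline.scheduler = old` puts the original one back.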

## VersatileDiffusionPipeline
[[autodoc]] VersatileDiffusionPipeline

## VersatileDiffusionTextToImagePipeline
[[autodoc]] VersatileDiffusionTextToImagePipeline
- __call__
- enable_attention_slicing
- disable_attention_slicing

## VersatileDiffusionImageVariationPipeline
[[autodoc]] VersatileDiffusionImageVariationPipeline
- __call__
- enable_attention_slicing
- disable_attention_slicing

## VersatileDiffusionDualGuidedPipeline
[[autodoc]] VersatileDiffusionDualGuidedPipeline
- __call__
- enable_attention_slicing
- disable_attention_slicing
3 changes: 3 additions & 0 deletions docs/source/index.mdx
@@ -50,6 +50,9 @@ available a colab notebook to directly try them out.
| [stable_diffusion](./api/pipelines/stable_diffusion) | [**Stable Diffusion**](https://stability.ai/blog/stable-diffusion-public-release) | Text-Guided Image Inpainting | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/in_painting_with_stable_diffusion_using_diffusers.ipynb)
| [stable_diffusion_safe](./api/pipelines/stable_diffusion_safe) | [**Safe Stable Diffusion**](https://arxiv.org/abs/2211.05105) | Text-Guided Generation | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ml-research/safe-latent-diffusion/blob/main/examples/Safe%20Latent%20Diffusion.ipynb)
| [stochastic_karras_ve](./api/pipelines/stochastic_karras_ve) | [**Elucidating the Design Space of Diffusion-Based Generative Models**](https://arxiv.org/abs/2206.00364) | Unconditional Image Generation |
| [versatile_diffusion](./api/pipelines/versatile_diffusion) | [Versatile Diffusion: Text, Images and Variations All in One Diffusion Model](https://arxiv.org/abs/2211.08332) | Text-to-Image Generation |
| [versatile_diffusion](./api/pipelines/versatile_diffusion) | [Versatile Diffusion: Text, Images and Variations All in One Diffusion Model](https://arxiv.org/abs/2211.08332) | Image Variations Generation |
| [versatile_diffusion](./api/pipelines/versatile_diffusion) | [Versatile Diffusion: Text, Images and Variations All in One Diffusion Model](https://arxiv.org/abs/2211.08332) | Dual Image and Text Guided Generation |
| [vq_diffusion](./api/pipelines/vq_diffusion) | [Vector Quantized Diffusion Model for Text-to-Image Synthesis](https://arxiv.org/abs/2111.14822) | Text-to-Image Generation |

**Note**: Pipelines are simple examples of how to play around with the diffusion systems as described in the corresponding papers.