Commit 07486a9

add

1 parent c36f1c3 commit 07486a9

1 file changed: +39 -0 lines changed

docs/source/en/using-diffusers/kandinsky.md

@@ -20,6 +20,8 @@ The Kandinsky models are a series of multilingual text-to-image generation model
 
 [Kandinsky 2.2](../api/pipelines/kandinsky_v22) improves on the previous model by replacing the image encoder of the image prior model with a larger CLIP-ViT-G model to improve quality. The image prior model was also retrained on images with different resolutions and aspect ratios to generate higher-resolution images and different image sizes.
 
+[Kandinsky 3](../api/pipelines/kandinsky3) simplifies the architecture and shifts away from the two-stage generation process involving the prior model and diffusion model. Instead, Kandinsky 3 uses [Flan-UL2](https://huggingface.co/google/flan-ul2) to encode text, a UNet with [BigGan-deep](https://hf.co/papers/1809.11096) blocks, and [Sber-MoVQGAN](https://github.com/ai-forever/MoVQGAN) to decode the latents into images. Text understanding and generated image quality are primarily achieved by using a larger text encoder and UNet.
+
 This guide will show you how to use the Kandinsky models for text-to-image, image-to-image, inpainting, interpolation, and more.
 
 Before you begin, make sure you have the following libraries installed:
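
A quick way to verify the components named in the added paragraph (Flan-UL2, the UNet, and Sber-MoVQGAN) is to inspect the loaded pipeline. This sketch is not part of the commit; it assumes the `kandinsky-community/kandinsky-3` checkpoint used later in this diff and the standard `text_encoder`, `unet`, and `movq` attribute names.

```py
# Sketch (not from this commit): inspect the Kandinsky 3 pipeline components.
# Assumes the standard attribute names text_encoder, unet, and movq.
import torch
from diffusers import Kandinsky3Pipeline

pipeline = Kandinsky3Pipeline.from_pretrained(
    "kandinsky-community/kandinsky-3", variant="fp16", torch_dtype=torch.float16
)

print(type(pipeline.text_encoder).__name__)  # Flan-UL2, a T5-family encoder
print(type(pipeline.unet).__name__)          # UNet built from BigGan-deep blocks
print(type(pipeline.movq).__name__)          # Sber-MoVQGAN latent decoder
```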
@@ -33,6 +35,10 @@ Before you begin, make sure you have the following libraries installed:
 
 Kandinsky 2.1 and 2.2 usage is very similar! The only difference is that Kandinsky 2.2 doesn't accept `prompt` as an input when decoding the latents. Instead, Kandinsky 2.2 only accepts `image_embeds` during decoding.
 
+<br>
+
+Kandinsky 3 has a more concise architecture and doesn't require a prior model. This means its usage is identical to other diffusion models like [Stable Diffusion XL](sdxl).
+
 </Tip>
 
 ## Text-to-image
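
To make the decoding difference in the tip concrete, here is a sketch contrasting the two calls. It is not part of this hunk; the checkpoint names and the prior's `.to_tuple()` output are assumptions that follow the patterns used elsewhere in this guide.

```py
# Sketch of the one API difference the tip describes (assumed checkpoints).
# Add .to("cuda") or enable_model_cpu_offload() as in the examples below.
import torch
from diffusers import (
    KandinskyPriorPipeline,
    KandinskyPipeline,
    KandinskyV22PriorPipeline,
    KandinskyV22Pipeline,
)

prompt = "A fantasy landscape, Cinematic lighting"

# Kandinsky 2.1: the decoder takes the prompt again alongside the embeddings
prior = KandinskyPriorPipeline.from_pretrained("kandinsky-community/kandinsky-2-1-prior", torch_dtype=torch.float16)
image_embeds, negative_image_embeds = prior(prompt).to_tuple()
decoder = KandinskyPipeline.from_pretrained("kandinsky-community/kandinsky-2-1", torch_dtype=torch.float16)
image = decoder(prompt, image_embeds=image_embeds, negative_image_embeds=negative_image_embeds).images[0]

# Kandinsky 2.2: the decoder accepts only the image embeddings, no prompt
prior = KandinskyV22PriorPipeline.from_pretrained("kandinsky-community/kandinsky-2-2-prior", torch_dtype=torch.float16)
image_embeds, negative_image_embeds = prior(prompt).to_tuple()
decoder = KandinskyV22Pipeline.from_pretrained("kandinsky-community/kandinsky-2-2-decoder", torch_dtype=torch.float16)
image = decoder(image_embeds=image_embeds, negative_image_embeds=negative_image_embeds).images[0]
```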
@@ -91,6 +97,21 @@ image
 <img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/kandinsky-text-to-image.png"/>
 </div>
 
+</hfoption>
+<hfoption id="Kandinsky 3">
+
+```py
+from diffusers import Kandinsky3Pipeline
+import torch
+
+pipeline = Kandinsky3Pipeline.from_pretrained("kandinsky-community/kandinsky-3", variant="fp16", torch_dtype=torch.float16)
+pipeline.enable_model_cpu_offload()
+
+prompt = "An alien cheeseburger creature eating itself, claymation, cinematic, moody lighting"
+image = pipeline(prompt).images[0]
+image
+```
+
 </hfoption>
 </hfoptions>
 
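
One usage note on the added text-to-image block: like other diffusers pipelines, the call accepts a `generator` for reproducible outputs. A minimal sketch, assuming standard `generator` support rather than anything this commit adds:

```py
# Reproducibility sketch (assumes standard diffusers `generator` support).
import torch
from diffusers import Kandinsky3Pipeline

pipeline = Kandinsky3Pipeline.from_pretrained(
    "kandinsky-community/kandinsky-3", variant="fp16", torch_dtype=torch.float16
)
pipeline.enable_model_cpu_offload()

generator = torch.Generator("cpu").manual_seed(0)  # fixed seed -> repeatable image
prompt = "An alien cheeseburger creature eating itself, claymation, cinematic, moody lighting"
image = pipeline(prompt, generator=generator).images[0]
```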

@@ -218,6 +239,24 @@ make_image_grid([original_image.resize((512, 512)), image.resize((512, 512))], r
 <img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/kandinsky-image-to-image.png"/>
 </div>
 
+</hfoption>
+<hfoption id="Kandinsky 3">
+
+```py
+from diffusers import Kandinsky3Img2ImgPipeline
+from diffusers.utils import load_image
+import torch
+
+pipeline = Kandinsky3Img2ImgPipeline.from_pretrained("kandinsky-community/kandinsky-3", variant="fp16", torch_dtype=torch.float16)
+pipeline.enable_model_cpu_offload()
+
+prompt = "A fantasy landscape, Cinematic lighting"
+image = load_image("https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg")
+
+image = pipeline(prompt, image=image, strength=0.75, num_inference_steps=25).images[0]
+image
+```
+
 </hfoption>
 </hfoptions>
 
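
In the added image-to-image block, `strength=0.75` runs roughly 75% of the denoising schedule, so the output leans toward the prompt while keeping the input sketch's composition; lower values stay closer to the input. Below is a sketch comparing input and output side by side with `make_image_grid`, following the pattern visible in this hunk's header line; it is not part of the commit, and the variable names are illustrative.

```py
# Side-by-side comparison sketch, mirroring the make_image_grid pattern
# already used in this file; not part of the commit itself.
import torch
from diffusers import Kandinsky3Img2ImgPipeline
from diffusers.utils import load_image, make_image_grid

pipeline = Kandinsky3Img2ImgPipeline.from_pretrained(
    "kandinsky-community/kandinsky-3", variant="fp16", torch_dtype=torch.float16
)
pipeline.enable_model_cpu_offload()

original_image = load_image(
    "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg"
)
# strength=0.0 returns the input unchanged; strength=1.0 ignores it entirely
image = pipeline(
    "A fantasy landscape, Cinematic lighting",
    image=original_image,
    strength=0.75,
    num_inference_steps=25,
).images[0]
make_image_grid([original_image.resize((512, 512)), image.resize((512, 512))], rows=1, cols=2)
```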
