[Kandinsky 3.0] Follow-up TODOs #5944
Conversation
```diff
 else:
     baddbmm_input = attention_mask
-    beta = 1
+    beta = self.scale_mask_factor
```
Added a new config for `Attention` here. Setting this to a large negative number helps a lot with numerical stability. In Kandinsky, they "fill" the empty tokens in `attention_matrix` with the largest possible negative number (see their code: `attention_matrix = attention_matrix.masked_fill(~(context_mask != 0), max_neg_value)`). I set this config to -60000.0 for simplicity; it is not exactly the same, but it seems to be sufficient.
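For reference, a minimal sketch (not the exact diffusers code; shapes and the mask polarity are assumptions here) of how a large negative `beta` suppresses masked positions inside the fused `torch.baddbmm` score computation, which evaluates `beta * input + alpha * (batch1 @ batch2)`:

```python
import torch

# Assumed toy shapes; assumed mask convention: 1 marks padded key positions.
batch, q_len, kv_len, dim = 2, 4, 6, 8
query = torch.randn(batch, q_len, dim)
key = torch.randn(batch, kv_len, dim)

mask = torch.zeros(batch, q_len, kv_len)
mask[:, :, 4:] = 1.0  # pretend the last two key tokens are padding

# beta * mask + alpha * (query @ key^T): masked scores land near -60000,
# so they contribute ~0 probability after softmax.
scores = torch.baddbmm(mask, query, key.transpose(1, 2), beta=-60000.0, alpha=dim**-0.5)
probs = scores.softmax(dim=-1)
assert probs[..., 4:].max() < 1e-6  # padded positions get ~zero attention
```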
Hmm, is `beta` supposed to be used to control mask precision?
Actually, I think I should do this instead:

```python
attention_mask = (1 - attention_mask.to(sample.dtype)) * -10000.0
```

The `beta` here is essentially trying to do the same thing: zero out the zero tokens' attention scores in the softmax operation. I did not realize I was missing this step because Kandinsky cuts off (all except one of) the zero tokens from `prompt_embeds`, so skipping this step, or doing it wrong, still generates accurate output for the most part. The exception is when `batch_size > 1`: in that case `prompt_embeds` will contain some zero tokens for the shorter prompts, and `attention_mask` needs to be applied correctly.
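A minimal sketch of that conversion, assuming the usual convention where `attention_mask` is 1 for real tokens and 0 for padding:

```python
import torch

attention_mask = torch.tensor([[1, 1, 1, 0, 0]])  # a shorter, padded prompt
# 0.0 at real tokens, -10000.0 at padding -> additive bias for the scores
bias = (1 - attention_mask.to(torch.float32)) * -10000.0

scores = torch.randn(1, 1, 5)            # toy attention scores over 5 tokens
probs = (scores + bias).softmax(dim=-1)  # padding receives ~0 attention weight
print(probs)                             # the last two entries are ~0
```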
Refactored. Now this script (one scenario in which the `attention_mask` actually needs to be applied) produces not exactly identical, but similar, outputs on main and on this branch:
```python
from diffusers import AutoPipelineForText2Image
import torch

pipe = AutoPipelineForText2Image.from_pretrained("kandinsky-community/kandinsky-3", variant="fp16", torch_dtype=torch.float16)
pipe.enable_model_cpu_offload()

prompt = [
    "A photograph of the inside of a subway train. There are raccoons sitting on the seats. One of them is reading a newspaper. The window shows the city in the background.",
    "A photograph of the inside of a subway train. There are raccoons sitting on the seats. One of them is reading a newspaper. The window shows the city in the background. A photograph of the inside of a subway train. There are raccoons sitting on the seats. One of them is reading a newspaper. The window shows the city in the background.",
]
generator = [torch.Generator(device="cpu").manual_seed(0), torch.Generator(device="cpu").manual_seed(1)]
image = pipe(prompt, num_inference_steps=25, generator=generator).images[0]
```

[Comparison images: main vs. branch]
```python
_from_deprecated_attn_block: bool = False,
processor: Optional["AttnProcessor"] = None,
scale_mask_factor: float = 1.0,
out_dim: int = None,
```
Is `out_dim` different from `query_dim` here?
@patrickvonplaten The only difference is the `to_out` layer here: the Kandinsky attention output does not change the dimension from `inner_dim`, while our `Attention` class projects the output to `query_dim`. I added an `out_dim` for this purpose, but we can add a different config if it makes more sense!

diffusers/src/diffusers/models/unet_kandi3.py, line 453 in d1b2a1a:

```python
self.to_out.append(nn.Linear(out_channels, out_channels, bias=False))
```
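A quick sketch of the difference, with made-up dimensions and a deliberately simplified view of the `Attention` internals:

```python
import torch.nn as nn

query_dim, heads, dim_head = 320, 8, 64  # assumed example values
inner_dim = heads * dim_head             # 512

# Default diffusers behavior: project the attention output back to query_dim.
to_out_default = nn.Linear(inner_dim, query_dim)

# Kandinsky 3 behavior, expressed via out_dim: keep the inner dimension.
out_dim = inner_dim
to_out_kandinsky = nn.Linear(inner_dim, out_dim)
```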
That works! Makes sense
```python
return hidden_states
```

```python
# TODO(Yiyi): This class should not exist, we can replace it with a normal attention processor I believe
```
Nice!
```diff
@@ -1,16 +1,28 @@
 import math
```
Actually, can we rename this file to `unet_kandinsky3.py`? I don't like `kandi`... much.
```diff
-        out = self.attention(out, context, context_mask, image_mask)
+        out = self.attention(out, context, context_mask)
         out = out.permute(0, 2, 1).unsqueeze(-1).reshape(out.shape[0], -1, height, width)
         x = x + out
```
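For context, a small sketch (toy shapes assumed) of what the permute/reshape in this hunk does: it folds the sequence of `height * width` tokens back into a spatial feature map:

```python
import torch

batch, height, width, channels = 1, 4, 4, 8
out = torch.randn(batch, height * width, channels)  # (B, H*W, C) attention output

# (B, H*W, C) -> (B, C, H*W) -> (B, C, H*W, 1) -> (B, C, H, W)
out = out.permute(0, 2, 1).unsqueeze(-1).reshape(out.shape[0], -1, height, width)
assert out.shape == (batch, channels, height, width)
```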
very nice clean-ups!
```python
negative_prompt=None,
prompt_embeds=None,
negative_prompt_embeds=None,
callback_on_step_end_tensor_inputs=None,
```
nice!
```python
self.maybe_free_model_hooks()

if not return_dict:
    return (image,)
```
nice!
```python
generator: Optional[Union[torch.Generator, List[torch.Generator]]] = None,
prompt_embeds: Optional[torch.FloatTensor] = None,
negative_prompt_embeds: Optional[torch.FloatTensor] = None,
attention_mask: Optional[torch.FloatTensor] = None,
```
Fixed this bug (#5963 (comment)) by adding `attention_mask` and `negative_attention_mask` arguments to `__call__`. You should pass `attention_mask` and `negative_attention_mask` along with `prompt_embeds` and `negative_prompt_embeds`; otherwise you will get an error.
```python
from diffusers import AutoPipelineForText2Image
import torch

pipe = AutoPipelineForText2Image.from_pretrained("kandinsky-community/kandinsky-3", variant="fp16", torch_dtype=torch.float16)
pipe.enable_model_cpu_offload()

prompt = "A photograph of the inside of a subway train. There are raccoons sitting on the seats. One of them is reading a newspaper. The window shows the city in the background."

prompt_embeds, negative_prompt_embeds, attention_mask, negative_attention_mask = pipe.encode_prompt(
    prompt,
    True,  # do_classifier_free_guidance
    device=torch.device("cuda"),
)

generator = torch.Generator(device="cpu").manual_seed(42)
image = pipe(
    prompt_embeds=prompt_embeds,
    negative_prompt_embeds=negative_prompt_embeds,
    attention_mask=attention_mask,
    negative_attention_mask=negative_attention_mask,
    num_inference_steps=25,
    generator=generator,
).images[0]
```

This works too:

```python
image = pipe(prompt_embeds=prompt_embeds, attention_mask=attention_mask, num_inference_steps=25, generator=generator).images[0]
```
@yiyixuxu lemme know once ready for a final review :-)
patrickvonplaten
left a comment
Great clean-up - thanks!
clean-up kandinsky 3.0
Work through the remaining TODOs from #5913:
- text-2-image
- image-2-image