Fix to generate one LoRAAttnProcessor for each CLIPAttention in TextEncoder LoRA #3505


Closed
wants to merge 12 commits

Conversation

@takuma104 (Contributor) commented May 22, 2023

What's this?

Discussed in PR #3437, #3437 (comment)

This PR fixes how the LoRAAttnProcessors for the text_encoder are created, so that one LoRAAttnProcessor is created per CLIPAttention. In the code below, the commented-out line is the current Diffusers behavior and the line beneath it is this PR's behavior. The current Diffusers code generates four LoRAAttnProcessors per CLIPAttention, because TEXT_ENCODER_TARGET_MODULES has four keys.

for name, module in text_encoder.named_modules():
    # if any(x in name for x in TEXT_ENCODER_TARGET_MODULES): # current Diffusers
    if name.endswith('self_attn'): # this PR
        print(name)
        text_lora_attn_procs[name] = LoRAAttnProcessor(
            hidden_size=module.out_proj.out_features, cross_attention_dim=None
        )

self_attn is an instance of the CLIPAttention class, which is the counterpart of the Attention class in Diffusers, so it seemed appropriate to generate LoRAAttnProcessors in a 1:1 relationship with it. See also: #3437 (comment)
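
For illustration (not part of the PR's changes), a minimal sketch of that 1:1 relationship; the module and attribute names are those of transformers' CLIPAttention and Diffusers' LoRAAttnProcessor, and the checkpoint is only an example:

from transformers import CLIPTextModel
from diffusers.models.attention_processor import LoRAAttnProcessor

text_encoder = CLIPTextModel.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="text_encoder"
)

# Each CLIPAttention block owns one linear layer per projection.
attn = text_encoder.text_model.encoder.layers[0].self_attn
print([n for n, _ in attn.named_children()])   # ['k_proj', 'v_proj', 'q_proj', 'out_proj']

# A single LoRAAttnProcessor already carries one LoRA layer per projection,
# so one processor per CLIPAttention covers all four projections.
proc = LoRAAttnProcessor(hidden_size=attn.out_proj.out_features, cross_attention_dim=None)
print([n for n, _ in proc.named_children()])   # ['to_q_lora', 'to_k_lora', 'to_v_lora', 'to_out_lora']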

Todo:

- [ ] This PR changes the weight keys, which breaks compatibility with checkpoints already trained with the --train_text_encoder option. To accommodate them, handling in the loader will be necessary.

Note:

This PR currently branches off the working branch of #3490, so the Files changed tab also includes the changes from #3490. Please refer to this diff for the commits unique to this PR.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

@takuma104 (Contributor, Author) commented May 22, 2023

I conducted qualitative tests using the dreambooth/lora script. The command I used is as follows.

export MODEL_NAME="runwayml/stable-diffusion-v1-5"
export INSTANCE_DIR="dog"
export OUTPUT_DIR="lora-trained"

accelerate launch ../examples/dreambooth/train_dreambooth_lora.py \
  --pretrained_model_name_or_path=$MODEL_NAME  \
  --instance_data_dir=$INSTANCE_DIR \
  --output_dir=$OUTPUT_DIR \
  --instance_prompt="a photo of sks dog" \
  --resolution=512 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=1 \
  --checkpointing_steps=100 \
  --learning_rate=1e-4 \
  --report_to="wandb" \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --max_train_steps=500 \
  --validation_prompt="A photo of sks dog in a bucket" \
  --validation_epochs=50 \
  --train_text_encoder \
  --seed="0" \
  --push_to_hub

The result checkpoint:
https://huggingface.co/takuma104/lora-test-text-encoder-lora-target

The following was dumped with https://gist.github.com/takuma104/dd7855909af17f3792b3704578c63a26. Since the to_*_lora.up.weight tensors are non-zero, it appears that some learning has taken place.

text_encoder.text_model.encoder.layers.0.self_attn.to_k_lora.down.weight [4, 768] mean=0.000592 std=0.248
text_encoder.text_model.encoder.layers.0.self_attn.to_k_lora.up.weight [768, 4] mean=-0.000118 std=0.00203
text_encoder.text_model.encoder.layers.0.self_attn.to_out_lora.down.weight [4, 768] mean=0.00265 std=0.248
text_encoder.text_model.encoder.layers.0.self_attn.to_out_lora.up.weight [768, 4] mean=7.15e-07 std=0.00217
text_encoder.text_model.encoder.layers.0.self_attn.to_q_lora.down.weight [4, 768] mean=0.00118 std=0.252
text_encoder.text_model.encoder.layers.0.self_attn.to_q_lora.up.weight [768, 4] mean=1.87e-05 std=0.00192
text_encoder.text_model.encoder.layers.0.self_attn.to_v_lora.down.weight [4, 768] mean=0.00242 std=0.251
text_encoder.text_model.encoder.layers.0.self_attn.to_v_lora.up.weight [768, 4] mean=-7.65e-06 std=0.00224
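
For reference, a minimal sketch of how this kind of dump can be produced (not the gist itself; it assumes the LoRA weights were saved to OUTPUT_DIR as pytorch_lora_weights.bin):

import torch

# Assumed output path of the training run above.
state_dict = torch.load("lora-trained/pytorch_lora_weights.bin", map_location="cpu")

# Print shape / mean / std for the text-encoder LoRA tensors.
for key in sorted(k for k in state_dict if k.startswith("text_encoder.")):
    t = state_dict[key].float()
    print(key, list(t.shape), f"mean={t.mean().item():.3g}", f"std={t.std().item():.3g}")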

@patrickvonplaten (Contributor)
Nice, I think this is the missing piece here - great find @takuma104! @sayakpaul can you have a look here?

@@ -943,14 +943,19 @@ def _modify_text_encoder(self, attn_processors: Dict[str, LoRAAttnProcessor]):
     module = self.text_encoder.get_submodule(name)
     # Construct a new function that performs the LoRA merging. We will monkey patch
     # this forward pass.
-    lora_layer = getattr(attn_processors[name], self._get_lora_layer_attribute(name))
+    attn_processor_name = ".".join(name.split(".")[:-1])
Member:
To get the correct mapping in the names as discovered in #3437 (comment)

     text_lora_attn_procs[name] = LoRAAttnProcessor(
-        hidden_size=module.out_features, cross_attention_dim=None
+        hidden_size=module.out_proj.out_features, cross_attention_dim=None
Member:
Looks good to me!

However, does the following need to be changed, since the LoRA layer mapping is now different?

self.split_keys = [".processor", ".k_proj", ".q_proj", ".v_proj", ".out_proj"]

Contributor Author:

Nice catch! The current code would be a problem when the remap_key() function is called, so I fixed it in 160a4d3. However, neither the current test code nor the dreambooth script seems to hit the condition where remap_key() is called, so I haven't been able to test it.

As far as I understand, the intention is that this code path is used when AttnProcsLayers is loaded directly with load_state_dict(), but are there any use cases that actually call it?
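
For illustration (not the exact library code), a minimal sketch of the remapping idea: remap_key() truncates a flat parameter key at one of the split_keys entries to recover its processor key, which is why the old per-projection entries no longer match the new per-self_attn keys:

# Illustrative only; the real logic lives in AttnProcsLayers' state-dict hooks.
split_keys = [".processor", ".self_attn"]  # assumed new list; the old one split on .k_proj/.q_proj/.v_proj/.out_proj

def remap_key(key: str) -> str:
    # Truncate a parameter key down to the processor key it belongs to.
    for split_key in split_keys:
        if split_key in key:
            return key.split(split_key)[0] + split_key
    raise ValueError(f"{key} could not be mapped to an attention processor key.")

print(remap_key("text_model.encoder.layers.0.self_attn.to_k_lora.down.weight"))
# -> text_model.encoder.layers.0.self_attn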

Member:

Thanks! Yeah I think so too. @patrickvonplaten could you confirm once?

return text_lora_attn_procs


def create_text_encoder_lora_layers(text_encoder: nn.Module):
Member:

It seems we're not using this method. I'm okay with discarding it.

Contributor Author:

I've left this as it still seems to be used inside LoraLoaderMixinTests.get_dummy_components().

@sayakpaul (Member)

I think we can merge this once #3505 (comment) is addressed. Then I think we can also merge #3490 as is, no?

@sayakpaul (Member)

An inspiring update: #3437 (comment).

@takuma104 I think we can wrap up the utilities (this PR, #3437, #3490) this week.

Let me know if anything needs attention, testing, etc.

@takuma104 (Contributor, Author)

@sayakpaul I have made the fixes regarding #3505 (comment). (As I commented, I haven't been able to test them.)

@takuma104 takuma104 marked this pull request as ready for review May 24, 2023 15:31
@rvorias commented May 24, 2023

Ran some finetunes with these PR changes. Qualitatively, results are looking promising. Thanks for the efforts!

@sayakpaul (Member)

Closing in favor of #3437.

@takuma104 takuma104 closed this May 29, 2023