[LoRA] Enabling limited LoRA support for text encoder #2882


Closed. sayakpaul wants to merge 8 commits from the feat/lora-text-enc branch.

Conversation

@sayakpaul (Member) commented Mar 29, 2023

Potentially closes #2719.

The community has shown that using LoRA for fine-tuning both the UNet and the text encoder while performing DreamBooth-like training has been quite effective.

Diffusers supports LoRA for the UNet but not for the text encoder (see #2719 for details).

This PR introduces limited LoRA support for the text encoder using monkey patching. Here's the overall API design:

from diffusers.loaders import TextEncoderLoRAMixin
from transformers import CLIPTextModel

def get_text_encoder():
    return CLIPTextModel.from_pretrained(
        "runwayml/stable-diffusion-v1-5", subfolder="text_encoder"
    )

text_encoder = get_text_encoder()

### Initialization

# Registers the `text_encoder` as a class member, amongst other things.
text_encoder_lora_wrapper = TextEncoderLoRAMixin(text_encoder) 
text_encoder_lora_layers = text_encoder_lora_wrapper.text_encoder_lora_layers

### Perform training of `text_encoder_lora_layers`.

### Save `text_encoder_lora_layers`.
text_encoder_lora_wrapper.save_attn_procs(".", text_encoder_lora_layers)

### Load.
text_encoder = text_encoder_lora_wrapper.load_attn_procs(".")

Gotchas to be aware of:

  • We should probably not apply LoRA to the out projection and the key projection layers of the text encoder, following the original LoRA work. But I'm not sure what the community prefers; I guess we can revisit this if it becomes problematic.
  • The monkey patching doesn't expose the scale used for merging the LoRA parameters with the corresponding text encoder parameters, because cross_attention_kwargs doesn't apply to the text encoder (a rough sketch of the monkey patching follows below). If you have any ideas to tackle this issue, let me know.
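For context, here is a minimal sketch (not the PR's actual implementation) of what monkey patching LoRA onto the text encoder's attention projections could look like. The LoRALinear wrapper, the default rank of 4, the fixed scale, and the class-name check for CLIPAttention are illustrative assumptions.

import torch.nn as nn

class LoRALinear(nn.Module):
    """Wraps a frozen nn.Linear and adds a trainable low-rank update."""

    def __init__(self, base: nn.Linear, rank: int = 4, scale: float = 1.0):
        super().__init__()
        self.base = base    # original (frozen) projection
        self.scale = scale  # fixed here, since cross_attention_kwargs doesn't reach the text encoder
        self.down = nn.Linear(base.in_features, rank, bias=False)
        self.up = nn.Linear(rank, base.out_features, bias=False)
        nn.init.normal_(self.down.weight, std=1.0 / rank)
        nn.init.zeros_(self.up.weight)  # LoRA starts as a no-op

    def forward(self, hidden_states):
        return self.base(hidden_states) + self.scale * self.up(self.down(hidden_states))

def monkey_patch_text_encoder(text_encoder: nn.Module, rank: int = 4) -> nn.ModuleList:
    """Replaces the q/v projections of every attention block with LoRA-wrapped ones."""
    lora_layers = nn.ModuleList()
    for module in text_encoder.modules():
        if module.__class__.__name__ == "CLIPAttention":  # assumption: match attention blocks by class name
            module.q_proj = LoRALinear(module.q_proj, rank)
            module.v_proj = LoRALinear(module.v_proj, rank)
            lora_layers.extend([module.q_proj, module.v_proj])
    return lora_layers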

Note that this PR does not modify the train_dreambooth_lora.py script yet. I want to do that in a separate PR once this one is merged (after the necessary modifications and discussions).

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

Review thread on the `TextEncoderLoRAMixin.__init__` definition:

    The text encoder module underlying a [`~DiffusionPipeline`].
    """

    def __init__(self, text_encoder: nn.Module):
Contributor:
Can we remove the __init__ from the Mixin? I think we could call _initialize_lora_layers() in load_attn_procs no?

I'm not a fan of Mixins having inits because this means they can't be "plugged" into the StableDiffusionPipeline class.

Contributor:

+1, mixins should ideally not have __init__

@sayakpaul (Member, Author):

I don't envision this Mixin to be used with a DiffusionPipeline class.

_initialize_lora_layers() initializes the LoRA parameters, and calling it inside load_attn_procs() is not a good choice IMO, since the two methods do semantically different things.

Contributor:

Disclaimer: I'm thinking here more about inference of text encoder LoRAs than training.

I think we should have a function called:

load_lora()

or:

load_lora_weights(...)

That can be called from StableDiffusionPipeline(...)

I don't think we should wrap the text encoder:

text_encoder_lora_wrapper = TextEncoderLoRAMixin(text_encoder)

=> This breaks things for inference: text_encoder_lora_wrapper cannot be passed to the StableDiffusionPipeline because it doesn't have a forward method, it cannot be saved, etc.

@patrickvonplaten (Contributor) left a review:

I like the design a lot - it's super cool that we can re-use the AttnProcsLayers class here. I think it would however be nicer if we don't have to save a new weight file for the TextEncoder.

I'd maybe suggest to:

  • Call TextEncoderLoRAMixin just LoraLoaderMixin and inside the LoraLoaderMixin assume that the unet has a UNet2DConditionLoadersMixin
  • Remove the init - I don't think MixinLoaders should have an init(...)
  • Make sure only one file is saved per LoRA. If someone trains both the text encoder and the UNet LoRA, we only want one file to be saved IMO.

So IMO we should aim for the following API:

pipe = StableDiffusionPipeline.from_pretrained("...")
pipe.load_lora_weights("path-to-lora")

Now the load_lora_weights is part of the LoraLoaderMixin and it loads the state_dict from "path-to-lora" (either local or Hub or PyTorch state dict). Then passes the unet part of the loaded state dict to UNet2DConditionLoadersMixin.load_attn_procs (Note that this input accepts not just filenames but also PyTorch state_dicts). Then it calls _initialize_lora_layers for the text encoder and finally it loads the text part of the state_dict into the text encoder.

This way we can just plug this Mixin into every pipeline class and don't have to worry about any super().__init__() problems. The only problem here is that the LoraLoaderMixin has to know the names of the text encoder and the unet attributes. However, we could easily solve this with class attributes, e.g. we just give LoraLoaderMixin two class attributes:

class LoraLoaderMixin:
    text_encoder_name = None
    unet_name = None

And those are then overwritten in the StableDiffusionPipeline class (this is a common pattern that we already use, e.g.):

    config_name = CONFIG_NAME

I think we can just stick to the weight name "pytorch_lora_weights.bin"
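Here is a hedged sketch of that flow. The "unet." / "text_encoder." key prefixes, the local-only file resolution, and the assumption that _initialize_lora_layers returns the attached LoRA modules are illustrative, not the final diffusers API.

import os

import torch

class LoraLoaderMixin:
    text_encoder_name = None  # overridden by the pipeline class, e.g. "text_encoder"
    unet_name = None          # overridden by the pipeline class, e.g. "unet"

    def load_lora_weights(self, pretrained_path_or_dict, weight_name="pytorch_lora_weights.bin"):
        # 1. Resolve a state dict: an in-memory dict or a local folder containing the weight file
        #    (Hub download omitted for brevity).
        if isinstance(pretrained_path_or_dict, dict):
            state_dict = pretrained_path_or_dict
        else:
            state_dict = torch.load(os.path.join(pretrained_path_or_dict, weight_name), map_location="cpu")

        # 2. Split the single file into the UNet and text encoder parts by key prefix.
        unet_sd = {k[len("unet."):]: v for k, v in state_dict.items() if k.startswith("unet.")}
        text_sd = {k[len("text_encoder."):]: v for k, v in state_dict.items() if k.startswith("text_encoder.")}

        # 3. The UNet part goes through the existing loader, which also accepts state dicts.
        unet = getattr(self, self.unet_name)
        unet.load_attn_procs(unet_sd)

        # 4. Initialize the LoRA layers on the text encoder, then load the text part into them.
        text_encoder = getattr(self, self.text_encoder_name)
        text_encoder_lora_layers = self._initialize_lora_layers(text_encoder)
        text_encoder_lora_layers.load_state_dict(text_sd)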

Wdyt @sayakpaul ?

@patil-suraj (Contributor) left a review:

Very cool and great that we can leverage existing API!

+1 for Patrick's suggestion. Agree that having one loader class would be better; this way, users won't have to worry about using the mixin.


@sayakpaul (Member, Author):

> Then it calls _initialize_lora_layers for the text encoder and finally it loads the text part of the state_dict into the text encoder.

How are the LoRA layers for the text encoder initialized during training, then? We need to ensure that users follow the exact approach taken in _initialize_lora_layers() to initialize the LoRA parameters.

Review thread on the following hunk:

        vocab_size=1000,
    )
    text_encoder = CLIPTextModel(text_encoder_config).to(torch_device)
    text_encoder_lora_wrapper = TextEncoderLoRAMixin(copy.deepcopy(text_encoder))
Member:

I agree with @patrickvonplaten and @patil-suraj: if this is the way to use the new class, then it shouldn't be a mixin.

@pcuenca (Member) commented Mar 30, 2023

Following up on the discussion, I think we may need to have a TextEncoderLoRAMixin (or helper class, if we can't use a mixin) in addition to a pipeline-level mixin that calls both the text encoder and the UNet loaders. Would that work for training @sayakpaul?

@patrickvonplaten (Contributor):

I've mostly thought about inference here; things are indeed a bit trickier for training. I think the important part is the inference part though.

How about the following:

  1. We use the design from my review above (#2882 (review)) for inference. IMO that's the only design that works nicely.
  2. For training, we slightly adapt _initialize_lora_layers to accept a text encoder and return the text encoder LoRA layers, and we make it a class method.
    Then all we have to do for training is:
from diffusers.loaders import LoRALoaderMixin

...
text_encoder_lora_layers = LoRALoaderMixin.initialize_lora_layers(text_encoder)

=> This would also be a pretty nice API, since one only needs to train the LoRA layers and not the whole encoder; we've directly separated trainable weights from non-trainable weights.
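A hedged sketch of what suggestion 2 could look like: initialize_lora_layers as a class method that takes a text encoder, patches LoRA onto its q/v projections (along the lines of the monkey-patching sketch earlier in the thread), and returns only the trainable layers. The rank, the q/v-only targeting, and the class-name check are illustrative assumptions.

import torch.nn as nn

class LoRALoaderMixin:
    @classmethod
    def initialize_lora_layers(cls, text_encoder: nn.Module, rank: int = 4) -> nn.ModuleList:
        """Attaches LoRA to the q/v projections and returns only the trainable layers."""

        class _LoRALinear(nn.Module):
            def __init__(self, base: nn.Linear):
                super().__init__()
                self.base = base  # frozen original projection
                self.down = nn.Linear(base.in_features, rank, bias=False)
                self.up = nn.Linear(rank, base.out_features, bias=False)
                nn.init.zeros_(self.up.weight)  # start as a no-op

            def forward(self, hidden_states):
                return self.base(hidden_states) + self.up(self.down(hidden_states))

        lora_layers = nn.ModuleList()
        for module in text_encoder.modules():
            if module.__class__.__name__ == "CLIPAttention":  # assumption: match attention blocks by class name
                module.q_proj = _LoRALinear(module.q_proj)
                module.v_proj = _LoRALinear(module.v_proj)
                lora_layers.extend([module.q_proj, module.v_proj])
        return lora_layers

A training script would then pass only lora_layers.parameters() to the optimizer, keeping the frozen encoder weights untouched.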

@sayakpaul (Member, Author):

> I think the important part is the inference part though.

There's no inference if there's no training. So, I respectfully disagree.

But that said, I really like what you're proposing overall here.

But this introduces a discrepancy between how we initialize the LoRA layers for the UNet and the text encoder:

lora_attn_procs = {}
for name in unet.attn_processors.keys():
    cross_attention_dim = None if name.endswith("attn1.processor") else unet.config.cross_attention_dim
    if name.startswith("mid_block"):
        hidden_size = unet.config.block_out_channels[-1]
    elif name.startswith("up_blocks"):
        block_id = int(name[len("up_blocks.")])
        hidden_size = list(reversed(unet.config.block_out_channels))[block_id]
    elif name.startswith("down_blocks"):
        block_id = int(name[len("down_blocks.")])
        hidden_size = unet.config.block_out_channels[block_id]

    lora_attn_procs[name] = LoRAAttnProcessor(hidden_size=hidden_size, cross_attention_dim=cross_attention_dim)

unet.set_attn_processor(lora_attn_procs)
lora_layers = AttnProcsLayers(unet.attn_processors)

For the UNet, the initialization part is handled explicitly, whereas for the text encoder, we're thinking of having a class method. I don't mind having an explicit initialization for the text encoder LoRA layers as well to keep the flow consistent and simple (over easy).

@sayakpaul (Member, Author):

Closing this because the conflicts are brutal.

Opened #2918.

@sayakpaul sayakpaul closed this Mar 31, 2023
@sayakpaul sayakpaul deleted the feat/lora-text-enc branch April 5, 2023 05:50
Linked issue: [LoRA] allow fine-tuning of the text encoder with LoRA (using peft)