Add draft for lora text encoder scale #3626
Conversation
The documentation is not available anymore as the PR was closed or merged.
src/diffusers/loaders.py
@@ -839,6 +839,9 @@ def load_lora_weights(self, pretrained_model_name_or_path_or_dict: Union[str, Di
    weight_name = kwargs.pop("weight_name", None)
    use_safetensors = kwargs.pop("use_safetensors", None)

    # set lora scale to a reasonable default
    self._scale = 1
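For context, a minimal sketch of the idea behind this default, with illustrative names (this is not the actual `LoraLoaderMixin` code): the loader stores a private fallback, while the value users actually tune is the per-call `cross_attention_kwargs={"scale": ...}`.

# Illustrative sketch only, not the actual diffusers implementation.
class LoraLoaderSketch:
    def load_lora_weights(self, pretrained_model_name_or_path_or_dict, **kwargs):
        # Private fallback; users are not expected to set this themselves.
        # It only exists so the text encoder LoRA has a scale to use when no
        # per-call value is supplied.
        self._lora_scale = 1.0
        # ... load the UNet and text encoder LoRA weights here ...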
But we'd want the users to also specify this explicitly if they want to, no?
For that, don't you think exposing an argument for it makes sense?
Yes, actually users should never set this argument themselves (which is why it's marked private with a `_`). This should just be used to make it work with LoRA.
Well, we do allow adjusting the scale of the UNet LoRA via `cross_attention_kwargs`. Refer to this doc:
https://huggingface.co/docs/diffusers/main/en/training/lora
By default, we already use the `scale` argument (with a value of 1) here:

self, attn: Attention, hidden_states, encoder_hidden_states=None, attention_mask=None, scale=1.0, temb=None
By exposing the `scale` argument to users, they get explicit control over how strongly the LoRA affects the output. Let me know if I am missing something.
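For reference, the general pattern is that `cross_attention_kwargs` is forwarded to the attention processors, where `scale` weights the LoRA branch. A simplified sketch of that pattern (not the actual `LoRAAttnProcessor` code):

import torch
import torch.nn as nn

class LoRALinearSketch(nn.Module):
    """Minimal LoRA adapter around a frozen linear layer (sketch only)."""

    def __init__(self, base: nn.Linear, rank: int = 4):
        super().__init__()
        self.base = base
        self.down = nn.Linear(base.in_features, rank, bias=False)
        self.up = nn.Linear(rank, base.out_features, bias=False)

    def forward(self, x: torch.Tensor, scale: float = 1.0) -> torch.Tensor:
        # `scale` is what `cross_attention_kwargs={"scale": ...}` ultimately
        # controls: it weights the low-rank update added to the frozen output.
        return self.base(x) + scale * self.up(self.down(x))

layer = LoRALinearSketch(nn.Linear(16, 16))
out = layer(torch.randn(2, 16), scale=0.5)  # halve the LoRA contribution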
I think we have a misunderstanding here, let's maybe take this offline in a quick call next week :-)
@sayakpaul the purpose of this PR is to create a mechanism that allows the user to change the LoRA scale of the text encoder. I've adapted the PR to make it functional and used it with your a1111 example here:

#!/usr/bin/env python3
from diffusers import StableDiffusionPipeline, KDPM2DiscreteScheduler, StableDiffusionImg2ImgPipeline, HeunDiscreteScheduler, KDPM2AncestralDiscreteScheduler, DDIMScheduler, DPMSolverMultistepScheduler
import time
import os
import torch
path = "gsdf/Counterfeit-V2.5"
pipe = StableDiffusionPipeline.from_pretrained(path, torch_dtype=torch.float16)
pipe.scheduler = DPMSolverMultistepScheduler.from_config(
    pipe.scheduler.config, use_karras_sigmas=True
)
pipe = pipe.to("cuda")
pipe.load_lora_weights(".", weight_name="light_and_shadow.safetensors")
prompt = "masterpiece, best quality, 1girl, at dusk"
negative_prompt = ("(low quality, worst quality:1.4), (bad anatomy), (inaccurate limb:1.2), "
                   "bad composition, inaccurate eyes, extra digit, fewer digits, (extra arms:1.2), large breasts")
pipe.enable_xformers_memory_efficient_attention()
images = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    width=512,
    height=768,
    num_inference_steps=15,
    num_images_per_prompt=4,
    cross_attention_kwargs={"scale": 0.5},
    generator=torch.manual_seed(0),
).images

Note how the LoRA scale is passed via `cross_attention_kwargs={"scale": 0.5}`.
But now, we are assuming users will always want to use the same scale for both the UNet and the text encoder.

P.S.: I was (from day one) clear about the scope of the PR. But from your descriptions, I was not sure how you were thinking about how users would pass the scale.
If your LoRA parameters involve the UNet as well as the Text Encoder, then passing
`cross_attention_kwargs={"scale": 0.5}` will apply the `scale` value to both the UNet
and the Text Encoder.
Documentation.
Nice!
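For illustration, a toy sketch of how a single per-call `scale` could reach both components as the documentation above describes; the class and method names here are stand-ins, not the real pipeline API:

from dataclasses import dataclass, field

@dataclass
class PipelineSketch:
    """Toy stand-in that only records which component saw which scale."""
    _lora_scale: float = 1.0
    calls: list = field(default_factory=list)

    def encode_prompt(self, prompt):
        # Text encoder side: the patched projections read the scale stored on
        # the pipeline, so it is refreshed before the prompt is encoded.
        self.calls.append(("text_encoder", self._lora_scale))
        return prompt

    def denoise(self, embeds, cross_attention_kwargs):
        # UNet side: the kwargs reach the attention processors directly.
        self.calls.append(("unet", cross_attention_kwargs["scale"]))

    def __call__(self, prompt, cross_attention_kwargs=None):
        kwargs = cross_attention_kwargs or {}
        self._lora_scale = kwargs.get("scale", 1.0)
        self.denoise(self.encode_prompt(prompt), {"scale": self._lora_scale})
        return self.calls

print(PipelineSketch()("a prompt", cross_attention_kwargs={"scale": 0.5}))
# -> [('text_encoder', 0.5), ('unet', 0.5)]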
@@ -870,6 +870,9 @@ def main(args):
    temp_pipeline = DiffusionPipeline.from_pretrained(
        args.pretrained_model_name_or_path, text_encoder=text_encoder
    )
    # Setting the `_lora_scale` explicitly because we are not using
    # `load_lora_weights()`.
    temp_pipeline._lora_scale = 1.0
Otherwise, a forward pass with the text encoder won't be possible.
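To make the failure mode concrete, here is a tiny self-contained sketch (class and attribute names are stand-ins for the real pipeline):

import torch
import torch.nn as nn

class TinyPipeline:
    """Stand-in for the real pipeline; only illustrates the attribute lookup."""

    def __init__(self, text_encoder: nn.Module):
        self.text_encoder = text_encoder

pipe = TinyPipeline(nn.Linear(4, 4))
base_forward = pipe.text_encoder.forward
lora = nn.Linear(4, 4, bias=False)

def patched_forward(x):
    # Mirrors the monkey-patched text encoder: the LoRA branch is weighted by
    # a scale read off the pipeline, so the attribute must exist beforehand.
    return base_forward(x) + pipe._lora_scale * lora(x)

pipe.text_encoder.forward = patched_forward

pipe._lora_scale = 1.0  # omitting this line would raise AttributeError below
print(pipe.text_encoder(torch.randn(2, 4)).shape)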
…ace/diffusers into lora_text_encoder_scale
@sayakpaul, I've now added the LoRALoader to all models where IMO it makes sense. For some pipelines like pix2pix_zero it didn't make much sense, so I just added an import so that the `# Copied from` mechanism still works. IMO it's more important that code stays in sync than to have 100% clean code for this edge case. Will merge now so that we have results on the slow tests tomorrow.
* Add draft for lora text encoder scale
* Improve naming
* fix: training dreambooth lora script.
* Apply suggestions from code review
* Update examples/dreambooth/train_dreambooth_lora.py
* Apply suggestions from code review
* Apply suggestions from code review
* add lora mixin when fit
* add lora mixin when fit
* add lora mixin when fit
* fix more
* fix more

---------

Co-authored-by: Sayak Paul <[email protected]>
Draft PR to show how we could correctly deal with the LoRA scale for the text encoder.