Skip PEFT LoRA Scaling if the scale is 1.0 #7576
Conversation
Thanks for your PR. Could you quantify the time difference? Cc: @BenjaminBossan here.
Thanks for investigating this issue. Indeed, scaling is unnecessary if the scale is 1 -- in fact, we already have a check in PEFT that skips the scaling in that case. The issue seems to be the loop over the modules itself. My suggestion for this PR, however, is to move the skipping logic inside of scale_lora_layers and unscale_lora_layers. Of course, this adds another function call to the stack, but that should be very negligible overall.
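For concreteness, a minimal sketch of where such an early return could live; the helper names follow the ones discussed in this PR, but the PEFT import and the scale_layer/unscale_layer calls are assumptions about the surrounding code, not the merged diff:

```python
# Illustrative sketch only, not the merged implementation: an early return inside
# the scaling helpers so every call site skips the module loop when scaling is a no-op.
from peft.tuners.tuners_utils import BaseTunerLayer  # assumed PEFT tuner-layer base class


def scale_lora_layers(model, weight: float):
    if weight == 1.0:
        return  # scaling by 1.0 is a no-op, so skip the module loop entirely
    for module in model.modules():
        if isinstance(module, BaseTunerLayer):
            module.scale_layer(weight)


def unscale_lora_layers(model, weight: float):
    if weight == 1.0:
        return  # nothing was scaled, so there is nothing to undo
    for module in model.modules():
        if isinstance(module, BaseTunerLayer):
            module.unscale_layer(weight)
```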
If I read the graph correctly, the difference is from 4.6 sec to 2.3 sec.
Thanks @BenjaminBossan for your comments. @stevenjlm I will let you address the comments and we can take it from there.
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
@BenjaminBossan @sayakpaul thanks for the feedback and guidance! I will move the logic inside scale_lora_layers and unscale_lora_layers.
@BenjaminBossan @sayakpaul Moved the logic inside scale_lora_layers and unscale_lora_layers.
BenjaminBossan left a comment
LGTM, thanks for digging in and working on this performance improvement.
Could you please fix the code quality issues and submit again?
Diff excerpt under review:

```
new_module = torch.nn.Linear(module.in_features, module.out_features, bias=module.bias is not None).to(
    module.weight.device
)
new_module = torch.nn.Linear(
```
Why this change?
It's the formatter; when I ran `make style`, it made this change.
Would you prefer that I undo it?
sayakpaul left a comment
Thanks very much. Just one comment. But I think we’re good to go.
@stevenjlm could you push an empty commit on your end? I think the failing test is unrelated.
Pushed an empty commit; hopefully the workflows pass. @sayakpaul
I'm looking into this failing test to see if there's anything I can do to fix it.
@sayakpaul I see that @yiyixuxu commented out the failing test in #7620, so the checks should pass now that I've rebased.
Thanks! Please tag me once the CI run is complete.
@sayakpaul CI run passed!
Thanks for this cool contribution!
* Skip scaling if scale is identity
* move check for weight one to scale and unscale lora
* fix code style/quality
* Empty-Commit

Co-authored-by: Steven Munn <[email protected]>
Co-authored-by: Sayak Paul <[email protected]>
Co-authored-by: Steven Munn <[email protected]>
What does this PR do?
This PR skips scaling the LoRA modules during the UNet forward step if the LoRA scale is 1.0 (and thus the scaling would have no effect downstream). In profiling tests, I have found that for SDXL loaded with LoRAs, a substantial amount of inference time is spent looping through modules in the scale_lora_layers and unscale_lora_layers methods. If the LoRA scale is 1.0, this loop has no effect and we might as well skip it. There are additional details at the bottom of this description in the "Performance Details" section.
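To illustrate the call-site pattern being optimized, here is a simplified sketch of how the helpers wrap a LoRA-scaled forward pass; the run_unet wrapper and its arguments are hypothetical, and only scale_lora_layers/unscale_lora_layers come from the code this PR touches:

```python
# Simplified, hypothetical call-site sketch; only scale_lora_layers/unscale_lora_layers
# correspond to the helpers discussed in this PR, the wrapper itself is for illustration.
from diffusers.utils import scale_lora_layers, unscale_lora_layers


def run_unet(unet, sample, timestep, encoder_hidden_states, lora_scale=1.0):
    # With this change, the helpers return early when lora_scale == 1.0, so the
    # per-module loop that previously dominated the profile is skipped.
    scale_lora_layers(unet, lora_scale)
    noise_pred = unet(sample, timestep, encoder_hidden_states).sample
    unscale_lora_layers(unet, lora_scale)
    return noise_pred
```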
Before submitting
(It's a small enough change that I'm not sure it warrants doc or test updates, but I'll be happy to add them if requested.)
Who can review?
@sayakpaul @yiyixuxu @DN6
Performance Details
Below are the results from using cProfile, and at the bottom is a minimal code snippet I used for these profiles.
Profiler before code change:
(profiler screenshot not captured here)
After code change:
(profiler screenshot not captured here)
And output looks similar.
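The minimal profiling snippet referenced above is not preserved in this capture; a sketch of the kind of setup described (the model id is the public SDXL base checkpoint, the LoRA path is a placeholder, and prompts/steps may differ from the original script) might look like this:

```python
# Sketch of the profiling setup described above; the LoRA checkpoint path is a
# placeholder and the exact arguments may differ from the original script.
import cProfile
import pstats

import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")
pipe.load_lora_weights("path/to/lora.safetensors")  # placeholder LoRA weights

with cProfile.Profile() as profiler:
    pipe(
        "a photo of an astronaut riding a horse",
        num_inference_steps=30,
        cross_attention_kwargs={"scale": 1.0},  # a scale of 1.0 makes the scaling a no-op
    )

pstats.Stats(profiler).sort_stats("cumulative").print_stats(30)
```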