
Conversation

@stevenjlm
Contributor

@stevenjlm stevenjlm commented Apr 4, 2024

What does this PR do?

This PR skips scaling LoRA modules during the UNet forward pass when the LoRA scale is 1.0 (and thus scaling would have no effect downstream). In profiling tests, I have found that for SDXL loaded with LoRAs, a substantial amount of inference time is spent looping through modules in the scale_lora_layers and unscale_lora_layers methods. If the LoRA scale is 1.0, this loop has no effect, so we might as well skip it.

There are additional details on this at the bottom of this description, in the "Performance Details" section.
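For concreteness, here is a minimal sketch of the kind of call-site guard this describes, assuming diffusers' scale_lora_layers / unscale_lora_layers helpers and a simplified wrapper function (run_unet_with_lora_scale is a made-up name for illustration, not the actual diff):

from diffusers.utils import scale_lora_layers, unscale_lora_layers


def run_unet_with_lora_scale(unet, sample, timestep, encoder_hidden_states, lora_scale=1.0):
    # Only loop over the LoRA modules when scaling actually changes something;
    # scaling by 1.0 is the identity, so the loop can be skipped entirely.
    if lora_scale != 1.0:
        scale_lora_layers(unet, lora_scale)

    out = unet(sample, timestep, encoder_hidden_states=encoder_hidden_states).sample

    if lora_scale != 1.0:
        unscale_lora_layers(unet, lora_scale)
    return out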

Before submitting

(It's a small enough change that I'm not sure it warrants doc or test updates, but I'll be happy to add them if requested.)

Who can review?

@sayakpaul @yiyixuxu @DN6

Performance Details

Below are the results from using cProfile, and at the bottom is a minimal code snippet I used for these profiles.

Profiler before code change: [profiler screenshot]
After code change: [profiler screenshot]
The generated output looks similar with and without the change.

from cProfile import Profile
from datetime import datetime
from io import BytesIO

import requests
import torch
from diffusers import DPMSolverMultistepScheduler, StableDiffusionXLImg2ImgPipeline
from PIL import Image
from progressbar import progressbar

profiler = Profile()

MODEL_CACHE = "diffusers-cache"
FUSE = True
lora_ids = {
    "ikea": "ostris/ikea-instructions-lora-sdxl",
}

lora_keywords = {
    "ikea": "ikea",
}

# ------------------------------------- Load base model
pipe = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    # "stablediffusionapi/juggernaut-xl-v8",
    cache_dir=MODEL_CACHE,
    torch_dtype=torch.float16,
).to("cuda")

# ------------------------------------- Load Loras
for lora_name, lora_id in progressbar(lora_ids.items()):
    state_dict, _ = pipe.lora_state_dict(
        lora_id,
        unet_config=pipe.unet.config,
    )
    pipe.load_lora_weights(
        state_dict,
        unet=pipe.unet,
        adapter_name=lora_name,
    )

# ------------------------------------- Run Inference
profiler.enable()
in_url = "https://media.cnn.com/api/v1/images/stellar/prod/200605082916-01-real-tiger-king-siberian.jpg"
prompt = "tiger"
lora_name = "ikea"
num_samples = 1

pipe.scheduler = DPMSolverMultistepScheduler(use_karras_sigmas=True, algorithm_type="dpmsolver++")
pipe.set_adapters(lora_name)

if FUSE:
    # Fuse the selected LoRA into the base weights once, after choosing the adapter.
    pipe.fuse_lora()
response = requests.get(in_url)
og_image = Image.open(BytesIO(response.content))

output = pipe(
    image=og_image.convert("RGB"),
    prompt=[lora_keywords[lora_name]] * num_samples,
    num_inference_steps=20,
    generator=torch.Generator("cuda").manual_seed(42),
    strength=0.5,
)

if FUSE:
    pipe.unfuse_lora()

all_images = output.images
output_paths = []

profiler.disable()
profiler.dump_stats(
    f"profile-{datetime.now().strftime('%Y%m%d-%H%M%S')}.prof"
)

for i, sample in enumerate(all_images):
    output_path = f"out-{i}.png"
    sample.save(output_path)
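To inspect the dumped .prof files, a short pstats snippet like this works (the file name below is a placeholder for whatever timestamped file the script wrote):

import pstats

# Print the 20 entries with the highest cumulative time from the dumped profile.
stats = pstats.Stats("profile-20240404-120000.prof")  # placeholder file name
stats.strip_dirs().sort_stats("cumulative").print_stats(20)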

@sayakpaul
Member

Thanks for your PR. Could you quantify the time difference?

Cc: @BenjaminBossan here.

@BenjaminBossan
Member

Thanks for investigating this issue. Indeed, scaling is unnecessary if the scale is 1 -- in fact, we already have a check in PEFT that skips the scaling in that case. The issue seems to be the looping over the modules and the isinstance check for each module, which, as correctly stated, can be skipped.

My suggestion for this PR, however, is to move the skipping logic inside of scale_lora_layers and unscale_lora_layers. The reason is that if those functions are called elsewhere, all these callers benefit from the optimization. Otherwise, they would each need to perform the same check.

Of course, this adds another function call to the stack, but that should be very negligible overall.
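Put differently, a minimal sketch of the guard living inside the helper (the module iteration and BaseTunerLayer check mirror diffusers' PEFT utilities, but this is illustrative, not the merged diff):

from peft.tuners.tuners_utils import BaseTunerLayer


def scale_lora_layers(model, weight):
    # Early exit: scaling by 1.0 is the identity, so skip the module loop
    # (and its per-module isinstance checks) entirely.
    if weight == 1.0:
        return

    for module in model.modules():
        if isinstance(module, BaseTunerLayer):
            module.scale_layer(weight)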

Thanks for your PR. Could you quantify the time difference?

If I read the graph correctly, the difference is from 4.6 sec to 2.3 sec.

@sayakpaul
Member

Thanks @BenjaminBossan for your comments.

@stevenjlm I will let you address the comments and we can take it from there.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@stevenjlm
Contributor Author

@BenjaminBossan @sayakpaul thanks for the feedback and guidance! I will move the logic inside scale_lora_layers and unscale_lora_layers.

@stevenjlm
Contributor Author

stevenjlm commented Apr 5, 2024

@BenjaminBossan @sayakpaul Moved the logic inside scale_lora_layers and unscale_lora_layers. I also redid the performance check. As expected, the improvement is very similar: It goes from ~4.6 seconds without the changes down to ~2.3 seconds with the changes.

Member

@BenjaminBossan BenjaminBossan left a comment


LGTM, thanks for digging in and working on this performance improvement.

Could you please fix the code quality issues and submit again?

new_module = torch.nn.Linear(module.in_features, module.out_features, bias=module.bias is not None).to(
    module.weight.device
)
new_module = torch.nn.Linear(
Member


Why this change?

Contributor Author


It's the formatter; when I ran make style, it made this change.

Contributor Author


Would you prefer that I undo it?

Member

@sayakpaul sayakpaul left a comment


Thanks very much. Just one comment, but I think we're good to go.

@sayakpaul
Member

@stevenjlm could you push an empty commit on your end? I think the failing test is unrelated.

@stevenjlm
Contributor Author

Pushed an empty commit; hopefully the workflows pass. @sayakpaul

@stevenjlm
Contributor Author

I'm looking into this failing test to see if there's anything I can do to fix it.

@stevenjlm
Contributor Author

@sayakpaul I see that @yiyixuxu commented out the test that was failing in #7620, so checks should pass now that I've rebased.

@sayakpaul
Member

Thanks! Please tag me once the CI run is complete.

@stevenjlm
Contributor Author

@sayakpaul CI run passed!

@sayakpaul sayakpaul merged commit 42f25d6 into huggingface:main Apr 11, 2024
@sayakpaul
Member

Thanks for this cool contribution!

sayakpaul added a commit that referenced this pull request Dec 23, 2024
* Skip scaling if scale is identity

* move check for weight one to scale and unscale lora

* fix code style/quality

* Empty-Commit

---------

Co-authored-by: Steven Munn <[email protected]>
Co-authored-by: Sayak Paul <[email protected]>
Co-authored-by: Steven Munn <[email protected]>