[LoRA] Discussions on ensuring robust LoRA support in Diffusers #3620

@sayakpaul

Description

For the last few months, we have been collaborating with our contributors to ensure we support LoRA effectively and efficiently from Diffusers:

1. Training support

- DreamBooth: letting users perform LoRA fine-tuning of both the UNet and the text encoder. There were some issues with the text encoder part, which are being fixed in #3437. Thanks to @takuma104.
- Vanilla text-to-image fine-tuning: we purposefully support LoRA fine-tuning of only the UNet here, since the number of image-caption pairs is assumed to be larger than what's typically used for DreamBooth, making text encoder fine-tuning probably overkill (see the loading sketch after this list).
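
To make the above concrete, here's a minimal sketch (the LoRA path/repo id is a placeholder) of loading a checkpoint produced by the DreamBooth LoRA training script back into a pipeline; `load_lora_weights()` applies both the UNet and the text encoder LoRA layers when both are present:

```python
import torch
from diffusers import StableDiffusionPipeline

# Load the base pipeline the LoRA was trained against.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# "path/to/dreambooth-lora" is a placeholder for a local directory or Hub repo id
# containing the weights saved by the DreamBooth LoRA training script.
pipe.load_lora_weights("path/to/dreambooth-lora")

image = pipe("a photo of sks dog in a bucket", num_inference_steps=25).images[0]
```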

2. Interoperability

With #3437, we're introducing limited support for loading A1111-formatted LoRA checkpoints from CivitAI with pipeline.load_lora_weights(). This has been a widely requested feature (see #3064 as an example).

We also provide a convert_lora_safetensor_to_diffusers.py script that converts A1111 LoRA checkpoints (coverage is potentially non-exhaustive) and merges them into the text encoder and the UNet of a DiffusionPipeline. However, unlike the native Diffusers approach, merging doesn't allow switching the attention processors back to the defaults afterward. Check out https://huggingface.co/docs/diffusers/main/en/training/lora for more details. For inference-only, fixed workflows (where one doesn't need to switch attention processors), it still caters to many use cases.
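
As a sketch of the load_lora_weights() path (the repo id and file name below are placeholders for an actual CivitAI download):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# The same entry point used for Diffusers-native LoRAs also accepts (a subset of)
# A1111/Kohya-style safetensors checkpoints; weight_name points at the file.
pipe.load_lora_weights("path/to/civitai-lora", weight_name="lora.safetensors")

image = pipe("masterpiece, best quality, 1girl", num_inference_steps=25).images[0]
```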

3. xformers support for efficient inference

Once LoRA parameters are loaded into a pipeline, xformers should work seamlessly. There was a problem with this interaction, which is fixed in #3556.
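
For reference, a minimal sketch of the combination the fix targets (the LoRA path is a placeholder); with #3556 in, enabling xformers after loading LoRA should just work:

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

pipe.load_lora_weights("path/to/lora")  # placeholder path
# With #3556, this no longer clashes with the LoRA attention processors.
pipe.enable_xformers_memory_efficient_attention()

image = pipe("a prompt", num_inference_steps=25).images[0]
```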

4. PT 2.0 SDPA optimization

See: #3594
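
For context, on PyTorch 2.0 Diffusers dispatches attention to torch.nn.functional.scaled_dot_product_attention by default, and #3594 is about making sure LoRA inference takes that fast path too. A rough sketch of what this looks like from the user side (the LoRA path is a placeholder; no extra call is needed):

```python
import torch
from diffusers import StableDiffusionPipeline

# SDPA is available from PyTorch 2.0 onwards.
assert hasattr(torch.nn.functional, "scaled_dot_product_attention")

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe.load_lora_weights("path/to/lora")  # placeholder path

# No explicit opt-in: the PT 2.0 attention processors are the default.
image = pipe("a prompt", num_inference_steps=25).images[0]
```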

5. torch.compile() compatibility with LoRA

Once 4. is settled, we should be able to take advantage of torch.compile().
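
As a sketch of what that would enable (paths are placeholders), compiling the UNet after loading the LoRA weights:

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe.load_lora_weights("path/to/lora")  # placeholder path

# Compile the UNet (the heaviest component); the first call pays the compilation cost.
pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)

image = pipe("a prompt", num_inference_steps=25).images[0]
```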

6. Introduction of scale for controlling the contribution from the text encoder LoRA

See #3480. We already support passing scale as a part of cross_attention_kwargs for the UNet LoRA.
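
For the UNet LoRA, the existing knob looks like this (the LoRA path is a placeholder); scale=0.0 effectively disables the LoRA and scale=1.0 applies it fully, and #3480 discusses exposing an analogous control for the text encoder LoRA:

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe.load_lora_weights("path/to/lora")  # placeholder path

# The UNet LoRA contribution is scaled at inference time via cross_attention_kwargs.
image = pipe(
    "a prompt", num_inference_steps=25, cross_attention_kwargs={"scale": 0.5}
).images[0]
```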

7. Supporting multiple LoRAs

@takuma104 proposed a hook-based design here: #3064 (comment)

I hope this provides a consolidated view of where we're at with LoRA support in Diffusers.

Cc: @pcuenca @patrickvonplaten
