
Conversation

@sayakpaul
Member

What does this PR do?

Fixes: #6086

pipeline = pipeline.to(accelerator.device)
# Final inference
# Load previous pipeline
if args.validation_prompt is not None:
Member Author


If no validation_prompt was passed, we must not run this step.
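A hedged sketch of the guarded final-inference step this check produces (simplified; args, accelerator, and pipeline are the training script's own variables, and the generation call is only illustrative):

# Only run the final inference if a validation prompt was actually provided.
if args.validation_prompt is not None:
    pipeline = pipeline.to(accelerator.device)
    # Generate a few images with the trained LoRA weights loaded into the pipeline.
    images = pipeline(args.validation_prompt, num_inference_steps=25).images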

-        # load attention processors
-        pipeline.unet.load_attn_procs(args.output_dir)
+        # load attention processors
+        pipeline.load_lora_weights(args.output_dir)
Member Author


Make sure to use load_lora_weights() instead of load_attn_procs().
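A minimal sketch of the intended usage (the base checkpoint and output path are placeholders, not taken from this PR):

import torch
from diffusers import DiffusionPipeline

pipeline = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
)
# load_lora_weights() restores the LoRA layers for the UNet (and the text encoders,
# if they were trained), whereas the older pipeline.unet.load_attn_procs() only
# handled the UNet attention processors.
pipeline.load_lora_weights("path/to/output_dir")  # placeholder path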

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@sr5434

sr5434 commented Dec 14, 2023

@sayakpaul I am getting this error for a regular LoRA fine-tune:

Steps:   0% 0/1500 [00:00<?, ?it/s]Traceback (most recent call last):
  File "/content/diffusers/examples/text_to_image/train_text_to_image_lora_sdxl.py", line 967, in <module>
    main()
  File "/content/diffusers/examples/text_to_image/train_text_to_image_lora_sdxl.py", line 774, in main
    model_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/utils/operations.py", line 680, in forward
    return model_forward(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/utils/operations.py", line 668, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
  File "/usr/local/lib/python3.10/dist-packages/torch/amp/autocast_mode.py", line 16, in decorate_autocast
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/diffusers/models/unet_2d_condition.py", line 1004, in forward
    if "text_embeds" not in added_cond_kwargs:
TypeError: argument of type 'NoneType' is not iterable

This is in a free, GPU-enabled Google Colab.

@sayakpaul
Member Author

I am not sure what script you're using here.

Contributor

@younesbelkada left a comment


Makes sense, thanks! In the future we could also expose a method in PEFT to upcast trainable params to fp32 (cc @BenjaminBossan @pacman100), similar to prepare_model_for_kbit_training.
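A rough sketch of what such a helper could look like (the name cast_trainable_params_to_fp32 is hypothetical, not an existing PEFT API):

import torch

def cast_trainable_params_to_fp32(model):
    # Only the injected adapter (LoRA) parameters have requires_grad=True,
    # so the frozen base weights stay in their reduced precision.
    for param in model.parameters():
        if param.requires_grad:
            param.data = param.data.to(torch.float32)
    return model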

@BenjaminBossan
Member

Makes sense, thanks! In the future we could also expose a method in PEFT to upcast trainable params to fp32 (cc @BenjaminBossan @pacman100), similar to prepare_model_for_kbit_training.

Yes, for sure, this isn't the first time this came up. Do we know exactly when this condition appears? Is it only when the user explicitly loads a model in float16? If yes, we may want to add a corresponding check to this PR.

@younesbelkada
Contributor

Is it only when the user explicitly loads a model in float16?

@sayakpaul can confirm, but I think that's the case, right?

@sayakpaul
Member Author

Is it only when the user explicitly loads a model in float16?

@sayakpaul can confirm, but I think that's the case, right?

Indeed, that's the case. Only for reduced precisions.
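For context, a hedged illustration of that failure mode (the rank and target modules are arbitrary): having the model in fp16 before injecting LoRA leaves the trainable LoRA parameters in fp16 too, which torch's GradScaler later rejects with "Attempting to unscale FP16 gradients".

import torch
from diffusers import UNet2DConditionModel
from peft import LoraConfig

# Loading (or casting) the UNet in fp16 before adding adapters, e.g. to save memory...
unet = UNet2DConditionModel.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", subfolder="unet", torch_dtype=torch.float16
)
# ...means the injected LoRA parameters typically come out in fp16 as well,
# matching the base weights.
unet.add_adapter(LoraConfig(r=4, lora_alpha=4, target_modules=["to_q", "to_k", "to_v", "to_out.0"]))
print({p.dtype for p in unet.parameters() if p.requires_grad})  # {torch.float16}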

@patrickvonplaten
Contributor

@patil-suraj @williamberman can you please also take a look here?

@sr5434

sr5434 commented Dec 16, 2023

I am not sure what script you're using here.

I am using the train_text_to_image_lora_sdxl.py script

@BenjaminBossan
Member

Indeed, that's the case. Only for reduced precisions.

Does this also apply to bf16? If not, I think the dtype conversion should be conditional, i.e. if args.mixed_precision == "fp16".

@sayakpaul
Member Author

@BenjaminBossan done in 8ac462b.

@patrickvonplaten
Contributor

Hmm, but now we're just silently disabling fp16 training. Didn't this work before, i.e. the whole UNet was kept in fp16 while the LoRA was trained? Why doesn't it work anymore?

@patrickvonplaten
Contributor

The problem here is the following IMO:

  • We move both the LoRA weights and the non-LoRA weights to fp16 before training; but in mixed-precision training the trainable LoRA weights should not be in fp16, so an error is thrown.
  • IMO, the solution should not be to move all weights (including the non-trainable ones) to full fp32; instead, we should only move the trainable LoRA weights to fp32 and keep the rest in fp16 so memory doesn't blow up.

@younesbelkada
Contributor

younesbelkada commented Dec 18, 2023

The proposed changes only upcast the LoRA weights to fp32 (via the requires_grad check; when you inject adapters, the non-LoRA weights have requires_grad set to False). Also, before the PEFT integration, all LoRA layers were already in fp32 because the dtype argument was never used in the example scripts:

self.down = nn.Linear(in_features, rank, bias=False, device=device, dtype=dtype)

    for attn_processor_name, attn_processor in unet.attn_processors.items():
        # Parse the attention module.
        attn_module = unet
        for n in attn_processor_name.split(".")[:-1]:
            attn_module = getattr(attn_module, n)

        # Set the `lora_layer` attribute of the attention-related matrices.
        attn_module.to_q.set_lora_layer(
            LoRALinearLayer(
                in_features=attn_module.to_q.in_features, out_features=attn_module.to_q.out_features, rank=args.rank
            )
        )
        attn_module.to_k.set_lora_layer(
            LoRALinearLayer(
                in_features=attn_module.to_k.in_features, out_features=attn_module.to_k.out_features, rank=args.rank
            )
        )
        attn_module.to_v.set_lora_layer(
            LoRALinearLayer(
                in_features=attn_module.to_v.in_features, out_features=attn_module.to_v.out_features, rank=args.rank
            )
        )
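A small check of that point, using the LoRALinearLayer class as it existed in diffusers around the time of this PR (the import path has since been deprecated):

import torch
from diffusers.models.lora import LoRALinearLayer

# dtype is never passed in the example scripts, so the LoRA matrices are created
# with the default (fp32) parameters, regardless of the UNet's precision.
layer = LoRALinearLayer(in_features=320, out_features=320, rank=4)
print(layer.down.weight.dtype)  # torch.float32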

@younesbelkada
Contributor

younesbelkada commented Dec 18, 2023

One cleaner check could be to test whether the module is an instance of BaseTunerLayer and upcast it only in that case.
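A rough sketch of that suggestion, combined with the requires_grad filter from this PR (assuming a PEFT version that exposes BaseTunerLayer under peft.tuners.tuners_utils):

import torch
from peft.tuners.tuners_utils import BaseTunerLayer

def upcast_tuner_layers(model):
    for module in model.modules():
        if isinstance(module, BaseTunerLayer):
            # Upcast only the adapter's trainable parameters; the wrapped
            # base layer keeps its reduced precision.
            for param in module.parameters():
                if param.requires_grad:
                    param.data = param.data.to(torch.float32)
    return model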

@patrickvonplaten
Contributor

The proposed changes only upcast the LoRA weights to fp32 (via the requires_grad check; when you inject adapters, the non-LoRA weights have requires_grad set to False). Also, before the PEFT integration, all LoRA layers were already in fp32 because the dtype argument was never used in the example scripts:

self.down = nn.Linear(in_features, rank, bias=False, device=device, dtype=dtype)

I see, that makes sense! Thanks for the explanation.

@sayakpaul
Member Author

I pulled in the changes from this PR and added them to #6225.

         text_encoder_one.add_adapter(text_lora_config)
         text_encoder_two.add_adapter(text_lora_config)

+    # Make sure the trainable params are in float32.
+    if args.mixed_precision == "fp16":
+        models = [unet]
+        if args.train_text_encoder:
+            models.extend([text_encoder_one, text_encoder_two])
+        for model in models:
+            for param in model.parameters():
+                # only upcast trainable parameters (LoRA) into fp32
+                if param.requires_grad:
+                    param.data = param.to(torch.float32)
+
     # create custom saving & loading hooks so that `accelerator.save_state(...)` serializes in a nice format
     def save_model_hook(models, weights, output_dir):
         if accelerator.is_main_process:

I can confirm that things are working well: https://wandb.ai/sayakpaul/dreambooth-lora-sd-xl/runs/ow1vrez8. See the "test" media pictures.

Command I ran:

CUDA_VISIBLE_DEVICES=1 accelerate launch train_with_fixes.py \
  --pretrained_model_name_or_path="stabilityai/stable-diffusion-xl-base-1.0" \
  --pretrained_vae_model_name_or_path="madebyollin/sdxl-vae-fp16-fix" \
  --instance_data_dir="dog" \
  --output_dir="corgy_dog_LoRA" \
  --mixed_precision="fp16" \
  --instance_prompt="a photo of TOK dog" \
  --resolution=1024 \
  --train_batch_size=4 \
  --gradient_accumulation_steps=4 \
  --learning_rate=1e-4 \
  --snr_gamma=5.0 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --use_8bit_adam \
  --max_train_steps=500 \
  --checkpointing_steps=100 \
  --push_to_hub \
  --validation_prompt="a photo of TOK dog in a bucket at the beach" \
  --report_to="wandb" \
  --seed="0"

Trained model: https://huggingface.co/sayakpaul/corgy_dog_LoRA. I am going to try running it on the free Colab tier too and report back here.

@sayakpaul
Member Author

For anyone wondering whether this would run on a free-tier Colab notebook, https://colab.research.google.com/gist/sayakpaul/9615b89369f3ef23cc29d0dac58253dd/scratchpad.ipynb should clear all the doubts once and for all 💪

@sayakpaul sayakpaul merged commit 288ceeb into main Dec 19, 2023
@sayakpaul sayakpaul deleted the fix/lora-training branch December 19, 2023 04:24
@yashveer08

For anyone wondering whether this would run on a free-tier Colab notebook, https://colab.research.google.com/gist/sayakpaul/9615b89369f3ef23cc29d0dac58253dd/scratchpad.ipynb should clear all the doubts once and for all 💪

This seems to be working, but when I added the metadata.jsonl file, the datasets library is causing an issue.
It shows the error below.
Can you cross-check your Colab with the caption file, @sayakpaul? It would be a great help.
[Screenshot of the error, 2023-12-19 at 12:13 PM]

@sayakpaul
Member Author

That is an unrelated problem and you should instead file this in the datasets repo.

@yashveer08

That is an unrelated problem and you should instead file this in the datasets repo.

Sure, will do. But is there an alternative way to train an SDXL model with captions for each image?
@sayakpaul

@sayakpaul
Member Author

sayakpaul commented Dec 21, 2023

You will have to debug your way through this one, since it's not exactly the same code that you're using.

Merging this pull request may close the following issue:

In training the script train_text_to_image_lora.py on Colab with a V100 GPU, the error ValueError: Attempting to unscale FP16 gradients occurred.
