[Flux] Dreambooth LoRA training scripts #9086
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Thank you for starting this @linoytsaban!
I think this could stay as a minimal educational resource for the community. Could we see some good first results as well? We should probably mention in the README that, for a more advanced and memory-friendly setup, readers should refer to @bghira's guide here: #9057 (comment).
LMK what you think.
"Mixed precision training with bfloat16 is not supported on MPS. Please use fp16 (recommended) or fp32 instead." | ||
) | ||
|
||
vae.to(accelerator.device, dtype=torch.float32) |
The VAE seems to be better. So, let's first try with weight_dtype?
bf16 is good, yes, but fp16 is not: there are some huge activation values in the attention layers. But you can also just disable fp16 for the attention or upcast it. Roughly the same issue as the SD 2.x UNet, but not quite.
maybe fp16 is okay after #9097
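A minimal sketch of the dtype handling discussed above, assuming `args`, `accelerator`, `vae`, and `transformer` exist as in the training script: keep the VAE in fp32 when training in fp16 (because of the large attention activations), and let it follow the mixed-precision dtype otherwise. This is illustrative, not the merged behaviour.

import torch

weight_dtype = torch.float32
if accelerator.mixed_precision == "fp16":
    weight_dtype = torch.float16
elif accelerator.mixed_precision == "bf16":
    weight_dtype = torch.bfloat16

# The VAE stays in fp32 under fp16 training; otherwise it can follow weight_dtype.
vae_dtype = torch.float32 if weight_dtype == torch.float16 else weight_dtype
vae.to(accelerator.device, dtype=vae_dtype)
transformer.to(accelerator.device, dtype=weight_dtype)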
u = compute_density_for_timestep_sampling(
    weighting_scheme=args.weighting_scheme,
    batch_size=bsz,
    logit_mean=args.logit_mean,
    logit_std=args.logit_std,
    mode_scale=args.mode_scale,
)
cc: @bghira. Should we default to "logit_normal" weighting here as well?
I used 'none' at first, which is an option I added to my script that just uses uniform sampling and uniform loss weighting.
But adding 'cosmap' as an option seems to be ideal; maybe it approximates some min-snr-gamma-like behaviour.
The loss seems to follow a bathtub curve in some tests, where it's high at both ends and low in the middle. Maybe that's why cosmap helps.
We support:
choices=["sigma_sqrt", "logit_normal", "mode", "cosmap"],
In your experiments, what's been the best scheme so far?
cosmap or none
Do you have a reference for "none"?
Yes, just adding "none" to the above choices list allows it to use uniform sampling and no weighting without any additional changes.
Ah thanks. Good to know.
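A minimal sketch of what the "none" setting effectively amounts to, written out explicitly, assuming the script's `args`, `bsz`, and `sigmas` variables: uniform timestep sampling and unit loss weighting, with the diffusers helpers used for the other schemes. Names mirror the script, but this is illustrative rather than the exact implementation.

import torch
from diffusers.training_utils import (
    compute_density_for_timestep_sampling,
    compute_loss_weighting_for_sd3,
)

if args.weighting_scheme == "none":
    # Uniform timestep sampling.
    u = torch.rand(size=(bsz,))
else:
    u = compute_density_for_timestep_sampling(
        weighting_scheme=args.weighting_scheme,
        batch_size=bsz,
        logit_mean=args.logit_mean,
        logit_std=args.logit_std,
        mode_scale=args.mode_scale,
    )

# ... later, once sigmas have been derived from the sampled timesteps:
if args.weighting_scheme == "none":
    weighting = torch.ones_like(sigmas)  # uniform loss, no reweighting
else:
    weighting = compute_loss_weighting_for_sd3(
        weighting_scheme=args.weighting_scheme, sigmas=sigmas
    )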
# handle guidance
if transformer.config.guidance_embeds:
    guidance = torch.tensor([args.guidance_scale], device=device)
I think an option for min and max values with a random range might be more effective, after some experimentation.
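A rough sketch of that suggestion, assuming hypothetical `--guidance_scale_min` / `--guidance_scale_max` arguments and the script's `transformer`, `args`, `model_input`, and `device`: sample a random guidance value per step instead of the fixed `args.guidance_scale`. Not part of the merged script.

import torch

if transformer.config.guidance_embeds:
    # Draw one guidance value uniformly from [min, max] and repeat it across the batch.
    scale = torch.empty(1, device=device).uniform_(
        args.guidance_scale_min, args.guidance_scale_max
    )
    guidance = scale.expand(model_input.shape[0])
else:
    guidance = None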
if accelerator.is_main_process:
    transformer = unwrap_model(transformer)
    transformer = transformer.to(torch.float32)
    transformer_lora_layers = get_peft_model_state_dict(transformer)
@sayakpaul @bghira
I wonder if upcasting here is necessary, or whether we could support an option to save in lower precision. For me, even when training a rank-4 LoRA with 8-bit Adam, it still crashes here (training on an A100). I think @bghira doesn't upcast when saving; what's your impression?
I am using quanto, so I don't touch the dtype before export. Using bf16 exports made the files half the size, so I've kept it that way for some time now.
Alternatively you can move it to CPU, but then you're consuming system memory, and users report SIGKILL when it OOMs and Linux gets angry.
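A minimal sketch of the lower-precision saving option discussed here, assuming a hypothetical `--upcast_before_saving` flag and the script's `accelerator`, `unwrap_model`, `transformer`, and `weight_dtype`: only upcast to fp32 when explicitly requested, otherwise export in the training dtype (e.g. bf16) for smaller files and lower peak memory. Illustrative only.

import torch
from peft.utils import get_peft_model_state_dict

if accelerator.is_main_process:
    transformer = unwrap_model(transformer)
    if args.upcast_before_saving:
        transformer = transformer.to(torch.float32)  # exact, but larger files and more memory
    else:
        transformer = transformer.to(weight_dtype)   # e.g. bf16, roughly half the file size
    transformer_lora_layers = get_peft_model_state_dict(transformer)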
Great work, can't wait to see what people do with it!
Thank you!
requirements_flux.txt not found
How much VRAM does the default script use? I'm OOMing on an A100 80GB with fp16.
Amazing work! FYI, A100 80GB out of memory on Replicate.com.
Ah, erwold is right:

# noise
z1 = torch.randn_like(x)
# noisy_latents = (1.0 - sigmas) * latents.float() + sigmas * noise.float()
zt = (1 - texp) * x + texp * z1
# model_pred
vtheta = self.model(zt, t, cond)
# target = z1 - x
# mse = target - model_pred
batchwise_mse = ((z1 - x - vtheta) ** 2).mean(dim=list(range(1, len(x.shape))))
You're totally right, thanks! It was left over from the pre-conditioning option that we eventually decided to remove.
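In the script's terms, the corrected flow-matching loss would look roughly like the sketch below, assuming `noise`, `model_input`, `weighting`, and `model_pred` as in the training loop: the target is the velocity `noise - latents`, with no extra pre-conditioning applied to the prediction.

import torch

# Velocity target for rectified flow: noise minus clean latents.
target = noise - model_input
loss = torch.mean(
    (weighting.float() * (model_pred.float() - target.float()) ** 2).reshape(
        target.shape[0], -1
    ),
    dim=1,
).mean()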
All fixes are here: #9139
* initial commit - dreambooth for flux
* update transformer to be FluxTransformer2DModel
* update training loop and validation inference
* fix sd3->flux docs
* add guidance handling, not sure if it makes sense(?)
* inital dreambooth lora commit
* fix text_ids in compute_text_embeddings
* fix imports of static methods
* fix pipeline loading in readme, remove auto1111 docs for now
* fix pipeline loading in readme, remove auto1111 docs for now, remove some irrelevant text_encoder_3 refs
* Update examples/dreambooth/train_dreambooth_flux.py (Co-authored-by: Bagheera <[email protected]>)
* fix te2 loading and remove te2 refs from text encoder training
* fix tokenizer_2 initialization
* remove text_encoder training refs from lora script (for now)
* try with vae in bfloat16, fix model hook save
* fix tokenization
* fix static imports
* fix CLIP import
* remove text_encoder training refs (for now) from lora script
* fix minor bug in encode_prompt, add guidance def in lora script, ...
* fix unpack_latents args
* fix license in readme
* add "none" to weighting_scheme options for uniform sampling
* style
* adapt model saving - remove text encoder refs
* adapt model loading - remove text encoder refs
* initial commit for readme
* Update examples/dreambooth/train_dreambooth_lora_flux.py (Co-authored-by: Sayak Paul <[email protected]>)
* Update examples/dreambooth/train_dreambooth_lora_flux.py (Co-authored-by: Sayak Paul <[email protected]>)
* fix vae casting
* remove precondition_outputs
* readme
* readme
* style
* readme
* readme
* update weighting scheme default & docs
* style
* add text_encoder training to lora script, change vae_scale_factor value in both
* style
* text encoder training fixes
* style
* update readme
* minor fixes
* fix te params
* fix te params

Co-authored-by: Bagheera <[email protected]>
Co-authored-by: Sayak Paul <[email protected]>
add dreambooth & dreambooth lora training scripts for FLUX [dev]