[advanced dreambooth lora sdxl script]: cannot train --with_prior_preservation, shape mismatch #6967

### Describe the bug

I came across this while testing the new features from https://github.com/huggingface/diffusers/pull/6691 (many thanks for adding micro-conditioning support!)

Running [`train_dreambooth_lora_sdxl_advanced.py`](https://github.com/huggingface/diffusers/blob/main/examples/advanced_diffusion_training/train_dreambooth_lora_sdxl_advanced.py) with `--with_prior_preservation` fails in the UNet forward pass: the `unet_added_conditions['time_ids']` tensor has an invalid shape for the prediction step.

It may be related to the way the `class_time_ids` are computed.
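For reference, here is a minimal sketch of what the time ids for a prior-preservation batch are expected to look like. It assumes SDXL's usual micro-conditioning layout of six values per sample (original size, crop top-left, target size) and assumes, as in the DreamBooth collate function, that class examples are appended after the instance examples. `make_time_ids` and `batch_time_ids` are hypothetical helpers for illustration, not functions from the training script.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[int, int]


def make_time_ids(original_size: Pair, crop_top_left: Pair, target_size: Pair) -> List[int]:
    """Hypothetical helper: one row of SDXL micro-conditioning ids.

    Layout: (orig_h, orig_w, crop_top, crop_left, target_h, target_w).
    """
    return list(original_size + crop_top_left + target_size)


def batch_time_ids(
    instance_rows: Sequence[List[int]],
    class_rows: Sequence[List[int]],
    with_prior_preservation: bool,
) -> List[List[int]]:
    """Hypothetical helper: stack time ids in the same order as the image batch.

    Assumption: class examples follow the instance examples in the batch,
    so their time ids are appended in the same order (one row per sample).
    """
    rows = [list(r) for r in instance_rows]
    if with_prior_preservation:
        rows.extend(list(r) for r in class_rows)
    return rows


instance_ids = [make_time_ids((1024, 1024), (0, 0), (1024, 1024))]
class_ids = [make_time_ids((1024, 1024), (0, 0), (1024, 1024))]
time_ids = batch_time_ids(instance_ids, class_ids, with_prior_preservation=True)
print(len(time_ids), len(time_ids[0]))  # 2 6
```

With `--train_batch_size=1` and prior preservation, the UNet sees two samples, so the time ids should have one 6-value row for each of them.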

### Reproduction

Follow the instructions from the [advanced_diffusion_training README](https://github.com/huggingface/diffusers/tree/main/examples/advanced_diffusion_training):

* Install diffusers from source.
* Download the test dataset:
```python
from huggingface_hub import snapshot_download

local_dir = "./3d_icon"
snapshot_download(
    "LinoyTsaban/3d_icon",
    local_dir=local_dir, repo_type="dataset",
    ignore_patterns=".gitattributes",
)
```

Run training with prior preservation enabled (see the last four arguments):

```bash
export MODEL_NAME="stabilityai/stable-diffusion-xl-base-1.0"
export DATASET_NAME="./3d_icon"
export OUTPUT_DIR="3d-icon-SDXL-LoRA"
export CLASS_DATA_DIR="./class_data_dir/icons"
export VAE_PATH="madebyollin/sdxl-vae-fp16-fix"

accelerate launch train_dreambooth_lora_sdxl_advanced.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --pretrained_vae_model_name_or_path=$VAE_PATH \
  --dataset_name=$DATASET_NAME \
  --instance_prompt="3d icon in the style of ohwx" \
  --validation_prompt="a ohwx icon of an astronaut riding a horse, in the style of ohwx" \
  --output_dir=$OUTPUT_DIR \
  --caption_column="prompt" \
  --mixed_precision="bf16" \
  --resolution=1024 \
  --train_batch_size=1 \
  --repeats=1 \
  --gradient_accumulation_steps=1 \
  --gradient_checkpointing \
  --learning_rate=1.0 \
  --text_encoder_lr=1.0 \
  --optimizer="prodigy" \
  --train_text_encoder \
  --train_text_encoder_frac=0.5 \
  --snr_gamma=5.0 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --rank=8 \
  --max_train_steps=1000 \
  --checkpointing_steps=2000 \
  --seed="0" \
  --with_prior_preservation \
  --class_prompt="icon" \
  --class_data_dir=$CLASS_DATA_DIR \
  --num_class_images=5
```

### Logs

```shell
02/13/2024 16:04:39 - INFO - __main__ - Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda

Mixed precision type: bf16

{'image_encoder', 'feature_extractor'} was not found in config. Values will be initialized to default values.
Loading pipeline components...:   0%|                                                                            | 0/7 [00:00<?, ?it/s]{'rescale_betas_zero_snr', 'sigma_max', 'timestep_type', 'sigma_min'} was not found in config. Values will be initialized to default values.
Loaded scheduler as EulerDiscreteScheduler from `scheduler` subfolder of stabilityai/stable-diffusion-xl-base-1.0.
{'reverse_transformer_layers_per_block', 'attention_type', 'dropout'} was not found in config. Values will be initialized to default values.
Loaded unet as UNet2DConditionModel from `unet` subfolder of stabilityai/stable-diffusion-xl-base-1.0.
Loading pipeline components...:  29%|███████████████████▍                                                | 2/7 [00:04<00:11,  2.29s/it]Loaded text_encoder_2 as CLIPTextModelWithProjection from `text_encoder_2` subfolder of stabilityai/stable-diffusion-xl-base-1.0.
Loading pipeline components...:  43%|█████████████████████████████▏                                      | 3/7 [00:05<00:07,  1.89s/it]Loaded tokenizer as CLIPTokenizer from `tokenizer` subfolder of stabilityai/stable-diffusion-xl-base-1.0.
Loading pipeline components...:  57%|██████████████████████████████████████▊                             | 4/7 [00:06<00:03,  1.22s/it]Loaded tokenizer_2 as CLIPTokenizer from `tokenizer_2` subfolder of stabilityai/stable-diffusion-xl-base-1.0.
Loaded vae as AutoencoderKL from `vae` subfolder of stabilityai/stable-diffusion-xl-base-1.0.
Loading pipeline components...:  86%|██████████████████████████████████████████████████████████▎         | 6/7 [00:06<00:00,  1.53it/s]Loaded text_encoder as CLIPTextModel from `text_encoder` subfolder of stabilityai/stable-diffusion-xl-base-1.0.
Loading pipeline components...: 100%|████████████████████████████████████████████████████████████████████| 7/7 [00:06<00:00,  1.06it/s]
02/13/2024 16:04:47 - INFO - __main__ - Number of class images to sample: 5.
Generating class images: 100%|███████████████████████████████████████████████████████████████████████████| 2/2 [00:27<00:00, 13.99s/it]
You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
{'clip_sample_range', 'variance_type', 'rescale_betas_zero_snr', 'thresholding', 'dynamic_thresholding_ratio'} was not found in config. Values will be initialized to default values.
{'reverse_transformer_layers_per_block', 'attention_type', 'dropout'} was not found in config. Values will be initialized to default values.
/home/thomas/code/temp/diffusers/examples/advanced_diffusion_training/train_dreambooth_lora_sdxl_advanced.py:1534: DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead
  logger.warn(
02/13/2024 16:05:30 - WARNING - __main__ - Learning rates were provided both for the unet and the text encoder- e.g. text_encoder_lr: 1.0 and learning_rate: 1.0. When using prodigy only learning_rate is used as the initial learning rate.
Using decoupled weight decay
02/13/2024 16:05:30 - INFO - datasets - PyTorch version 2.2.0 available.
Resolving data files: 100%|█████████████████████████████████████████████████████████████████████████| 23/23 [00:00<00:00, 50773.15it/s]
Generating train split: 22 examples [00:00, 2150.22 examples/s]
/home/thomas/code/temp/venv/lib/python3.10/site-packages/PIL/Image.py:3186: DecompressionBombWarning: Image size (122880000 pixels) exceeds limit of 89478485 pixels, could be decompression bomb DOS attack.
  warnings.warn(
02/13/2024 16:05:40 - INFO - __main__ - ***** Running training *****
02/13/2024 16:05:40 - INFO - __main__ -   Num examples = 22
02/13/2024 16:05:40 - INFO - __main__ -   Num batches each epoch = 22
02/13/2024 16:05:40 - INFO - __main__ -   Num Epochs = 46
02/13/2024 16:05:40 - INFO - __main__ -   Instantaneous batch size per device = 1
02/13/2024 16:05:40 - INFO - __main__ -   Total train batch size (w. parallel, distributed & accumulation) = 1
02/13/2024 16:05:40 - INFO - __main__ -   Gradient Accumulation steps = 1
02/13/2024 16:05:40 - INFO - __main__ -   Total optimization steps = 1000
Steps:   0%|                                                                                                  | 0/1000 [00:00<?, ?it/s]/home/thomas/code/temp/venv/lib/python3.10/site-packages/torch/utils/checkpoint.py:460: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants.
  warnings.warn(
Traceback (most recent call last):
  File "/home/thomas/code/temp/diffusers/examples/advanced_diffusion_training/train_dreambooth_lora_sdxl_advanced.py", line 2196, in <module>
    main(args)
  File "/home/thomas/code/temp/diffusers/examples/advanced_diffusion_training/train_dreambooth_lora_sdxl_advanced.py", line 1872, in main
    model_pred = unet(
  File "/home/thomas/code/temp/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/thomas/code/temp/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/thomas/code/temp/venv/lib/python3.10/site-packages/accelerate/utils/operations.py", line 817, in forward
    return model_forward(*args, **kwargs)
  File "/home/thomas/code/temp/venv/lib/python3.10/site-packages/accelerate/utils/operations.py", line 805, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
  File "/home/thomas/code/temp/venv/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 16, in decorate_autocast
    return func(*args, **kwargs)
  File "/home/thomas/code/temp/diffusers/src/diffusers/models/unets/unet_2d_condition.py", line 1027, in forward
    aug_emb = self.add_embedding(add_embeds)
  File "/home/thomas/code/temp/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/thomas/code/temp/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/thomas/code/temp/diffusers/src/diffusers/models/embeddings.py", line 228, in forward
    sample = self.linear_1(sample)
  File "/home/thomas/code/temp/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/thomas/code/temp/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/thomas/code/temp/venv/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 116, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (2x2048 and 2816x1280)
Steps:   0%|                                                                                                  | 0/1000 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/thomas/code/temp/venv/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/home/thomas/code/temp/venv/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 47, in main
    args.func(args)
  File "/home/thomas/code/temp/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1023, in launch_command
    simple_launcher(args)
  File "/home/thomas/code/temp/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 643, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/home/thomas/code/temp/venv/bin/python', 'train_dreambooth_lora_sdxl_advanced.py', '--pretrained_model_name_or_path=stabilityai/stable-diffusion-xl-base-1.0', '--pretrained_vae_model_name_or_path=madebyollin/sdxl-vae-fp16-fix', '--dataset_name=./3d_icon', '--instance_prompt=3d icon in the style of ohwx', '--validation_prompt=a ohwx icon of an astronaut riding a horse, in the style of ohwx', '--output_dir=3d-icon-SDXL-LoRA', '--caption_column=prompt', '--mixed_precision=bf16', '--resolution=1024', '--train_batch_size=1', '--repeats=1', '--gradient_accumulation_steps=1', '--gradient_checkpointing', '--learning_rate=1.0', '--text_encoder_lr=1.0', '--optimizer=prodigy', '--train_text_encoder', '--train_text_encoder_frac=0.5', '--snr_gamma=5.0', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--rank=8', '--max_train_steps=1000', '--checkpointing_steps=2000', '--seed=0', '--with_prior_preservation', '--class_prompt=icon', '--class_data_dir=./class_data_dir/icons', '--num_class_images=5']' returned non-zero exit status 1.
```
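One reading of the final error that is consistent with the numbers: `linear_1` expects 2816 input features, but each of the 2 rows it receives has only 2048. The sketch below reproduces that arithmetic. It assumes the public SDXL-base UNet config (pooled text embedding width 1280, `addition_time_embed_dim=256`, `projection_class_embeddings_input_dim=2816`) and mirrors the UNet's "text_time" branch, which flattens the time ids, projects each value to 256 features, reshapes to `(batch_size, -1)` and concatenates with the pooled text embeddings. `add_embedding_width` is a hypothetical helper, not a diffusers function.

```python
# Assumed SDXL-base UNet config values (from the public model config,
# not from this issue):
POOLED_TEXT_DIM = 1280   # width of the pooled text embeddings ("text_embeds")
TIME_PROJ_DIM = 256      # addition_time_embed_dim: features per time-id value
LINEAR_IN = 2816         # projection_class_embeddings_input_dim (in_features of linear_1)


def add_embedding_width(batch_size: int, num_time_id_values: int) -> int:
    """Hypothetical helper: width of the tensor passed to add_embedding.linear_1.

    Time ids are flattened, each value is projected to TIME_PROJ_DIM features,
    the result is reshaped to (batch_size, -1) and concatenated with the
    pooled text embeddings.
    """
    projected = num_time_id_values * TIME_PROJ_DIM
    if projected % batch_size:
        raise ValueError("time ids cannot be split evenly across the batch")
    return POOLED_TEXT_DIM + projected // batch_size


# Batch of 2 (1 instance + 1 class image) with one 6-value row per sample.
print(add_embedding_width(batch_size=2, num_time_id_values=2 * 6))  # 2816
# Same batch, but time ids for only one sample, i.e. shape (1, 6).
print(add_embedding_width(batch_size=2, num_time_id_values=1 * 6))  # 2048
```

Under these assumptions, a `time_ids` tensor holding a single 6-value row for a 2-sample batch still reshapes without error, and the mismatch only surfaces at `linear_1` as `2x2048` versus `2816x1280`, which matches the traceback.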


### System Info

* diffusers installed from source, together with the requirements of the advanced DreamBooth LoRA SDXL script.
* Python 3.10.12

### Who can help?

@linoytsaban Could this have been introduced by your latest PR? (Thanks again!)
