
ControlNet Training failed on validation using the default Tensorboard report_to option #2695

@takuma104


Describe the bug

I tried training ControlNet on the main branch right away. With the default --report_to option (tensorboard), the script raises a ValueError and stops right after generating the validation images. As a workaround, using wandb does not trigger this issue.
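The ValueError is raised by NumPy's np.stack, which requires every input array to have exactly the same shape (the traceback under Logs ends in np.stack(formatted_images)). Below is a minimal sketch of that failure mode. It assumes, without confirmation from this report, that the tensorboard branch ends up stacking images of different sizes. The 640x480 and 512x512 sizes are hypothetical.

```python
import numpy as np

# Hypothetical sizes: a 512x512 generated image and a conditioning
# image that was never resized (640 wide, 480 tall). Arrays are HWC uint8.
generated = np.zeros((512, 512, 3), dtype=np.uint8)
conditioning = np.zeros((480, 640, 3), dtype=np.uint8)

try:
    np.stack([conditioning, generated])
    error_message = None
except ValueError as err:
    # NumPy refuses to stack arrays whose shapes differ.
    error_message = str(err)

print(error_message)
```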

Reproduction

Using this script (the 16GB example from README.md, with the required --tracker_project_name option added):

#!/bin/bash

export MODEL_DIR="runwayml/stable-diffusion-v1-5"
export OUTPUT_DIR="./checkpoints"

accelerate launch ../examples/controlnet/train_controlnet.py \
 --pretrained_model_name_or_path=$MODEL_DIR \
 --output_dir=$OUTPUT_DIR \
 --dataset_name=fusing/fill50k \
 --resolution=512 \
 --learning_rate=1e-5 \
 --validation_image "./conditioning_image_1.png" "./conditioning_image_2.png" \
 --validation_prompt "red circle with blue background" "cyan circle with brown floral background" \
 --train_batch_size=1 \
 --gradient_accumulation_steps=4 \
 --gradient_checkpointing \
 --use_8bit_adam \
 --tracker_project_name fill50k

Adding the --report_to wandb option avoids the issue.

Logs

{'dynamic_thresholding_ratio', 'lower_order_final', 'predict_x0', 'solver_order', 'sample_max_value', 'solver_p', 'solver_type', 'disable_corrector', 'thresholding'} was not found in config. Values will be initialized to default values.
/home/takuma/miniconda3/envs/torch1.13.1/lib/python3.10/site-packages/torch/utils/checkpoint.py:31: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
  warnings.warn("None of the inputs have requires_grad=True. Gradients will be None")
Traceback (most recent call last):
  File "/home/takuma/Documents/co/diffusers/train/../examples/controlnet/train_controlnet.py", line 1063, in <module>
    main(args)
  File "/home/takuma/Documents/co/diffusers/train/../examples/controlnet/train_controlnet.py", line 1030, in main
    log_validation(
  File "/home/takuma/Documents/co/diffusers/train/../examples/controlnet/train_controlnet.py", line 139, in log_validation
    formatted_images = np.stack(formatted_images)
  File "<__array_function__ internals>", line 180, in stack
  File "/home/takuma/miniconda3/envs/torch1.13.1/lib/python3.10/site-packages/numpy/core/shape_base.py", line 426, in stack
    raise ValueError('all input arrays must have the same shape')
ValueError: all input arrays must have the same shape
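The traceback points at np.stack(formatted_images) inside log_validation. For anyone who wants to keep tensorboard, one possible local workaround is to bring every image to a common size before stacking. The sketch below assumes the list mixes PIL images of different sizes, for example an unresized conditioning image next to 512x512 samples. stack_for_tensorboard is a hypothetical helper, not part of diffusers, and this is not the upstream fix.

```python
import numpy as np
from PIL import Image


def stack_for_tensorboard(images, size=(512, 512)):
    """Hypothetical helper: resize PIL images to one size and stack them.

    Returns a uint8 array of shape (N, H, W, 3), i.e. NHWC layout.
    `size` is (width, height), as expected by PIL's Image.resize.
    """
    formatted = []
    for image in images:
        # Force 3 channels and a common spatial size so all shapes match.
        resized = image.convert("RGB").resize(size)
        formatted.append(np.asarray(resized))
    return np.stack(formatted)


# Example: a 640x480 conditioning image next to two 512x512 samples.
conditioning = Image.new("RGB", (640, 480))
samples = [Image.new("RGB", (512, 512)) for _ in range(2)]

batch = stack_for_tensorboard([conditioning, *samples])
print(batch.shape)  # (3, 512, 512, 3)
```

The resulting NHWC array could then be passed to tensorboard's SummaryWriter.add_images(tag, batch, step, dataformats="NHWC"). Resizing distorts aspect ratio when it differs from the target, which is acceptable for a quick visual log but not for anything quantitative.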


System Info

- `diffusers` version: 0.15.0.dev0 (commit 16ea3b5379c1e78a4bc8e3fc9cae8d65c42511b1)
- Platform: Linux-5.19.0-32-generic-x86_64-with-glibc2.35
- Python version: 3.10.9
- PyTorch version (GPU?): 1.13.1 (True)
- Huggingface_hub version: 0.12.1
- Transformers version: 4.26.1
- Accelerate version: 0.17.0.dev0
- xFormers version: 0.0.17.dev473
- Using GPU in script?: Yes (RTX 3090)
- Using distributed or parallel set-up in script?: No


Labels: bug, stale
