not able to use push_to_hub during tpu training  #2851

@yiyixuxu


Describe the bug

I am not able to use the --push_to_hub option during TPU training.

I get the following error:

Waiting for the following commands to finish before shutting down: [[push command, status code: running, in progress. PID: 772274]].
03/27/2023 23:15:59 - ERROR - huggingface_hub.repository - Waiting for the following commands to finish before shutting down: [[push command, status code: running, in progress. PID: 772274]].

This is not unique to the train_text_to_image_flax.py script; I'm just using it as an example. Basically, this line always fails when called during training on a TPU: https://github.com/huggingface/diffusers/blob/main/examples/text_to_image/train_text_to_image_flax.py#L584

Reproduction

Run the train_text_to_image_flax.py script from the README section linked below, using the following command:

https://github.com/huggingface/diffusers/tree/main/examples/text_to_image#training-with-flaxjax

export MODEL_NAME="duongna/stable-diffusion-v1-4-flax"
export dataset_name="lambdalabs/pokemon-blip-captions"
export OUTPUT_DIR="/pokemon"
export HUB_MODEL_ID="pokemon-lora"

python3 train_text_to_image_flax.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --dataset_name=$dataset_name \
  --resolution=512 --center_crop --random_flip \
  --train_batch_size=1 \
  --mixed_precision="fp16" \
  --max_train_steps=150 \
  --learning_rate=1e-05 \
  --max_grad_norm=1 \
  --output_dir="sd-pokemon-model" \
  --push_to_hub \
  --hub_model_id=${HUB_MODEL_ID} 

Logs

Waiting for the following commands to finish before shutting down: [[push command, status code: running, in progress. PID: 772274]].
03/27/2023 23:13:38 - ERROR - huggingface_hub.repository - Waiting for the following commands to finish before shutting down: [[push command, status code: running, in progress. PID: 772274]].
Waiting for the following commands to finish before shutting down: [[push command, status code: running, in progress. PID: 772274]].
03/27/2023 23:13:48 - ERROR - huggingface_hub.repository - Waiting for the following commands to finish before shutting down: [[push command, status code: running, in progress. PID: 772274]].
Waiting for the following commands to finish before shutting down: [[push command, status code: running, in progress. PID: 772274]].
03/27/2023 23:13:58 - ERROR - huggingface_hub.repository - Waiting for the following commands to finish before shutting down: [[push command, status code: running, in progress. PID: 772274]].
Waiting for the following commands to finish before shutting down: [[push command, status code: running, in progress. PID: 772274]].
03/27/2023 23:14:08 - ERROR - huggingface_hub.repository - Waiting for the following commands to finish before shutting down: [[push command, status code: running, in progress. PID: 772274]].
Waiting for the following commands to finish before shutting down: [[push command, status code: running, in progress. PID: 772274]].
03/27/2023 23:14:18 - ERROR - huggingface_hub.repository - Waiting for the following commands to finish before shutting down: [[push command, status code: running, in progress. PID: 772274]].
Waiting for the following commands to finish before shutting down: [[push command, status code: running, in progress. PID: 772274]].
03/27/2023 23:14:28 - ERROR - huggingface_hub.repository - Waiting for the following commands to finish before shutting down: [[push command, status code: running, in progress. PID: 772274]].
Waiting for the following commands to finish before shutting down: [[push command, status code: running, in progress. PID: 772274]].
03/27/2023 23:14:38 - ERROR - huggingface_hub.repository - Waiting for the following commands to finish before shutting down: [[push command, status code: running, in progress. PID: 772274]].
Waiting for the following commands to finish before shutting down: [[push command, status code: running, in progress. PID: 772274]].
03/27/2023 23:14:48 - ERROR - huggingface_hub.repository - Waiting for the following commands to finish before shutting down: [[push command, status code: running, in progress. PID: 772274]].
Waiting for the following commands to finish before shutting down: [[push command, status code: running, in progress. PID: 772274]].
03/27/2023 23:14:58 - ERROR - huggingface_hub.repository - Waiting for the following commands to finish before shutting down: [[push command, status code: running, in progress. PID: 772274]].
Waiting for the following commands to finish before shutting down: [[push command, status code: running, in progress. PID: 772274]].
03/27/2023 23:15:09 - ERROR - huggingface_hub.repository - Waiting for the following commands to finish before shutting down: [[push command, status code: running, in progress. PID: 772274]].
Waiting for the following commands to finish before shutting down: [[push command, status code: running, in progress. PID: 772274]].
03/27/2023 23:15:19 - ERROR - huggingface_hub.repository - Waiting for the following commands to finish before shutting down: [[push command, status code: running, in progress. PID: 772274]].
Waiting for the following commands to finish before shutting down: [[push command, status code: running, in progress. PID: 772274]].
03/27/2023 23:15:29 - ERROR - huggingface_hub.repository - Waiting for the following commands to finish before shutting down: [[push command, status code: running, in progress. PID: 772274]].
Waiting for the following commands to finish before shutting down: [[push command, status code: running, in progress. PID: 772274]].
03/27/2023 23:15:39 - ERROR - huggingface_hub.repository - Waiting for the following commands to finish before shutting down: [[push command, status code: running, in progress. PID: 772274]].
Waiting for the following commands to finish before shutting down: [[push command, status code: running, in progress. PID: 772274]].
03/27/2023 23:15:49 - ERROR - huggingface_hub.repository - Waiting for the following commands to finish before shutting down: [[push command, status code: running, in progress. PID: 772274]].
Waiting for the following commands to finish before shutting down: [[push command, status code: running, in progress. PID: 772274]].
03/27/2023 23:15:59 - ERROR - huggingface_hub.repository - Waiting for the following commands to finish before shutting down: [[push command, status code: running, in progress. PID: 772274]].
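
To show how long the push had been pending at the point the log above ends, the timestamps can be parsed out of the repeated messages. This is only a minimal sketch, assuming the log format shown above; the helper name pending_push_seconds and the sample string are illustrative and not part of the training script or of huggingface_hub.

```python
import re
from datetime import datetime

# Matches the timestamped lines emitted by huggingface_hub.repository in the log above.
LOG_LINE = re.compile(
    r"^(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}) - ERROR - huggingface_hub\.repository - "
    r"Waiting for the following commands to finish"
)


def pending_push_seconds(log_text):
    """Return the seconds elapsed between the first and last 'Waiting for ...' entries."""
    stamps = []
    for line in log_text.splitlines():
        match = LOG_LINE.match(line.strip())
        if match:
            # Timestamps in the log are month/day/year, 24-hour clock.
            stamps.append(datetime.strptime(match.group(1), "%m/%d/%Y %H:%M:%S"))
    if len(stamps) < 2:
        return 0.0
    return (stamps[-1] - stamps[0]).total_seconds()


# First and last timestamped lines copied from the log above.
sample = """\
03/27/2023 23:13:38 - ERROR - huggingface_hub.repository - Waiting for the following commands to finish before shutting down: [[push command, status code: running, in progress. PID: 772274]].
03/27/2023 23:15:59 - ERROR - huggingface_hub.repository - Waiting for the following commands to finish before shutting down: [[push command, status code: running, in progress. PID: 772274]].
"""

print(pending_push_seconds(sample))  # 141.0
```

For the excerpt above, the same push command (PID 772274) is still reported as running 141 seconds after the first message.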

System Info

tpu-v4-8
