train_text_to_image_flax.py no flax_model.msgpack or pytorch_model.bin #2410

Describe the bug

Hi, I'm running in a Colab Pro environment with a TPU v2 for testing purposes.

I get the following error: OSError: CompVis/stable-diffusion-v1-4 does not appear to have a file named flax_model.msgpack or pytorch_model.bin. Full log:

2023-02-18 02:07:26.213102: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
2023-02-18 02:07:26.213285: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
2023-02-18 02:07:26.213310: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
2023-02-18 02:07:29.194168: E external/org_tensorflow/tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:267] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
WARNING:jax._src.lib.xla_bridge:No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)
WARNING:datasets.builder:Using custom data configuration lambdalabs--pokemon-blip-captions-10e3527a764857bd
WARNING:datasets.builder:Found cached dataset parquet (/root/.cache/huggingface/datasets/lambdalabs___parquet/lambdalabs--pokemon-blip-captions-10e3527a764857bd/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec)
100% 1/1 [00:00<00:00, 369.54it/s]
loading file vocab.json from cache at /root/.cache/huggingface/hub/models--CompVis--stable-diffusion-v1-4/snapshots/3857c45b7d4e78b3ba0f39d4d7f50a2a05aa23d4/tokenizer/vocab.json
loading file merges.txt from cache at /root/.cache/huggingface/hub/models--CompVis--stable-diffusion-v1-4/snapshots/3857c45b7d4e78b3ba0f39d4d7f50a2a05aa23d4/tokenizer/merges.txt
loading file added_tokens.json from cache at None
loading file special_tokens_map.json from cache at /root/.cache/huggingface/hub/models--CompVis--stable-diffusion-v1-4/snapshots/3857c45b7d4e78b3ba0f39d4d7f50a2a05aa23d4/tokenizer/special_tokens_map.json
loading file tokenizer_config.json from cache at /root/.cache/huggingface/hub/models--CompVis--stable-diffusion-v1-4/snapshots/3857c45b7d4e78b3ba0f39d4d7f50a2a05aa23d4/tokenizer/tokenizer_config.json
loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--CompVis--stable-diffusion-v1-4/snapshots/3857c45b7d4e78b3ba0f39d4d7f50a2a05aa23d4/text_encoder/config.json
Model config CLIPTextConfig {
  "_name_or_path": "openai/clip-vit-large-patch14",
  "architectures": [
    "CLIPTextModel"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 0,
  "dropout": 0.0,
  "eos_token_id": 2,
  "hidden_act": "quick_gelu",
  "hidden_size": 768,
  "initializer_factor": 1.0,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-05,
  "max_position_embeddings": 77,
  "model_type": "clip_text_model",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 1,
  "projection_dim": 512,
  "torch_dtype": "float32",
  "transformers_version": "4.26.1",
  "vocab_size": 49408
}

╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /content/diffusers/examples/text_to_image/train_text_to_image_flax.py:579 in │
│ <module>                                                                     │
│                                                                              │
│   576                                                                        │
│   577                                                                        │
│   578 if __name__ == "__main__":                                             │
│ ❱ 579 │   main()                                                             │
│   580                                                                        │
│                                                                              │
│ /content/diffusers/examples/text_to_image/train_text_to_image_flax.py:390 in │
│ main                                                                         │
│                                                                              │
│   387 │                                                                      │
│   388 │   # Load models and create wrapper for stable diffusion              │
│   389 │   tokenizer = CLIPTokenizer.from_pretrained(args.pretrained_model_na │
│ ❱ 390 │   text_encoder = FlaxCLIPTextModel.from_pretrained(                  │
│   391 │   │   args.pretrained_model_name_or_path, subfolder="text_encoder",  │
│   392 │   )                                                                  │
│   393 │   vae, vae_params = FlaxAutoencoderKL.from_pretrained(               │
│                                                                              │
│ /usr/local/lib/python3.8/dist-packages/transformers/modeling_flax_utils.py:7 │
│ 64 in from_pretrained                                                        │
│                                                                              │
│    761 │   │   │   │   │   │   │   │   " `from_pt=True` to load this model f │
│    762 │   │   │   │   │   │   │   )                                         │
│    763 │   │   │   │   │   │   else:                                         │
│ ❱  764 │   │   │   │   │   │   │   raise EnvironmentError(                   │
│    765 │   │   │   │   │   │   │   │   f"{pretrained_model_name_or_path} doe │
│    766 │   │   │   │   │   │   │   │   f" {FLAX_WEIGHTS_NAME} or {WEIGHTS_NA │
│    767 │   │   │   │   │   │   │   )                                         │
╰──────────────────────────────────────────────────────────────────────────────╯
OSError: CompVis/stable-diffusion-v1-4 does not appear to have a file named 
flax_model.msgpack or pytorch_model.bin.
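
The traceback points at the FlaxCLIPTextModel.from_pretrained call at train_text_to_image_flax.py line 390, and the surrounding transformers code (modeling_flax_utils.py, around line 761) mentions a from_pt=True path that converts a PyTorch checkpoint to Flax at load time. Below is a minimal sketch of that workaround, assuming the PyTorch weights are actually reachable in the text_encoder subfolder (if they are not, the same OSError is raised again); this is a diagnostic snippet for this report, not the fix applied in the training script:

# Hedged workaround sketch: ask transformers to convert the PyTorch
# text-encoder weights to Flax instead of requiring flax_model.msgpack.
from transformers import FlaxCLIPTextModel

text_encoder = FlaxCLIPTextModel.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    subfolder="text_encoder",
    from_pt=True,  # convert pytorch_model.bin to Flax parameters at load time
)

If this works standalone, the same keyword could presumably be passed to the from_pretrained call at line 390 of the training script.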

Reproduction

!git clone https://github.com/huggingface/diffusers
%cd diffusers
!pip install .

%cd /content/diffusers/examples/text_to_image
!pip install -r requirements_flax.txt

!huggingface-cli login
!accelerate config

MODEL_NAME="CompVis/stable-diffusion-v1-4"
dataset_name="lambdalabs/pokemon-blip-captions"

!python train_text_to_image_flax.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --dataset_name=$dataset_name \
  --resolution=512 --center_crop --random_flip \
  --train_batch_size=1 \
  --max_train_steps=15000 \
  --learning_rate=1e-05 \
  --max_grad_norm=1 \
  --output_dir="sd-pokemon-model" 
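
To narrow down why neither weight file is found, it may help to list what the model repository actually contains under text_encoder/. A small diagnostic sketch using huggingface_hub (list_repo_files and the filtering below are my own diagnostic code, not part of the training script):

# Diagnostic sketch: list the repo files and check which weight formats
# exist in the text_encoder subfolder.
from huggingface_hub import list_repo_files

files = list_repo_files("CompVis/stable-diffusion-v1-4")
print([f for f in files if f.startswith("text_encoder/")])
# If flax_model.msgpack is missing here, the Flax script cannot load the
# text encoder without converting from the PyTorch checkpoint (from_pt=True)
# or pointing --pretrained_model_name_or_path at a repo/revision that ships
# Flax weights.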

Logs

No response

System Info

- `diffusers` version: 0.14.0.dev0
- Platform: Linux-5.10.147+-x86_64-with-glibc2.29
- Python version: 3.8.10
- PyTorch version (GPU?): 1.13.1+cu116 (False)
- Huggingface_hub version: 0.12.1
- Transformers version: 4.26.1
- Accelerate version: 0.16.0
- xFormers version: not installed
- Using GPU in script?: <fill in>
- Using distributed or parallel set-up in script?: <fill in>

Accelerate ENV

[screenshot: output of accelerate env]

TPU version

[screenshot: Colab TPU runtime version]
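
Related to the "No GPU/TPU found, falling back to CPU" warning in the log above: on a Colab TPU v2 runtime, JAX usually needs the Colab TPU setup call before it can see the TPU devices. A hedged sketch to verify which backend JAX actually picks up (colab_tpu.setup_tpu() is what I believe applies to this runtime; jax.devices() simply reports the detected devices either way):

# Sketch: point JAX at the Colab TPU, then confirm the backend it sees.
import jax
from jax.tools import colab_tpu

colab_tpu.setup_tpu()  # connect JAX to the Colab TPU driver (TPU v2 runtime)
print(jax.devices())   # should list TpuDevice entries, not only CpuDevice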

Labels

bug (Something isn't working)