Describe the bug
Hi, I'm in a Colab Pro environment using a TPU v2, for test purposes.
I get the error "OSError: CompVis/stable-diffusion-v1-4 does not appear to have a file named flax_model.msgpack or pytorch_model.bin." Full output:
2023-02-18 02:07:26.213102: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
2023-02-18 02:07:26.213285: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
2023-02-18 02:07:26.213310: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
2023-02-18 02:07:29.194168: E external/org_tensorflow/tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:267] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
WARNING:jax._src.lib.xla_bridge:No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)
WARNING:datasets.builder:Using custom data configuration lambdalabs--pokemon-blip-captions-10e3527a764857bd
WARNING:datasets.builder:Found cached dataset parquet (/root/.cache/huggingface/datasets/lambdalabs___parquet/lambdalabs--pokemon-blip-captions-10e3527a764857bd/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec)
100% 1/1 [00:00<00:00, 369.54it/s]
loading file vocab.json from cache at /root/.cache/huggingface/hub/models--CompVis--stable-diffusion-v1-4/snapshots/3857c45b7d4e78b3ba0f39d4d7f50a2a05aa23d4/tokenizer/vocab.json
loading file merges.txt from cache at /root/.cache/huggingface/hub/models--CompVis--stable-diffusion-v1-4/snapshots/3857c45b7d4e78b3ba0f39d4d7f50a2a05aa23d4/tokenizer/merges.txt
loading file added_tokens.json from cache at None
loading file special_tokens_map.json from cache at /root/.cache/huggingface/hub/models--CompVis--stable-diffusion-v1-4/snapshots/3857c45b7d4e78b3ba0f39d4d7f50a2a05aa23d4/tokenizer/special_tokens_map.json
loading file tokenizer_config.json from cache at /root/.cache/huggingface/hub/models--CompVis--stable-diffusion-v1-4/snapshots/3857c45b7d4e78b3ba0f39d4d7f50a2a05aa23d4/tokenizer/tokenizer_config.json
loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--CompVis--stable-diffusion-v1-4/snapshots/3857c45b7d4e78b3ba0f39d4d7f50a2a05aa23d4/text_encoder/config.json
Model config CLIPTextConfig {
"_name_or_path": "openai/clip-vit-large-patch14",
"architectures": [
"CLIPTextModel"
],
"attention_dropout": 0.0,
"bos_token_id": 0,
"dropout": 0.0,
"eos_token_id": 2,
"hidden_act": "quick_gelu",
"hidden_size": 768,
"initializer_factor": 1.0,
"initializer_range": 0.02,
"intermediate_size": 3072,
"layer_norm_eps": 1e-05,
"max_position_embeddings": 77,
"model_type": "clip_text_model",
"num_attention_heads": 12,
"num_hidden_layers": 12,
"pad_token_id": 1,
"projection_dim": 512,
"torch_dtype": "float32",
"transformers_version": "4.26.1",
"vocab_size": 49408
}
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /content/diffusers/examples/text_to_image/train_text_to_image_flax.py:579 in │
│ <module> │
│ │
│ 576 │
│ 577 │
│ 578 if __name__ == "__main__": │
│ ❱ 579 │ main() │
│ 580 │
│ │
│ /content/diffusers/examples/text_to_image/train_text_to_image_flax.py:390 in │
│ main │
│ │
│ 387 │ │
│ 388 │ # Load models and create wrapper for stable diffusion │
│ 389 │ tokenizer = CLIPTokenizer.from_pretrained(args.pretrained_model_na │
│ ❱ 390 │ text_encoder = FlaxCLIPTextModel.from_pretrained( │
│ 391 │ │ args.pretrained_model_name_or_path, subfolder="text_encoder", │
│ 392 │ ) │
│ 393 │ vae, vae_params = FlaxAutoencoderKL.from_pretrained( │
│ │
│ /usr/local/lib/python3.8/dist-packages/transformers/modeling_flax_utils.py:7 │
│ 64 in from_pretrained │
│ │
│ 761 │ │ │ │ │ │ │ │ " `from_pt=True` to load this model f │
│ 762 │ │ │ │ │ │ │ ) │
│ 763 │ │ │ │ │ │ else: │
│ ❱ 764 │ │ │ │ │ │ │ raise EnvironmentError( │
│ 765 │ │ │ │ │ │ │ │ f"{pretrained_model_name_or_path} doe │
│ 766 │ │ │ │ │ │ │ │ f" {FLAX_WEIGHTS_NAME} or {WEIGHTS_NA │
│ 767 │ │ │ │ │ │ │ ) │
╰──────────────────────────────────────────────────────────────────────────────╯
OSError: CompVis/stable-diffusion-v1-4 does not appear to have a file named
flax_model.msgpack or pytorch_model.bin.
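For reference, FlaxCLIPTextModel.from_pretrained only looks for flax_model.msgpack in the text_encoder subfolder (or pytorch_model.bin when from_pt=True is passed), which is why the load fails here. A minimal sketch of a possible workaround outside the training script, assuming either that the Hub repo hosts Flax weights on a separate flax revision or that converting the PyTorch checkpoint on the fly is acceptable:

from transformers import FlaxCLIPTextModel

# Sketch of a possible workaround (assumptions: a "flax" revision with Flax
# weights exists on the Hub, or converting the PyTorch weights is acceptable).
try:
    text_encoder = FlaxCLIPTextModel.from_pretrained(
        "CompVis/stable-diffusion-v1-4", subfolder="text_encoder", revision="flax"
    )
except OSError:
    # Fall back to converting the PyTorch checkpoint to Flax at load time.
    text_encoder = FlaxCLIPTextModel.from_pretrained(
        "CompVis/stable-diffusion-v1-4", subfolder="text_encoder", from_pt=True
    )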
Reproduction
!git clone https://github.com/huggingface/diffusers
%cd diffusers
!pip install .
%cd /content/diffusers/examples/text_to_image
!pip install -r requirements_flax.txt
!huggingface-cli login
!accelerate config
MODEL_NAME="CompVis/stable-diffusion-v1-4"
dataset_name="lambdalabs/pokemon-blip-captions"
!python train_text_to_image_flax.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--dataset_name=$dataset_name \
--resolution=512 --center_crop --random_flip \
--train_batch_size=1 \
--max_train_steps=15000 \
--learning_rate=1e-05 \
--max_grad_norm=1 \
--output_dir="sd-pokemon-model"
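The log above also shows JAX falling back to CPU ("No GPU/TPU found"), so before launching the script it may be worth confirming the Colab TPU is actually visible to JAX. A small check, assuming an older jax release that still ships the Colab TPU helper and a Colab TPU v2 runtime:

import os
import jax

# Sketch: on a Colab TPU v2 runtime, older jax releases need an explicit setup
# call before TPU devices become visible (an assumption about the jax version).
if "COLAB_TPU_ADDR" in os.environ:
    import jax.tools.colab_tpu
    jax.tools.colab_tpu.setup_tpu()

print(jax.devices())  # should list TpuDevice entries rather than only CPU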
Logs
No response
System Info
- `diffusers` version: 0.14.0.dev0
- Platform: Linux-5.10.147+-x86_64-with-glibc2.29
- Python version: 3.8.10
- PyTorch version (GPU?): 1.13.1+cu116 (False)
- Huggingface_hub version: 0.12.1
- Transformers version: 4.26.1
- Accelerate version: 0.16.0
- xFormers version: not installed
- Using GPU in script?: <fill in>
- Using distributed or parallel set-up in script?: <fill in>
- TPU: v2 (Colab Pro)