ValueError while using --optimize_on_cpu #23

@rsanjaykamath

1/87970 [00:00<8:35:35, 2.84it/s]
Traceback (most recent call last):
  File "./run_squad.py", line 990, in <module>
    main()
  File "./run_squad.py", line 922, in main
    is_nan = set_optimizer_params_grad(param_optimizer, model.named_parameters(), test_nan=True)
  File "./run_squad.py", line 691, in set_optimizer_params_grad
    if test_nan and torch.isnan(param_model.grad).sum() > 0:
  File "/people/sanjay/anaconda2/envs/bert_pytorch/lib/python3.5/site-packages/torch/functional.py", line 289, in isnan
    raise ValueError("The argument is not a tensor", str(tensor))
ValueError: ('The argument is not a tensor', 'None')
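
The last frame shows `torch.isnan` receiving `None`, so `param_model.grad` was `None` for at least one parameter when the NaN test ran. Why that gradient is missing in this run is not established here. The snippet below is only a minimal sketch of a defensive check, not the code in `run_squad.py`: the helper name `grad_has_nan` and the toy parameters `used` and `unused` are hypothetical and exist purely for illustration.

```python
import torch


def grad_has_nan(param):
    """Return True if param.grad contains a NaN; return False if grad is None."""
    grad = param.grad
    if grad is None:
        # No backward pass has populated this gradient, so there is nothing to test.
        return False
    return bool(torch.isnan(grad).any())


# Two toy parameters; only `used` takes part in the loss.
used = torch.nn.Parameter(torch.ones(3))
unused = torch.nn.Parameter(torch.ones(3))
(used * 2.0).sum().backward()

print(used.grad is None)    # False
print(unused.grad is None)  # True
print(grad_has_nan(used), grad_has_nan(unused))  # False False
```

A parameter that never contributes to the loss keeps `grad` set to `None` after `backward()`, which is one way the situation in the traceback can arise. Whether skipping such parameters is the right behaviour for `--optimize_on_cpu` is a separate question.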

Command:
CUDA_VISIBLE_DEVICES=0 python ./run_squad.py \
  --vocab_file bert_large/uncased_L-24_H-1024_A-16/vocab.txt \
  --bert_config_file bert_large/uncased_L-24_H-1024_A-16/bert_config.json \
  --init_checkpoint bert_large/uncased_L-24_H-1024_A-16/pytorch_model.bin \
  --do_lower_case \
  --do_train \
  --do_predict \
  --train_file squad_dir/train-v1.1.json \
  --predict_file squad_dir/dev-v1.1.json \
  --learning_rate 3e-5 \
  --num_train_epochs 2 \
  --max_seq_length 384 \
  --doc_stride 128 \
  --output_dir outputs \
  --train_batch_size 4 \
  --gradient_accumulation_steps 2 \
  --optimize_on_cpu

The error occurs only when --optimize_on_cpu is passed.
The same command works fine without that argument.

GPU: a single Nvidia GTX 1080 Ti.

PS: A train_batch_size of 4 is the most I can fit in the memory of a single GPU.
