torchrun --nproc_per_node=2 --nnodes=1 ./gen_utils/generate.py \
model.name='bert-base-uncased' use_sentence_piece=True batch_size=128 \
exp.name=play2 load_step=10000 data.name=docedit \
tgt_len=90 max_pos_len=512 \
num_samples=1 intermediate_size=2048 num_attention_heads=8 dropout=0.2 \
in_channels=128 out_channels=128 time_channels=128 skip_sample=True gen_timesteps=1000 \
schedule_sampler='xy_uniform' time_att=False att_strategy='txl' load_from_ema=False prediction=True
Anyway, the for loop just iterates over dev_dataloader, which yields the same data on every pass. So what is the point of the num_samples parameter?
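For context on what such a loop usually achieves: in diffusion-style generation scripts, a `num_samples`-style parameter typically controls how many independent stochastic passes are made over the same dev set, each pass starting from fresh noise, so identical conditioning data still produces different generations. This is only a guess at the intent here. A minimal self-contained sketch of that pattern (all names — `dev_data`, `sample_once`, `generate` — are hypothetical stand-ins, not this repo's actual API):

```python
import random

# Stand-in for dev_dataloader: it yields the *same* batches every pass.
dev_data = ["the same dev batch"] * 3

def generate(num_samples=1):
    """Run `num_samples` independent sampling passes over the same dev set."""
    runs = []
    for s in range(num_samples):           # the num_samples loop
        rng = random.Random(s)             # fresh noise stream per pass
        # Stand-in for a reverse-diffusion pass: the "generation" depends on
        # the noise, so two passes over identical data differ.
        runs.append([[rng.gauss(0.0, 1.0) for _ in batch] for batch in dev_data])
    return runs

runs = generate(num_samples=2)
assert runs[0] != runs[1]  # same data, different samples per pass
```

Under this reading, `num_samples=1` in the command above makes the loop degenerate (one pass), which is why it looks pointless; it would only matter for drawing multiple candidate generations per input.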