diff --git a/examples/seq2seq/README.md b/examples/seq2seq/README.md
index c1d599983f05..d025d46c9734 100644
--- a/examples/seq2seq/README.md
+++ b/examples/seq2seq/README.md
@@ -213,6 +213,11 @@ To see all the possible command line options, run:
 python finetune_trainer.py --help
 ```
 
+For multi-GPU training, use `torch.distributed.launch`, e.g. with 2 GPUs:
+```bash
+python -m torch.distributed.launch --nproc_per_node=2 finetune_trainer.py ...
+```
+
 **At the moment, `Seq2SeqTrainer` does not support *with teacher* distillation.**
 
 All `Seq2SeqTrainer`-based fine-tuning scripts are included in the `builtin_trainer` directory.
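
For context, a minimal sketch of what a complete 2-GPU launch might look like once the elided `...` is filled in. The `--model_name_or_path`, `--data_dir`, `--output_dir`, and `--do_train` flags are the script's usual argument-parser options; the concrete values below are hypothetical placeholders, not part of this diff:

```bash
# Sketch only: the model name and paths are hypothetical placeholders,
# not values taken from the diff above.
python -m torch.distributed.launch --nproc_per_node=2 finetune_trainer.py \
    --model_name_or_path facebook/bart-base \
    --data_dir ./my_seq2seq_data \
    --output_dir ./multi_gpu_run \
    --do_train
```

`--nproc_per_node` should match the number of GPUs on the machine; the launcher starts one training process per GPU, and the `Trainer` picks up the assigned device from the injected `--local_rank` argument.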