
Commit 6a1d9ee

Update readme.md with ViT training command (#5086)
As titled.
1 parent c34a914 commit 6a1d9ee


references/classification/README.md

Lines changed: 24 additions & 2 deletions
@@ -125,7 +125,7 @@ torchrun --nproc_per_node=8 train.py\
```
Here `$MODEL` is one of `regnet_x_400mf`, `regnet_x_800mf`, `regnet_x_1_6gf`, `regnet_y_400mf`, `regnet_y_800mf` and `regnet_y_1_6gf`. Please note we used learning rate 0.4 for `regnet_y_400mf` to get the same Acc@1 as [the paper](https://arxiv.org/abs/2003.13678).

-### Medium models
+#### Medium models
```
torchrun --nproc_per_node=8 train.py\
--model $MODEL --epochs 100 --batch-size 64 --wd 0.00005 --lr=0.4\
@@ -134,7 +134,7 @@ torchrun --nproc_per_node=8 train.py\
```
Here `$MODEL` is one of `regnet_x_3_2gf`, `regnet_x_8gf`, `regnet_x_16gf`, `regnet_y_3_2gf` and `regnet_y_8gf`.

-### Large models
+#### Large models
```
torchrun --nproc_per_node=8 train.py\
--model $MODEL --epochs 100 --batch-size 32 --wd 0.00005 --lr=0.2\
@@ -143,6 +143,28 @@ torchrun --nproc_per_node=8 train.py\
```
Here `$MODEL` is one of `regnet_x_32gf`, `regnet_y_16gf` and `regnet_y_32gf`.

+### Vision Transformer
+
+#### Base models
+```
+torchrun --nproc_per_node=8 train.py\
+--model $MODEL --epochs 300 --batch-size 64 --opt adamw --lr 0.003 --wd 0.3\
+--lr-scheduler cosineannealinglr --lr-warmup-method linear --lr-warmup-epochs 30\
+--lr-warmup-decay 0.033 --amp --label-smoothing 0.11 --mixup-alpha 0.2 --auto-augment ra\
+--clip-grad-norm 1 --ra-sampler --cutmix-alpha 1.0 --model-ema
+```
+Here `$MODEL` is one of `vit_b_16` and `vit_b_32`.
+
+#### Large models
+```
+torchrun --nproc_per_node=8 train.py\
+--model $MODEL --epochs 300 --batch-size 16 --opt adamw --lr 0.003 --wd 0.3\
+--lr-scheduler cosineannealinglr --lr-warmup-method linear --lr-warmup-epochs 30\
+--lr-warmup-decay 0.033 --amp --label-smoothing 0.11 --mixup-alpha 0.2 --auto-augment ra\
+--clip-grad-norm 1 --ra-sampler --cutmix-alpha 1.0 --model-ema
+```
+Here `$MODEL` is one of `vit_l_16` and `vit_l_32`.
+
## Mixed precision training
Automatic Mixed Precision (AMP) training on GPU for PyTorch can be enabled with the [torch.cuda.amp](https://pytorch.org/docs/stable/amp.html?highlight=amp#module-torch.cuda.amp).
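The `--amp` flag in the ViT commands above turns on this mixed-precision path. For readers unfamiliar with `torch.cuda.amp`, here is a minimal sketch of the autocast/GradScaler pattern it relies on; the model, optimizer, loss and data below are hypothetical stand-ins so the snippet runs on its own, not the actual `train.py` loop.

```
import torch
import torch.nn as nn

# Hypothetical stand-ins so the sketch is self-contained; the real train.py
# builds the model, optimizer and data loader from its command-line flags.
model = nn.Linear(32, 10).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()
loader = [(torch.randn(8, 32), torch.randint(0, 10, (8,))) for _ in range(4)]

scaler = torch.cuda.amp.GradScaler()      # scales the loss to avoid fp16 gradient underflow

for images, targets in loader:
    images, targets = images.cuda(), targets.cuda()
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():       # forward pass runs in mixed precision
        output = model(images)
        loss = criterion(output, targets)
    scaler.scale(loss).backward()         # backward on the scaled loss
    scaler.step(optimizer)                # unscales grads; skips the step if they are not finite
    scaler.update()                       # adjusts the scale factor for the next iteration
```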
