
Commit 5521e9d

ain-soph, datumbox and jdsgomes authored
Add SwinV2 (#6246)
* init submit
* fix typo
* support ufmt and mypy
* fix 2 unittest errors
* fix ufmt issue
* Apply suggestions from code review

Co-authored-by: Vasilis Vryniotis <[email protected]>

* unify codes
* fix meshgrid indexing
* fix a bug
* fix type check
* add type_annotation
* add slow model
* fix device issue
* fix ufmt issue
* add expect pickle file
* fix jit script issue
* fix type check
* keep consistent argument order
* add support for pretrained_window_size
* avoid code duplication
* a better code reuse
* update window_size argument
* make permute and flatten operations modular
* add PatchMergingV2
* modify expect.pkl
* use None as default argument value
* fix type check
* fix indent
* fix window_size (temporarily)
* remove "v2_" related prefix and add v2 builder
* remove v2 builder
* keep default value consistent with official repo
* deprecate dropout
* deprecate pretrained_window_size
* fix dynamic padding edge case
* remove unused imports
* remove doc modification
* Revert "deprecate dropout"

This reverts commit 8a13f93.

* Revert "fix dynamic padding edge case"

This reverts commit 1c7579c.

* remove unused kwargs
* add downsample docs
* revert block default value
* revert argument order change
* explicitly specify start_dim
* add small and base variants
* add expect files and slow_models
* Add model weights and documentation for swin v2
* fix lint
* fix end of files line

Co-authored-by: Vasilis Vryniotis <[email protected]>
Co-authored-by: Joao Gomes <[email protected]>
1 parent 7e8186e commit 5521e9d


7 files changed (+427, -25 lines)


docs/source/models/swin_transformer.rst

Lines changed: 8 additions & 3 deletions
@@ -3,16 +3,18 @@ SwinTransformer
 
 .. currentmodule:: torchvision.models
 
-The SwinTransformer model is based on the `Swin Transformer: Hierarchical Vision
+The SwinTransformer models are based on the `Swin Transformer: Hierarchical Vision
 Transformer using Shifted Windows <https://arxiv.org/abs/2103.14030>`__
 paper.
+SwinTransformer V2 models are based on the `Swin Transformer V2: Scaling Up Capacity
+and Resolution <https://openaccess.thecvf.com/content/CVPR2022/papers/Liu_Swin_Transformer_V2_Scaling_Up_Capacity_and_Resolution_CVPR_2022_paper.pdf>`__
+paper.
 
 
 Model builders
 --------------
 
-The following model builders can be used to instantiate an SwinTransformer model.
-`swin_t` can be instantiated with pre-trained weights and all others without.
+The following model builders can be used to instantiate an SwinTransformer model (original and V2) with and without pre-trained weights.
 All the model builders internally rely on the ``torchvision.models.swin_transformer.SwinTransformer``
 base class. Please refer to the `source code
 <https://github.com/pytorch/vision/blob/main/torchvision/models/swin_transformer.py>`_ for
@@ -25,3 +27,6 @@ more details about this class.
 swin_t
 swin_s
 swin_b
+swin_v2_t
+swin_v2_s
+swin_v2_b
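The `_t`, `_s` and `_b` suffixes select the model size. As a minimal sketch, assuming the V2 builders reuse the Tiny/Small/Base widths, depths and head counts published in the original Swin Transformer paper, the per-stage channel counts follow from doubling the embedding dimension at every patch-merging step. `SwinSize` and `SWIN_SIZES` are hypothetical names used only for illustration. torchvision's actual builder arguments, including the window size, are defined in `swin_transformer.py`.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class SwinSize:
    """Width/depth hyperparameters of one Swin variant (illustrative, not torchvision's API)."""

    embed_dim: int                # channels after the patch embedding (stage 1)
    depths: Tuple[int, ...]       # number of transformer blocks per stage
    num_heads: Tuple[int, ...]    # attention heads per stage

    def stage_channels(self) -> List[int]:
        # Each patch-merging step doubles the channel count.
        return [self.embed_dim * 2**i for i in range(len(self.depths))]

    def head_dims(self) -> List[int]:
        # Channels per attention head in each stage.
        return [c // h for c, h in zip(self.stage_channels(), self.num_heads)]


# Assumed T/S/B configurations, taken from the Swin Transformer paper.
SWIN_SIZES: Dict[str, SwinSize] = {
    "t": SwinSize(96, (2, 2, 6, 2), (3, 6, 12, 24)),
    "s": SwinSize(96, (2, 2, 18, 2), (3, 6, 12, 24)),
    "b": SwinSize(128, (2, 2, 18, 2), (4, 8, 16, 32)),
}

if __name__ == "__main__":
    for name, size in SWIN_SIZES.items():
        print(name, size.stage_channels(), size.head_dims(), sum(size.depths))
```

Under these assumptions every variant keeps 32 channels per head; Small and Base differ from Tiny by deepening the third stage, and Base additionally widens all stages.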

references/classification/README.md

Lines changed: 11 additions & 0 deletions
@@ -236,6 +236,17 @@ Note that `--val-resize-size` was optimized in a post-training step, see their `
 
 
 
+### SwinTransformer V2
+```
+torchrun --nproc_per_node=8 train.py\
+--model $MODEL --epochs 300 --batch-size 128 --opt adamw --lr 0.001 --weight-decay 0.05 --norm-weight-decay 0.0 --bias-weight-decay 0.0 --transformer-embedding-decay 0.0 --lr-scheduler cosineannealinglr --lr-min 0.00001 --lr-warmup-method linear --lr-warmup-epochs 20 --lr-warmup-decay 0.01 --amp --label-smoothing 0.1 --mixup-alpha 0.8 --clip-grad-norm 5.0 --cutmix-alpha 1.0 --random-erase 0.25 --interpolation bicubic --auto-augment ta_wide --model-ema --ra-sampler --ra-reps 4 --val-resize-size 256 --val-crop-size 256 --train-crop-size 256
+```
+Here `$MODEL` is one of `swin_v2_t`, `swin_v2_s` or `swin_v2_b`.
+Note that `--val-resize-size` was optimized in a post-training step, see their `Weights` entry for the exact value.
+
+
+
+
 ### ShuffleNet V2
 ```
 torchrun --nproc_per_node=8 train.py \
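The SwinTransformer V2 training flags describe a 20-epoch linear warmup followed by cosine annealing down to `--lr-min` over the remaining 280 epochs. The sketch below computes the per-epoch learning rate under three assumptions: the scheduler is stepped once per epoch, warmup starts at `lr * lr_warmup_decay` and rises linearly to `lr`, and the cosine phase uses the closed-form formula. `lr_at_epoch` is a hypothetical helper, not part of `train.py`; PyTorch's scheduler classes compute the cosine phase recursively, so their values may differ in the last decimals.

```python
import math


def lr_at_epoch(
    epoch: int,
    base_lr: float = 1e-3,        # --lr 0.001
    lr_min: float = 1e-5,         # --lr-min 0.00001
    epochs: int = 300,            # --epochs 300
    warmup_epochs: int = 20,      # --lr-warmup-epochs 20
    warmup_decay: float = 0.01,   # --lr-warmup-decay 0.01
) -> float:
    """Learning rate at a 0-indexed epoch: linear warmup, then closed-form cosine annealing."""
    if epoch < warmup_epochs:
        # Linear ramp of the multiplier from warmup_decay up to 1.0.
        factor = warmup_decay + (1.0 - warmup_decay) * epoch / warmup_epochs
        return base_lr * factor
    # Cosine phase spans the epochs left after warmup.
    t = epoch - warmup_epochs
    total = epochs - warmup_epochs
    return lr_min + (base_lr - lr_min) * (1.0 + math.cos(math.pi * t / total)) / 2.0


if __name__ == "__main__":
    for e in (0, 10, 20, 160, 300):
        print(e, f"{lr_at_epoch(e):.2e}")
```

Under these assumptions the rate starts at 1e-5, peaks at 1e-3 when warmup ends at epoch 20, and decays back to 1e-5 by epoch 300.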
Binary file not shown.
Binary file not shown.
Binary file not shown.

test/test_models.py

Lines changed: 3 additions & 0 deletions
@@ -332,6 +332,9 @@ def _check_input_backprop(model, inputs):
     "swin_t",
     "swin_s",
     "swin_b",
+    "swin_v2_t",
+    "swin_v2_s",
+    "swin_v2_b",
 ]
 for m in slow_models:
     _model_params[m] = {"input_shape": (1, 3, 64, 64)}
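The slow models, now including the three V2 variants, are tested with a 64x64 input. The minimal sketch below shows why such a small input still passes through all four stages. It makes three assumptions: a 4x4 non-overlapping patch embedding, patch merging that halves the token grid between stages (rounding odd sizes up), and windowed attention that pads each grid up to a multiple of an 8x8 window. `stage_resolutions` and `padded_to_window` are hypothetical helpers, not torchvision functions.

```python
from typing import List


def stage_resolutions(image_size: int, patch_size: int = 4, num_stages: int = 4) -> List[int]:
    """Side length of the token grid at the start of each stage (hypothetical helper)."""
    size = image_size // patch_size  # non-overlapping patch embedding
    sizes = []
    for _ in range(num_stages):
        sizes.append(size)
        size = (size + 1) // 2  # patch merging halves the grid, rounding odd sizes up
    return sizes


def padded_to_window(size: int, window_size: int) -> int:
    """Grid side after padding up to the next multiple of the attention window."""
    return size + (window_size - size % window_size) % window_size


if __name__ == "__main__":
    for image_size in (64, 256):
        sizes = stage_resolutions(image_size)
        padded = [padded_to_window(s, 8) for s in sizes]
        print(image_size, sizes, padded)
```

Under these assumptions, a 256x256 crop (the `--train-crop-size` in the README) yields grids of 64, 32, 16 and 8 that all divide evenly into 8x8 windows. The 64x64 test input yields 16, 8, 4 and 2, so the last two stages are smaller than one window and depend on padding. That keeps the tests fast while still exercising the padding path.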
