Use frozen BN only if pre-trained backbone #5443
Conversation
💊 CI failures summary: as of commit d222c46, 1 failure not recognized by patterns (more details on the Dr. CI page). This comment was automatically generated by Dr. CI.
Thanks!
trainable_backbone_layers = _validate_trainable_layers(
    pretrained or pretrained_backbone, trainable_backbone_layers, 5, 3
)
is_trained = pretrained or pretrained_backbone
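The decision the snippet above feeds into can be sketched in isolation. This is a hypothetical helper (`choose_norm_layer` is not torchvision API) illustrating the heuristic this PR introduces: freeze batch norm only when some pre-trained weights are loaded.

```python
def choose_norm_layer(pretrained: bool, pretrained_backbone: bool) -> str:
    # Heuristic from this PR: use FrozenBatchNorm2d only when at least
    # some pre-trained weights are loaded; otherwise keep regular,
    # trainable BatchNorm2d so from-scratch training still works.
    is_trained = pretrained or pretrained_backbone
    return "FrozenBatchNorm2d" if is_trained else "BatchNorm2d"

print(choose_norm_layer(pretrained=True, pretrained_backbone=False))   # FrozenBatchNorm2d
print(choose_norm_layer(pretrained=False, pretrained_backbone=True))   # FrozenBatchNorm2d
print(choose_norm_layer(pretrained=False, pretrained_backbone=False))  # BatchNorm2d
```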
nit: if the model was trained for detection from scratch with large batch sizes, and we then fine-tune it afterwards (still with large batch sizes), we would in that case still be using FrozenBatchNorm.

This is an OK heuristic, but it hints that we might want to make this an explicit parameter of the constructor in the future.
Reviewed By: vmoens
Differential Revision: D34878996
fbshipit-source-id: 690b04fe0810cbd45ed582067b79f7e4254c054e
Currently, the majority of our Detection models replace the `BatchNorm2d` layers with `FrozenBatchNorm2d`. This is a reasonable mitigation that improves the stability of training for small batch sizes. Unfortunately, our current implementation freezes the BNs even when they are completely randomly initialized. Since `FrozenBatchNorm2d` freezes both the running stats and the affine parameters, its parameters get initialized and fixed to their default values (see vision/torchvision/ops/misc.py, lines 30 to 33 at 0c2373d).

Consequently, the BN layers are effectively completely disabled for those who try to train the models from scratch.

This PR fixes the issue by replacing the BNs with FrozenBNs only when at least some pre-trained weights are loaded.
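To see why a randomly initialized frozen BN effectively disables the layer, here is a minimal pure-Python sketch of the frozen-BN transform. The default buffer values (`weight=1`, `bias=0`, `running_mean=0`, `running_var=1`) match the initialization the description refers to; `eps=1e-5` is assumed for illustration.

```python
import math

def frozen_bn(x, weight=1.0, bias=0.0, running_mean=0.0, running_var=1.0, eps=1e-5):
    # A frozen BN is a fixed affine transform per channel: neither the
    # running statistics nor the affine parameters ever update during
    # training, so whatever they were initialized to stays forever.
    scale = weight / math.sqrt(running_var + eps)
    return x * scale + (bias - running_mean * scale)

# With the default initialization the transform is (almost) the identity,
# so a model trained from scratch never gets a functioning batch norm:
print(frozen_bn(2.0))  # ≈ 2.0
```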