Missing `fixed_size` value in `GeneralizedRCNNTransform` breaks Faster-RCNN torchscript loading with C++ in train mode #4366
Comments
@beniz Thanks for reporting. I noticed that your script sets the model in training mode before exporting. Is that intentional? Have you tried it without it? Edit: I confirm that the
Frankly I'm not sure exporting it in training mode will work. We could fix the specific issue, but I believe it's likely to fail elsewhere. @fmassa any thoughts on whether we ever intended to support this?
Thanks @datumbox for the super quick answer, and on a Sunday! So yes, good catch, I should have mentioned it: the training mode is intentional, since we're training from C++.

I should mention it's possible to work around the issue by setting:

`model.transform = M.detection.faster_rcnn.GeneralizedRCNNTransform(min_size, max_size, image_mean, image_std, fixed_size=(min_size, min_size))`

But as you correctly anticipated, the export in training mode fails a bit further down the road. Also, note that our script works fine exporting in training mode when we specify the backbone with:

`python3 trace_torchvision.py fasterrcnn --backbone resnet50 --num_classes 2`

So I was thinking that there may be a generic path to export for training, starting with the

My take at this stage: I've started rewriting a simplified `GeneralizedRCNNTransform` to accommodate the
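The workaround above pins the transform's output size so that it is never `None`. As a rough illustration of the size-selection behaviour involved, here is a simplified pure-Python sketch (this is an illustrative re-implementation with hypothetical names, not torchvision's actual code): when `fixed_size` is given it wins outright, otherwise the image is scaled so its short side reaches `min_size` without the long side exceeding `max_size`.

```python
from typing import Optional, Tuple

def resize_target(h: int, w: int, min_size: int, max_size: int,
                  fixed_size: Optional[Tuple[int, int]] = None) -> Tuple[int, int]:
    """Pick the (height, width) an image is resized to.

    Simplified sketch of the kind of logic GeneralizedRCNNTransform applies:
    a provided fixed_size is used directly; otherwise scale the short side
    up to min_size, capped so the long side stays within max_size.
    """
    if fixed_size is not None:
        return fixed_size
    scale = min(min_size / min(h, w), max_size / max(h, w))
    return int(round(h * scale)), int(round(w * scale))
```

With `fixed_size=(min_size, min_size)` every image comes out the same shape, which is why the workaround sidesteps the `Optional` handling in the scripted transform.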
@beniz Because I don't have your setup, it's hard for me to reproduce it. Could you please let me know what happens if you apply the patch at #4369 to your TorchVision code? Is the `fixed_size` error resolved? Note that unfortunately changing the value to
Thanks for the patch @datumbox. It does not resolve the

No worries about the

FYI, when we work around the
More generally, I guess this boils down to finding a way to handle
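The general difficulty being alluded to is that TorchScript handles `Optional` attributes strictly: an attribute that can be `None` must be annotated `Optional[...]` and refined with an explicit `is not None` check before use. A minimal sketch of that pattern (hypothetical class and attribute names, plain Python shown for brevity):

```python
from typing import Optional, Tuple

class TransformConfig:
    # An attribute that may be None must carry an explicit Optional
    # annotation for TorchScript, and every use site must narrow it
    # with an `is not None` check before treating it as a Tuple.
    fixed_size: Optional[Tuple[int, int]]

    def __init__(self, min_size: int,
                 fixed_size: Optional[Tuple[int, int]] = None) -> None:
        self.min_size = min_size
        self.fixed_size = fixed_size

    def target_size(self) -> Tuple[int, int]:
        fs = self.fixed_size
        if fs is None:
            # Fall back to a square target when no fixed size was given.
            return (self.min_size, self.min_size)
        return fs
```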
@beniz Could you provide the exact error you get after applying the patch at #4369? My understanding is that the optional and the type definition should make it work. Concerning the None at
Sure, I did patch a fresh clone with

Don't worry for us about the remaining issues like input sizes, we know what we're doing :)

FYI, we looked at it this morning: prior versions of torchvision used to export nicely for training, so it broke somewhere along the way :( Once fixed (whether here or with custom patched code on our side), we'll add the export to our CI.
Thanks for confirming your loading strategy. Could you provide the full error after running your code with the patch?
Thanks for your attention to this issue @datumbox, the error remains identical:
Let me know if I can help by testing a few more code changes you would suggest, as I have the setup ready and I could report results on a range of changes.
@beniz I've temporarily modified a similar test that we have at vision here to export the model in train mode. I then passed data through it and I don't get any errors, see here. Without being able to properly reproduce the error you see, it's hard to provide guidance. Would you be able to send a dummy PR where you modify the above scripts so that they resemble your setup, and reproduce the error on our CI (see the linked commit above for example)? If you manage to reproduce it with a minimal example, I can help you investigate further.
@datumbox Thanks very much. Further tests on our side revealed that the C++ build was using torch 1.8, while with 1.9 there's no error.
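Given that resolution, a lightweight guard in an export script can flag the torch/libtorch version mismatch early. A sketch (note the 1.9 cutoff is taken from this thread's finding, not from release notes):

```python
def train_mode_export_known_good(torch_version: str) -> bool:
    """Heuristic from this thread: loading a train-mode scripted
    Faster-RCNN failed under libtorch 1.8 but worked under 1.9,
    so flag anything below 1.9."""
    major, minor = (int(p) for p in torch_version.split(".")[:2])
    return (major, minor) >= (1, 9)
```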
🐛 Describe the bug

The `fixed_size` value in the `GeneralizedRCNNTransform` instantiation for faster_rcnn defaults to `None`, which breaks torchscript inference in C++.

See:

vision/torchvision/models/detection/faster_rcnn.py
Line 234 in 7947fc8

in contrast with:

vision/torchvision/models/detection/ssd.py
Line 203 in 7947fc8

where `fixed_size` is explicitly set. Thus with faster_rcnn, `fixed_size` defaults to `None` and loading from C++ yields an error.

To reproduce, we export the model with `torch.jit.script` for `fasterrcnn_resnet50_fpn` and we load it from C++ with `torch::jit::load()`. The exact export Python code we use is here: https://github.com/jolibrain/deepdetect/blob/master/tools/torch/trace_torchvision.py and we run:
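The export/load round trip described above can be sketched end to end with a toy module (this assumes torch is installed; the real script exports `fasterrcnn_resnet50_fpn` instead of the trivial `Toy` module used here for illustration):

```python
import os
import tempfile

import torch
from torch import nn

class Toy(nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * 2

m = Toy().train()                # export in train mode, as in the report
scripted = torch.jit.script(m)   # same entry point the trace script uses
path = os.path.join(tempfile.mkdtemp(), "toy.pt")
scripted.save(path)              # the C++ side then calls torch::jit::load(path)
loaded = torch.jit.load(path)    # Python-side load mirrors the C++ call
```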
Versions