Description
🐛 Bug
The `FasterRCNN` model (and, more generally, the `GeneralizedRCNN` class) expects a list of float PyTorch tensors as input images. If you instead pass a list of tensors with dtype `torch.uint8`, the model silently produces `NaN` values in the normalization step and, as a consequence, in the loss computation.
To Reproduce
Steps to reproduce the behavior:
- Load an image as a PyTorch tensor with dtype `torch.uint8`, along with its corresponding target dictionary
- Create an instance of `FasterRCNN` and pass that image to the model
- Observe the output of the model, which will be the dictionary of losses with all `NaN` values
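The failure can be reproduced without instantiating the full model, since it originates in the normalization step. A minimal sketch (the tiny all-zero image is just for illustration):

```python
import torch

# A dummy uint8 image (e.g. an all-black 3-channel image)
image = torch.zeros((3, 4, 4), dtype=torch.uint8)

# Replicate what the transform's normalization does: casting the
# ImageNet statistics to the image dtype (uint8) truncates every
# value to 0, so the division below becomes 0 / 0.
mean = torch.as_tensor([0.485, 0.456, 0.406], dtype=image.dtype)
std = torch.as_tensor([0.229, 0.224, 0.225], dtype=image.dtype)

normalized = (image - mean[:, None, None]) / std[:, None, None]
print(torch.isnan(normalized).all())  # 0 / 0 yields NaN everywhere
```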
Expected behavior
I would have expected the model to throw an exception, or at least a warning. In particular, since the `GeneralizedRCNN` class takes care of transformations such as normalization and resizing, in my opinion it should also check the dtype of the input images in order to avoid such errors.
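Such a check could be as simple as the following sketch (the function name, message, and placement here are only a suggestion, not the library's actual API):

```python
import torch

def check_image_dtype(image: torch.Tensor) -> None:
    # Reject integer images up front instead of letting the
    # normalization step silently produce NaN values.
    if not image.is_floating_point():
        raise TypeError(
            f"Expected input image to be of floating dtype, "
            f"got {image.dtype} instead"
        )

check_image_dtype(torch.rand(3, 4, 4))  # float image: passes
try:
    check_image_dtype(torch.zeros(3, 4, 4, dtype=torch.uint8))
except TypeError as err:
    print(err)  # uint8 image: rejected with a clear message
```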
Environment
PyTorch version: 1.7.1
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A
OS: macOS 10.15.7 (x86_64)
GCC version: Could not collect
Clang version: 12.0.0 (clang-1200.0.32.28)
CMake version: version 3.18.4
Python version: 3.8 (64-bit runtime)
Is CUDA available: False
CUDA runtime version: No CUDA
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Versions of relevant libraries:
[pip3] numpy==1.19.4
[pip3] torch==1.7.1
[pip3] torchvision==0.8.2
[conda] Could not collect
Additional context
I realized that the error I was facing is caused by the `normalize` function of the `GeneralizedRCNNTransform` class, which uses the image dtype when converting the mean and standard deviation lists to tensors. Since the default ImageNet statistics are all below 1, casting them to `uint8` truncates them to all zeros.
```python
def normalize(self, image):
    # dtype is taken from the image, so a uint8 input truncates
    # the float mean/std values below to zero
    dtype, device = image.dtype, image.device
    mean = torch.as_tensor(self.image_mean, dtype=dtype, device=device)
    std = torch.as_tensor(self.image_std, dtype=dtype, device=device)
    return (image - mean[:, None, None]) / std[:, None, None]
```
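The truncation is easy to verify in isolation:

```python
import torch

imagenet_mean = [0.485, 0.456, 0.406]

# Casting the sub-1.0 ImageNet statistics to uint8 truncates
# every entry toward zero.
mean_u8 = torch.as_tensor(imagenet_mean, dtype=torch.uint8)
print(mean_u8)  # tensor([0, 0, 0], dtype=torch.uint8)
```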
To avoid this problem, a simple `image = image.float()` conversion before the mean and std tensors are built would suffice.
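A minimal sketch of that fix, written here as a standalone function rather than the actual method (the signature and default statistics are assumptions for illustration):

```python
import torch

def normalize(image,
              image_mean=(0.485, 0.456, 0.406),
              image_std=(0.229, 0.224, 0.225)):
    # Cast integer images to float so the mean/std tensors keep
    # their fractional values instead of truncating to zero.
    if not image.is_floating_point():
        image = image.float()
    dtype, device = image.dtype, image.device
    mean = torch.as_tensor(image_mean, dtype=dtype, device=device)
    std = torch.as_tensor(image_std, dtype=dtype, device=device)
    return (image - mean[:, None, None]) / std[:, None, None]

out = normalize(torch.zeros((3, 4, 4), dtype=torch.uint8))
print(torch.isfinite(out).all())  # no NaN/inf values anymore
```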