GeneralizedRCNN returns NaNs with torch.uint8 inputs #3228

@Wadaboa

Description

🐛 Bug

The FasterRCNN model (and, more generally, the GeneralizedRCNN class) expects its input images as a list of float PyTorch tensors. If you instead pass a list of tensors with dtype torch.uint8, the model silently returns NaN values from the normalization step and, consequently, in the computed losses.

To Reproduce

Steps to reproduce the behavior:

  1. Load an image as a PyTorch tensor with dtype torch.uint8, along with its corresponding target dictionary
  2. Create an instance of FasterRCNN and pass that image to the model
  3. Observe the model's output: a dictionary of losses whose values are all NaN (a minimal sketch follows this list)
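For concreteness, a minimal sketch of these steps; the two-class model, the random image, and the box/label values here are illustrative, not taken from the original report:

    import torch
    from torchvision.models.detection import fasterrcnn_resnet50_fpn

    model = fasterrcnn_resnet50_fpn(pretrained=False, num_classes=2)
    model.train()  # training mode, so the model returns the loss dict

    # uint8 image, as produced e.g. by torchvision.io.read_image
    image = torch.randint(0, 256, (3, 224, 224), dtype=torch.uint8)
    target = {
        "boxes": torch.tensor([[10.0, 10.0, 100.0, 100.0]]),
        "labels": torch.tensor([1]),
    }

    losses = model([image], [target])
    print(losses)  # every loss value is NaN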

Expected behavior

I would have expected the model to throw an exception, or at least a warning. In particular, since the GeneralizedRCNN class already takes care of transformations such as normalization and resizing, in my opinion it should also check the dtype of the input images, in order to avoid such errors.
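A sketch of what such a check could look like; check_image_dtype is a hypothetical helper, not actual torchvision API:

    import torch

    def check_image_dtype(image: torch.Tensor) -> None:
        # GeneralizedRCNNTransform.normalize assumes a floating-point image;
        # fail early instead of silently producing NaN losses.
        if not image.is_floating_point():
            raise TypeError(
                f"Expected a floating-point image tensor, got {image.dtype}. "
                "Convert it first, e.g. with image.float() / 255."
            )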

Environment

PyTorch version: 1.7.1
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 10.15.7 (x86_64)
GCC version: Could not collect
Clang version: 12.0.0 (clang-1200.0.32.28)
CMake version: version 3.18.4

Python version: 3.8 (64-bit runtime)
Is CUDA available: False
CUDA runtime version: No CUDA
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.19.4
[pip3] torch==1.7.1
[pip3] torchvision==0.8.2
[conda] Could not collect

Additional context

I realized that the error I was facing is caused by the normalize function of the GeneralizedRCNNTransform class, which casts the mean and standard deviation lists to the image's dtype. With torch.uint8 inputs, the default ImageNet mean/std values (all below 1) truncate to zero, so the subsequent division by std is a division by zero.

def normalize(self, image):
    dtype, device = image.dtype, image.device
    # mean/std are cast to the image dtype: all zeros for torch.uint8
    mean = torch.as_tensor(self.image_mean, dtype=dtype, device=device)
    std = torch.as_tensor(self.image_std, dtype=dtype, device=device)
    return (image - mean[:, None, None]) / std[:, None, None]
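The truncation is easy to demonstrate in isolation: casting the default ImageNet statistics to torch.uint8 zeroes them out, which is what turns the normalization into a division by zero.

    import torch

    image_mean = [0.485, 0.456, 0.406]
    image_std = [0.229, 0.224, 0.225]

    print(torch.as_tensor(image_mean, dtype=torch.uint8))  # tensor([0, 0, 0], dtype=torch.uint8)
    print(torch.as_tensor(image_std, dtype=torch.uint8))   # tensor([0, 0, 0], dtype=torch.uint8)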

To avoid this problem, a simple image.float() inside normalize would suffice.
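On the user side, a workaround under the same assumptions as the reproduction sketch above (images, targets, and model as defined there); the division by 255 is so the values land in [0, 1], matching the default ImageNet statistics:

    # convert to float before calling the model
    images = [img.float() / 255 for img in images]
    losses = model(images, targets)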
