RandomGrayscale fails for grayscale tensor inputs #5581

Closed
pmeier opened this issue Mar 10, 2022 · 8 comments · Fixed by #6474
@pmeier
Collaborator

pmeier commented Mar 10, 2022

#1505 added rgb_to_grayscale to tensor images and #2586 aligned it with the PIL op. It was missed that the PIL op is a no-op if the input is already grayscale, whereas the tensor op fails:

>>> image_tensor = torch.rand(1, 16, 16)
>>> F.rgb_to_grayscale(image_tensor)
TypeError: Input image tensor permitted channel values are [3], but found 1
>>> image_pil = F.to_pil_image(image_tensor)
>>> F.rgb_to_grayscale(image_pil)
<PIL.Image.Image image mode=L size=16x16 at 0x7F8181384A10>

As both the name and the docstring

If the image is torch Tensor, it is expected
to have [..., 3, H, W] shape, where ... means an arbitrary number of leading dimensions

imply, this op should only handle RGB inputs, so on its own that wouldn't be an issue.

The problem is that the RandomGrayscale transform relies on the no-op behavior:

num_output_channels, _, _ = F.get_dimensions(img)
if torch.rand(1) < self.p:
    return F.rgb_to_grayscale(img, num_output_channels=num_output_channels)
return img

>>> transform = transforms.RandomGrayscale(p=1.0)
>>> transform(image_tensor)
TypeError: Input image tensor permitted channel values are [3], but found 1
>>> transform(image_pil)
<PIL.Image.Image image mode=L size=16x16 at 0x7F23E5B4E850>

Note the docstring is conflicting here:

If the image is torch Tensor, it is expected
to have [..., 3, H, W] shape, where ... means an arbitrary number of leading dimensions

- If input image is 1 channel: grayscale version is 1 channel
- If input image is 3 channel: grayscale version is 3 channel with r == g == b

A BC-compatible fix would be to patch RandomGrayscale so it is an explicit no-op for grayscale inputs.
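Such a patch could look roughly like the following. This is a minimal, self-contained sketch of the transform's forward path, with the ITU-R 601-2 luma conversion inlined instead of calling the real F.rgb_to_grayscale; the function name and structure are illustrative, not the actual torchvision code:

```python
import torch

def random_grayscale_forward(img: torch.Tensor, p: float = 0.1) -> torch.Tensor:
    """Sketch of a BC-compatible RandomGrayscale.forward for tensor inputs."""
    if img.shape[-3] == 1:
        # Explicit no-op for inputs that are already grayscale,
        # instead of erroring out like the current tensor op.
        return img
    if torch.rand(1) < p:
        # ITU-R 601-2 luma transform, then replicate the single channel so
        # the output channel count matches the input (r == g == b).
        r, g, b = img.unbind(dim=-3)
        gray = (0.2989 * r + 0.587 * g + 0.114 * b).unsqueeze(dim=-3)
        return gray.expand(img.shape).clone()
    return img
```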

cc @vfdev-5 @datumbox

@datumbox
Contributor

@pmeier I agree that's a problem. I think we should update rgb_to_grayscale for tensors to be a no-op if a grayscale image is provided. Are we going to copy or not?

@pmeier
Collaborator Author

pmeier commented Mar 10, 2022

Are we going to copy or not?

I would, yes. Given that the op currently never returns the input tensor itself, users might rely on always getting a new tensor back. And given that no one has raised an issue about this before (that I know of), I'm guessing this is an edge case anyway, so the potential extra copy shouldn't be a problem here.

@pmeier pmeier self-assigned this Mar 10, 2022
@vfdev-5
Collaborator

vfdev-5 commented Mar 10, 2022

PIL does a copy as well if the modes are the same:
https://github.com/python-pillow/Pillow/blob/92c26a77ca53a2bfbd8804f009c6c8755d0e5a43/src/PIL/Image.py#L945-L946
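To make the copy semantics concrete, here is a toy sketch (the function name is hypothetical) of a no-op branch that copies rather than aliases, mirroring what PIL's convert() does when the mode is unchanged:

```python
import torch

def grayscale_noop(img: torch.Tensor) -> torch.Tensor:
    """Toy no-op branch for already-grayscale tensors: copy, don't alias."""
    if img.shape[-3] != 1:
        raise NotImplementedError("real RGB -> grayscale conversion elided")
    # Returning a clone means in-place changes to the output can never
    # leak into the input tensor (and vice versa).
    return img.clone()

img = torch.rand(1, 16, 16)
out = grayscale_noop(img)
assert out is not img          # distinct tensors ...
assert torch.equal(out, img)   # ... with identical values
```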

@pmeier
Collaborator Author

pmeier commented Mar 10, 2022

The PIL op is even more permissive:

if num_output_channels == 1:
    img = img.convert("L")

For num_output_channels == 1, it converts an arbitrary color space to grayscale. Maybe this is why the names of the ops are also not aligned:

def rgb_to_grayscale(img: Tensor, num_output_channels: int = 1) -> Tensor:

def to_grayscale(img: Image.Image, num_output_channels: int) -> Image.Image:

Thus, aligning both ops is more complicated and requires a design like #5567. I'm not eager to backport this.

Maybe we can only fix this on RandomGrayscale as proposed above?

@datumbox
Contributor

datumbox commented Mar 10, 2022

I think fixing rgb_to_grayscale is more in line with the way the new API works. Thoughts?

Edit: Sorry, the more I look into it, the more second thoughts I have. I understand that the intention of this method is to always return an image that has 3 channels. I suppose this could be very useful when you deal with CNNs that expect a specific input size, but it is also quite inefficient. In any case, should we make rgb_to_grayscale behave exactly like PIL? I.e. convert to a 1-channel image and then expand to 3 channels?

@datumbox
Contributor

I think it's a good time to resolve this issue.

I read again what was discussed, and my initial feeling is that, ideally, I would like to align the two low-level kernels to behave the same (do the no-op) rather than outsourcing this handling to the RandomGrayscale transform. I could be wrong though, and the other solution might be preferable for reasons I don't see currently.

@pmeier and @vfdev-5 could you please weigh the pros/cons, align, and make a proposal?

@pmeier
Collaborator Author

pmeier commented Aug 23, 2022

Both F.to_grayscale and F.rgb_to_grayscale will be deprecated in favor of F.convert_color_space. The implementation is roughly

def rgb_to_grayscale(inpt: Any, num_output_channels: int = 1) -> Any:
    output = convert_color_space(
        inpt,
        color_space=features.ColorSpace.GRAY,
        old_color_space=features.Image.guess_color_space(inpt) if isinstance(inpt, torch.Tensor) else None,
    )
    if num_output_channels == 3:
        output = convert_color_space(
            output,
            color_space=features.ColorSpace.RGB,
            old_color_space=features.ColorSpace.GRAY,
        )
    return output

This will align the behavior for PIL and tensor images to be a no-op in case the input is already a grayscale image. That is in line with what F.convert_color_space does:

if new_color_space == old_color_space:
    if copy:
        return image.clone()
    else:
        return image

if not copy and image.mode == new_mode:
    return image

In that sense, my vote is to align these deprecated kernels as well.

I don't have a strong opinion on whether or not we want to backport this to the current stable API. On one hand, we are not ready to migrate to the new API yet, meaning we would leave a known bug in place. On the other hand, no one has ever complained about it, and I only found it while writing the new transform. Given that it is not a silent bug, I would also be ok with leaving it.

@datumbox
Contributor

I think the misalignment between the PIL and Tensor kernels is a bug. We have always aimed to ensure that Tensor kernels behave the same as PIL kernels, and we have a bunch of unit tests to verify this. I think this one fell through the cracks.

I agree we should fix the bug by aligning the kernels. I think the plan is the same but I would phrase it differently and connect it less with the new API:

  • Align the 2 kernels on the stable F. I.e. ensure that the tensor kernel does a no-op in case the input is already grayscale.
  • Adjust the new API to reflect the above behaviour for rgb_to_grayscale(). I agree that this specific method will be deprecated in favour of the more generic convert_color_space(), but it is still worth aligning the two to ensure a smooth BC transition between the APIs.

I think we are fully aligned but let me know if you have any concerns or I missed anything.
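The agreed alignment could be sketched as follows for the tensor kernel. This is a standalone illustration with an inlined luma transform, not the actual torchvision implementation: a grayscale input becomes a copying no-op, and num_output_channels=3 replicates the single channel, just like the PIL path.

```python
import torch

def rgb_to_grayscale_aligned(img: torch.Tensor, num_output_channels: int = 1) -> torch.Tensor:
    """Sketch of the aligned tensor kernel: grayscale input is a copying no-op."""
    channels = img.shape[-3]
    if channels == 1:
        out = img.clone()  # no-op (with copy), matching PIL's behaviour
    elif channels == 3:
        # ITU-R 601-2 luma transform over the channel dimension.
        r, g, b = img.unbind(dim=-3)
        out = (0.2989 * r + 0.587 * g + 0.114 * b).unsqueeze(dim=-3)
    else:
        raise TypeError(f"permitted channel values are [1, 3], but found {channels}")
    if num_output_channels == 3:
        # Replicate the single channel so r == g == b in the output.
        out = out.expand(*out.shape[:-3], 3, *out.shape[-2:]).clone()
    return out
```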
