RandomGrayscale fails for grayscale tensor inputs #5581

Closed
pmeier opened this issue Mar 10, 2022 · 8 comments · Fixed by #6474
@pmeier
Collaborator

pmeier commented Mar 10, 2022

#1505 added rgb_to_grayscale to tensor images and #2586 aligned it with the PIL op. It was missed that the PIL op is a no-op if the input is already grayscale, whereas the tensor op fails:

>>> image_tensor = torch.rand(1, 16, 16)
>>> F.rgb_to_grayscale(image_tensor)
TypeError: Input image tensor permitted channel values are [3], but found 1
>>> image_pil = F.to_pil_image(image_tensor)
>>> F.rgb_to_grayscale(image_pil)
<PIL.Image.Image image mode=L size=16x16 at 0x7F8181384A10>

As both the name and the docstring

If the image is torch Tensor, it is expected
to have [..., 3, H, W] shape, where ... means an arbitrary number of leading dimensions

imply, this op should only handle RGB inputs, so on its own that wouldn't be an issue.

The problem is that the RandomGrayscale transform relies on the no-op behavior:

num_output_channels, _, _ = F.get_dimensions(img)
if torch.rand(1) < self.p:
    return F.rgb_to_grayscale(img, num_output_channels=num_output_channels)
return img

>>> transform = transforms.RandomGrayscale(p=1.0)
>>> transform(image_tensor)
TypeError: Input image tensor permitted channel values are [3], but found 1
>>> transform(image_pil)
<PIL.Image.Image image mode=L size=16x16 at 0x7F23E5B4E850>

Note the docstring is conflicting here:

If the image is torch Tensor, it is expected
to have [..., 3, H, W] shape, where ... means an arbitrary number of leading dimensions

- If input image is 1 channel: grayscale version is 1 channel
- If input image is 3 channel: grayscale version is 3 channel with r == g == b

A BC-compatible fix would be to patch RandomGrayscale so it is an explicit no-op for grayscale inputs.
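Such a patch could look roughly like the following. This is a minimal, self-contained sketch of the transform's forward path, with the ITU-R 601-2 luma conversion inlined instead of calling the real F.rgb_to_grayscale; the function name and structure are illustrative, not the actual torchvision code:

```python
import torch

def random_grayscale_forward(img: torch.Tensor, p: float = 0.1) -> torch.Tensor:
    """Sketch of a BC-compatible RandomGrayscale.forward for tensor inputs."""
    if img.shape[-3] == 1:
        # Explicit no-op for inputs that are already grayscale,
        # instead of erroring out like the current tensor op.
        return img
    if torch.rand(1) < p:
        # ITU-R 601-2 luma transform, then replicate the single channel so
        # the output channel count matches the input (r == g == b).
        r, g, b = img.unbind(dim=-3)
        gray = (0.2989 * r + 0.587 * g + 0.114 * b).unsqueeze(dim=-3)
        return gray.expand(img.shape).clone()
    return img
```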

cc @vfdev-5 @datumbox

@datumbox
Contributor

@pmeier I agree that's a problem. I think we should update rgb_to_grayscale for tensors to be a no-op if a grayscale image is provided. Are we going to copy or not?

@pmeier
Collaborator Author

pmeier commented Mar 10, 2022

Are we going to copy or not?

I would, yes. Given that the op currently never returns the input tensor itself, users might rely on always getting a new tensor back. And given that no one has raised an issue about this before (that I know of), I'm guessing this is an edge case anyway, so the potential extra copy shouldn't be a problem here.

@pmeier pmeier self-assigned this Mar 10, 2022
@vfdev-5
Collaborator

vfdev-5 commented Mar 10, 2022

PIL does a copy as well if the modes are the same:
https://github.com/python-pillow/Pillow/blob/92c26a77ca53a2bfbd8804f009c6c8755d0e5a43/src/PIL/Image.py#L945-L946
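To make the copy semantics concrete, here is a toy sketch (the function name is hypothetical) of a no-op branch that copies rather than aliases, mirroring what PIL's convert() does when the mode is unchanged:

```python
import torch

def grayscale_noop(img: torch.Tensor) -> torch.Tensor:
    """Toy no-op branch for already-grayscale tensors: copy, don't alias."""
    if img.shape[-3] != 1:
        raise NotImplementedError("real RGB -> grayscale conversion elided")
    # Returning a clone means in-place changes to the output can never
    # leak into the input tensor (and vice versa).
    return img.clone()

img = torch.rand(1, 16, 16)
out = grayscale_noop(img)
assert out is not img          # distinct tensors ...
assert torch.equal(out, img)   # ... with identical values
```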

@pmeier
Collaborator Author

pmeier commented Mar 10, 2022

The PIL op is even more permissive:

if num_output_channels == 1:
    img = img.convert("L")

For num_output_channels == 1, it converts an arbitrary color space to grayscale. Maybe this is why the names of the ops are also not aligned:

def rgb_to_grayscale(img: Tensor, num_output_channels: int = 1) -> Tensor:

def to_grayscale(img: Image.Image, num_output_channels: int) -> Image.Image:

Thus, aligning both ops is more complicated and requires a design like #5567. I'm not eager to backport this.

Maybe we can only fix this on RandomGrayscale as proposed above?

@datumbox
Contributor

datumbox commented Mar 10, 2022

I think fixing rgb_to_grayscale is more in line with the way the new API works. Thoughts?

Edit: Sorry, the more I look into it, the more second thoughts I have. I understand that the intention of this method is to always return an image that has 3 channels. I suppose this could be very useful when you deal with CNNs that expect a specific input size, but it is also quite inefficient. In any case, should we make rgb_to_grayscale behave exactly like PIL? I.e. convert to a 1-channel image and then expand to 3 channels?

@datumbox
Contributor

I think it's a good time to resolve this issue.

I read again what was discussed, and my initial feeling is that, ideally, I would like to align the two low-level kernels to behave the same (do the no-op) rather than outsourcing this handling to the RandomGrayscale transform. I could be wrong though, and the other solution might be preferable for reasons I don't see currently.

@pmeier and @vfdev-5 could you please weigh the pros/cons, align, and make a proposal?

@pmeier
Collaborator Author

pmeier commented Aug 23, 2022

Both F.to_grayscale and F.rgb_to_grayscale will be deprecated in favor of F.convert_color_space. The implementation is roughly

def rgb_to_grayscale(inpt: Any, num_output_channels: int = 1) -> Any:
    output = convert_color_space(
        inpt,
        color_space=features.ColorSpace.GRAY,
        old_color_space=features.Image.guess_color_space(inpt) if isinstance(inpt, torch.Tensor) else None,
    )
    if num_output_channels == 3:
        output = convert_color_space(
            output,
            color_space=features.ColorSpace.RGB,
            old_color_space=features.ColorSpace.GRAY,
        )
    return output

This will align the behavior for PIL and tensor images to be a no-op in case the input is already a grayscale image. That is in line with what F.convert_color_space does:

if new_color_space == old_color_space:
    if copy:
        return image.clone()
    else:
        return image

if not copy and image.mode == new_mode:
    return image

In that sense, my vote is to align these deprecated kernels as well.

I don't have a strong opinion on whether or not we want to backport this to the current stable API. On one hand, we are not ready to migrate to the new API yet, meaning we would leave a known bug in place. On the other hand, no one has ever complained about it, and I only found it while writing the new transform. Given that it is not a silent bug, I would also be ok with leaving it.

@datumbox
Contributor

I think the misalignment between the PIL and Tensor kernels is a bug. We have always aimed to ensure that Tensor kernels behave the same as PIL kernels, and we have a bunch of unit tests to verify this. I think this one fell through the cracks.

I agree we should fix the bug by aligning the kernels. I think the plan is the same but I would phrase it differently and connect it less with the new API:

  • Align the 2 kernels on the stable F. I.e. ensure that the tensor kernel does a no-op in case the input is already grayscale.
  • Adjust the new API to reflect the above behaviour for rgb_to_grayscale(). I agree that this specific method will be deprecated in favour of the more generic convert_color_space(), but it is still worth aligning the two to ensure a smooth BC transition between the APIs.

I think we are fully aligned but let me know if you have any concerns or I missed anything.
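The agreed alignment could be sketched as follows for the tensor kernel. This is a standalone illustration with an inlined luma transform, not the actual torchvision implementation: a grayscale input becomes a copying no-op, and num_output_channels=3 replicates the single channel, just like the PIL path.

```python
import torch

def rgb_to_grayscale_aligned(img: torch.Tensor, num_output_channels: int = 1) -> torch.Tensor:
    """Sketch of the aligned tensor kernel: grayscale input is a copying no-op."""
    channels = img.shape[-3]
    if channels == 1:
        out = img.clone()  # no-op (with copy), matching PIL's behaviour
    elif channels == 3:
        # ITU-R 601-2 luma transform over the channel dimension.
        r, g, b = img.unbind(dim=-3)
        out = (0.2989 * r + 0.587 * g + 0.114 * b).unsqueeze(dim=-3)
    else:
        raise TypeError(f"permitted channel values are [1, 3], but found {channels}")
    if num_output_channels == 3:
        # Replicate the single channel so r == g == b in the output.
        out = out.expand(*out.shape[:-3], 3, *out.shape[-2:]).clone()
    return out
```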
