RandomGrayscale fails for grayscale tensor inputs #5581
Comments
@pmeier I agree that's a problem. I think we should update
I would, yes. Given that the op currently won't return the input tensor, users might rely on this. Given that no one raised an issue about this before (that I know of), I'm guessing this is an edge case anyway. Thus, the potential extra copy shouldn't be a problem here.
PIL does a copy as well if the modes are the same:
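For reference, this copy-on-same-mode behavior can be checked with Pillow directly (a minimal sketch, independent of torchvision):

```python
from PIL import Image

# Pillow's Image.convert returns a copy even when the requested mode
# matches the image's current mode, rather than returning the input object.
img = Image.new("L", (4, 4), color=7)   # already grayscale ("L" mode)
converted = img.convert("L")

assert converted is not img             # a new object (copy), not the input
assert converted.mode == "L"
assert list(converted.getdata()) == list(img.getdata())  # same pixel data
```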
The PIL op is even more permissive: `vision/torchvision/transforms/functional_pil.py`, lines 361 to 362 in d0dede0.
For
Thus, aligning both ops is more complicated and requires a design like #5567. I'm not eager to port this back. Maybe we can only fix this on
I think fixing

Edit: Sorry, the more I look into it, the more second thoughts I get. I understand that the intention of this method is to always return an image that has 3 channels. I suppose this could be very useful when you deal with CNNs that expect a specific input size, but it is also very inefficient. In any case, should we make
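On the efficiency point: when a 3-channel output is required, it can be produced as a zero-copy broadcasted view instead of materializing the data three times. A plain-torch sketch (not torchvision code) illustrating the difference:

```python
import torch

gray = torch.rand(1, 32, 32)       # single-channel image tensor

repeated = gray.repeat(3, 1, 1)    # materializes three copies of the data
expanded = gray.expand(3, -1, -1)  # broadcasted view, shares storage with gray

assert repeated.shape == expanded.shape == (3, 32, 32)
assert expanded.data_ptr() == gray.data_ptr()   # no copy was made
assert repeated.data_ptr() != gray.data_ptr()   # repeat allocated new memory
```

The expanded view is read-only in practice (writing through overlapping views is an error-prone pattern), which is usually fine for feeding a CNN that expects 3 channels.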
I think it's a good time to resolve this issue. I read again what was discussed, and my initial feeling is that

Ideally I would like to align the two low-level kernels to behave the same (do the no-op) rather than outsourcing this handling to the RandomGrayscale transform. I could be wrong though, and the other solution might be preferable for reasons I don't currently see. @pmeier and @vfdev-5, could you please weigh the pros and cons, align, and make a proposal?
Both

```python
def rgb_to_grayscale(inpt: Any, num_output_channels: int = 1) -> Any:
    output = convert_color_space(
        inpt,
        color_space=features.ColorSpace.GRAY,
        old_color_space=features.Image.guess_color_space(inpt) if isinstance(inpt, torch.Tensor) else None,
    )
    if num_output_channels == 3:
        output = convert_color_space(
            output,  # convert the grayscale intermediate, not the original input
            color_space=features.ColorSpace.RGB,
            old_color_space=features.ColorSpace.GRAY,
        )
    return output
```

This will align the behavior for PIL and tensor images to be a no-op in case the input is already a grayscale image. That is in line with what `convert_color_space` does; see `vision/torchvision/prototype/transforms/functional/_meta.py`, lines 106 to 110 and lines 162 to 163 in c6a715c.
In that sense, my vote is to align these deprecated kernels. I don't have a strong opinion on whether or not we want to backport this to the current stable API. On one hand, we are not ready to migrate to the new API, meaning we leave a known bug. On the other hand, no one ever complained about it, and I only found it while writing the new transform. Given that it is not a silent bug, I would also be ok with leaving it.
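The aligned no-op semantics can be sketched in plain torch. This is a hypothetical standalone helper, not the actual torchvision kernel; the luma weights are the ITU-R BT.601 coefficients that torchvision's tensor kernel uses:

```python
import torch

# Hypothetical sketch of an rgb_to_grayscale kernel that is a no-op for
# grayscale inputs, mirroring the PIL behavior discussed above.
def rgb_to_grayscale(img: torch.Tensor, num_output_channels: int = 1) -> torch.Tensor:
    if img.shape[-3] == 1:
        gray = img                      # already grayscale: explicit no-op
    else:
        r, g, b = img.unbind(dim=-3)
        # ITU-R BT.601 luma weights
        gray = (0.2989 * r + 0.587 * g + 0.114 * b).unsqueeze(dim=-3)
    if num_output_channels == 3:
        # zero-copy broadcast to three identical channels
        gray = gray.expand(*gray.shape[:-3], 3, *gray.shape[-2:])
    return gray

rgb = torch.rand(3, 8, 8)
assert rgb_to_grayscale(rgb).shape == (1, 8, 8)
gray = torch.rand(1, 8, 8)
assert rgb_to_grayscale(gray) is gray   # no failure, no copy for grayscale input
```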
I think the misalignment between the PIL and Tensor kernels is a bug. We always aimed to ensure that Tensor kernels behave the same as PIL kernels, and we have a bunch of unit tests to verify this. I think this fell through the cracks. I agree we should fix the bug by aligning the kernels. I think the plan is the same, but I would phrase it differently and connect it less with the new API:
I think we are fully aligned, but let me know if you have any concerns or if I missed anything.
#1505 added `rgb_to_grayscale` for tensor images, and #2586 aligned it with the PIL op. It was missed that the PIL op is a no-op if the input is already grayscale, whereas the tensor op fails.

As the name as well as the docstring (`vision/torchvision/transforms/functional.py`, lines 1231 to 1232 in d0dede0) imply, this function should only handle RGB inputs, so that shouldn't be an issue.
The problem is that the `RandomGrayscale` transform relies on the no-op behavior (`vision/torchvision/transforms/transforms.py`, lines 1613 to 1616 in d0dede0).
Note that the docstring is conflicting here: `vision/torchvision/transforms/transforms.py`, lines 1586 to 1587 and lines 1595 to 1596 in d0dede0.
A BC-compatible fix would be to patch `RandomGrayscale` to be an explicit no-op for grayscale inputs.

cc @vfdev-5 @datumbox
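Such a patched transform could look roughly like the following. This is an illustrative plain-torch sketch of the forward pass, not the actual torchvision implementation; the luma weights are the BT.601 coefficients torchvision uses:

```python
import torch

# Illustrative sketch: a RandomGrayscale that short-circuits for inputs
# that are already single-channel, instead of relying on rgb_to_grayscale
# being a no-op. Not the actual torchvision code.
class RandomGrayscale(torch.nn.Module):
    def __init__(self, p: float = 0.1) -> None:
        super().__init__()
        self.p = p

    def forward(self, img: torch.Tensor) -> torch.Tensor:
        if img.shape[-3] == 1:          # grayscale input: explicit no-op
            return img
        if torch.rand(1).item() < self.p:
            r, g, b = img.unbind(dim=-3)
            gray = (0.2989 * r + 0.587 * g + 0.114 * b).unsqueeze(dim=-3)
            return gray.expand_as(img)  # keep the input's channel count
        return img
```

With `p=1.0`, a 3-channel input comes back with three identical channels, while a 1-channel input is returned unchanged rather than raising.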