
add support for BoundingBox and SegmentationMask to RandomResizeCrop #6041


Closed
wants to merge 1 commit into from

Conversation

pmeier
Collaborator

@pmeier pmeier commented May 18, 2022

Kernels were already there, so this only adds them to the transform.

Test failures are real though. @datumbox we are now getting bitten by what I mentioned in #5487 (comment): although the transformation is perfectly able to work with only bounding boxes or segmentation masks, our choice to extract the image size only from images forces the user to always include an image in the inputs.
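A minimal sketch of the limitation, for illustration only. `Image`, `BoundingBox`, and `query_image_size` below are hypothetical stand-ins and not the torchvision prototype API; the only thing taken from this thread is the assumption that the image size is looked up from images alone.

```python
from dataclasses import dataclass
from typing import Any, Iterable, Tuple


@dataclass
class Image:  # hypothetical stand-in for an image feature
    height: int
    width: int


@dataclass
class BoundingBox:  # hypothetical stand-in for a bounding box feature
    xyxy: Tuple[int, int, int, int]
    image_size: Tuple[int, int]  # (height, width) of the image the box belongs to


def query_image_size(sample: Iterable[Any]) -> Tuple[int, int]:
    """Return (height, width) of the first image in the sample."""
    for item in sample:
        if isinstance(item, Image):
            return item.height, item.width
    # A lone BoundingBox carries image_size, but it is never consulted here.
    raise TypeError("No image was found in the sample")
```

With this lookup, a sample such as `[BoundingBox((10, 10, 20, 20), image_size=(100, 200))]` raises `TypeError`, while adding an `Image(100, 200)` to the same sample makes it work.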

@datumbox
Contributor

@pmeier As discussed previously, the example you provide at #5487 (comment) is not a valid real-world use-case. Aka resizing just the bbox without the image will lead to corrupted information. Hence always requiring the image in such transforms makes sense. If you manage to find another valid counterexample, I'm happy to revise. But unless we have that, there is no need to create a solution for a corner case that is not needed.

Concerning the failure, I think a library is missing:

ImportError: libcrypto.so.10: cannot open shared object file: No such file or directory

A potentially easy fix is to install the dependencies that probably are missing. See here.

@pmeier
Collaborator Author

pmeier commented May 18, 2022

is not a valid real-world use-case. Aka resizing just the bbox without the image will lead to corrupted information.

I agree for RandomResizeCrop there is probably no use case for bounding boxes only. I was thinking about primitive transforms used in composite transforms. Imagine a scenario like

def _transform(self, input: Any, params: Dict[str, Any]) -> Any:
    transform = Pad(**params, padding_mode="constant")
    return transform(input)

If Pad would not support bounding boxes individually, we couldn't use it the way we do now. I'm aware that this composite approach has other issues as well, but it is the best we got right now.
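To make the composite pattern concrete, here is a toy sketch. Everything in it is hypothetical: `Box`, this `Pad`, and `ZoomOut` are simplified stand-ins written for illustration, with images modelled as lists of rows. It only shows why the primitive has to accept a bounding box on its own when a composite applies it item by item.

```python
import random
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass
class Box:  # hypothetical bounding box in XYXY format
    xyxy: Tuple[int, int, int, int]


class Pad:
    """Toy primitive transform: handles images (lists of rows) and boxes individually."""

    def __init__(self, padding: Tuple[int, int, int, int], fill: int = 0) -> None:
        self.padding = padding  # (left, top, right, bottom)
        self.fill = fill

    def __call__(self, input: Any) -> Any:
        left, top, right, bottom = self.padding
        if isinstance(input, Box):
            x1, y1, x2, y2 = input.xyxy
            return Box((x1 + left, y1 + top, x2 + left, y2 + top))
        if isinstance(input, list):  # non-empty image as a list of rows
            width = len(input[0]) + left + right
            rows = [[self.fill] * left + row + [self.fill] * right for row in input]
            return (
                [[self.fill] * width for _ in range(top)]
                + rows
                + [[self.fill] * width for _ in range(bottom)]
            )
        return input  # anything else passes through unchanged


class ZoomOut:
    """Toy composite transform: samples parameters once, then delegates to Pad."""

    def _get_params(self) -> Dict[str, Any]:
        p = random.randint(1, 3)
        return {"padding": (p, p, p, p), "fill": 0}

    def __call__(self, sample: List[Any]) -> List[Any]:
        transform = Pad(**self._get_params())
        # Each item is transformed on its own, so Pad must accept a lone Box.
        return [transform(item) for item in sample]
```

If this `Pad` rejected a `Box` that arrives without an image, the list comprehension in `ZoomOut.__call__` would fail on the second item of a sample like `[image, box]`.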

Concerning the failure, I think a library is missing:

Err, that is not what I meant. This is a torchdata failure hiding the actual failures. I didn't have the latest torchdata version locally and wrote the comment before CI was finished 😇

@datumbox
Contributor

If Pad would not support bounding boxes individually, we couldn't use it the way we do now. I'm aware that this composite approach has other issues as well, but it is the best we got right now.

It's worth noting that you are using Pad like that to circumvent a design limitation of the API. You are not using it because padding a bbox on its own makes sense, but rather as a workaround. I think all these are indications that the proposal can be improved. My concern about the use of "primitive transforms" is that I don't know exactly how they differ from the kernels, other than adding dispatching. Perhaps the whole thing can be solved by moving the dispatch onto the _Feature objects.

@pmeier
Collaborator Author

pmeier commented May 18, 2022

Re test failures: the missing dependency only happens on Linux. I've opened pytorch/data#418 to check if this is a mistake on their side. The real failures are visible on Windows and macOS.

@datumbox
Contributor

@pmeier Thanks, yes I've seen it. I don't remember all the details TBH but I think you could go around the issue by dispatching to the low-level kernel with our normal ifs dispatch approach, right?

@@ -20,7 +20,7 @@ def fn(
     try:
         return next(query_recursively(fn, sample))[1]
     except StopIteration:
-        raise TypeError("No image was found in the sample")
+        raise TypeError("No image was found in the sample") from None
Collaborator

@vfdev-5 vfdev-5 May 18, 2022

Why do you add "from None" here?

Collaborator Author

Sorry, forgot to add a comment. The StopIteration is an internal detail that does not need to be propagated outside. By adding from None, the error will look like a normal exception:

try:
    next(iter([]))
except StopIteration:
    raise TypeError("Argh!") from None
TypeError: Argh!

Not doing that gives

try:
    next(iter([]))
except StopIteration:
    raise TypeError("Argh!")
StopIteration

During handling of the above exception, another exception occurred:

[...]
TypeError: Argh!
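The same behaviour can be checked programmatically. The sketch below uses a hypothetical helper `first_item` (not part of this PR) and inspects the standard exception attributes that `from None` affects.

```python
def first_item(items):
    """Return the first element, hiding the internal StopIteration."""
    try:
        return next(iter(items))
    except StopIteration:
        raise TypeError("No item found") from None


try:
    first_item([])
except TypeError as exc:
    error = exc

# `from None` clears the cause and suppresses the context in the traceback,
# but the original exception is still recorded on __context__.
print(error.__cause__)             # None
print(error.__suppress_context__)  # True
print(type(error.__context__))     # <class 'StopIteration'>
```

So the StopIteration is not lost for debugging purposes; it is only hidden from the printed traceback.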

@pmeier
Collaborator Author

pmeier commented May 18, 2022

you could go around the issue by dispatching to the low-level kernel with our normal ifs dispatch approach, right?

Not without duplicating all the other logic that happens in Pad._transform, which is quite a lot:

if isinstance(input, features.Image) or is_simple_tensor(input):
    # PyTorch's pad supports only integers on fill. So we need to overwrite the colour
    output = F.pad_image_tensor(input, params["padding"], fill=0, padding_mode="constant")
    left, top, right, bottom = params["padding"]
    fill = torch.tensor(params["fill"], dtype=input.dtype, device=input.device).to().view(-1, 1, 1)
    if top > 0:
        output[..., :top, :] = fill
    if left > 0:
        output[..., :, :left] = fill
    if bottom > 0:
        output[..., -bottom:, :] = fill
    if right > 0:
        output[..., :, -right:] = fill
    if isinstance(input, features.Image):
        output = features.Image.new_like(input, output)
    return output
elif isinstance(input, PIL.Image.Image):
    return F.pad_image_pil(
        input,
        params["padding"],
        fill=tuple(int(v) if input.mode != "F" else v for v in params["fill"]),
        padding_mode="constant",
    )
elif isinstance(input, features.BoundingBox):
    output = F.pad_bounding_box(input, params["padding"], format=input.format)
    left, top, right, bottom = params["padding"]
    height, width = input.image_size
    height += top + bottom
    width += left + right
    return features.BoundingBox.new_like(input, output, image_size=(height, width))
else:
    return input

It's worth noting that you are using Pad like that to circumvent a design limitation of the API. [...] I think all these are indications that the proposal can be improved.

True. Let's postpone this discussion until we have decided on a general way to handle composite transforms. If we for example reinstate the high-level dispatchers, my argument is moot. Otherwise it might still apply.

@datumbox datumbox closed this Aug 17, 2022