-
Notifications
You must be signed in to change notification settings - Fork 7.1k
Add PermuteDimensions and TransposeDimensions transforms #6800
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree we need this functionality somehow, but I'm not sure these general transformations are the way to go. They have very generic names and one could reasonable deduce that PermuteDimensions
also works for features.BoundingBox
'es or the like. Worse, one can even put features.BoundingBox
in the permutation dict just for it to be silently ignored.
Basically these transforms suffer from the same fate as ConvertDtype
from #6783. We basically need a name for "image or video" that can identify these transforms better. I'm going to paddle one idea that @vfdev-5 started in #6783 (comment) and I already mentioned in our offline conversation: right now we use the term "input" (or inpt
) to be exact to mean Union[features._Feature, _is_simple_tensor, PIL.Image.Image]
. However, "input" would be a good name for Union[features.Image, features.Video, _is_simple_tensor, PIL.Image.Image]
and could clearly separate from "target" or "annotation" like Union[features.Label, features.Mask, features.BoundingBox]
. We could change what we currently call inpt
to feature
. That would free the term "input" and we could use that for ConvertInputDtype
, PermuteInputDimensions
, ... Of course torch.Tensor
's and PIL.Image.Image
's are not features._Feature
's but they have the same purpose. We could even for feature_like
as a name to avoid confusion. WDYT?
Regarding the way forward: do we actually need both transforms or is one of them sufficient? Given that our references currently only need to swap two dimensions
"""Convert tensor from (B, C, H, W) to (C, B, H, W)""" |
wouldn't TransposeDimensions
be sufficient here? Although it can only swap two dimensions at a time, it has one major upside: you don't have to know the whole shape. For example image.transpose(-1, -2)
is valid for arbitrary batch sizes while I need to know the number of dimensions if I use image.permute(...)
although I don't need to know.
@pmeier Let's take the naming discussion out side of this PR because these are transforms needed ASAP to reach parity of functionality for internal use-cases (ClassyVision). I confirm we need I'll move the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Stamping
Part of #6768
This PR adds the
PermuteDimensions
andTransposeDimensions
transforms which are useful for video training which is often needed to move the temporal dimension around.Because the end tensors can have their dimensions completely messed, meta-data information from Images and Videos are not retrained and instead a simple tensor is assumed.
cc @vfdev-5 @bjuncek @pmeier