Add PermuteDimensions and TransposeDimensions transforms #6800

Merged
merged 8 commits into from
Oct 21, 2022

datumbox
Contributor

@datumbox datumbox commented Oct 20, 2022

Part of #6768

This PR adds the PermuteDimensions and TransposeDimensions transforms. They are useful for video training, where the temporal dimension often needs to be moved around.

Because the resulting tensors can have their dimensions arbitrarily reordered, the metadata of Images and Videos is not retained; the output is treated as a plain tensor instead.

cc @vfdev-5 @bjuncek @pmeier

Collaborator

@pmeier pmeier left a comment

I agree we need this functionality somehow, but I'm not sure these general transformations are the way to go. They have very generic names, and one could reasonably deduce that PermuteDimensions also works for features.BoundingBox instances or the like. Worse, one can even put features.BoundingBox in the permutation dict just for it to be silently ignored.

These transforms suffer the same fate as ConvertDtype from #6783: we need a name for "image or video" that identifies them better. I'm going to peddle one idea that @vfdev-5 started in #6783 (comment) and that I already mentioned in our offline conversation. Right now we use the term "input" (or inpt, to be exact) to mean Union[features._Feature, _is_simple_tensor, PIL.Image.Image]. However, "input" would be a good name for Union[features.Image, features.Video, _is_simple_tensor, PIL.Image.Image] and would clearly separate it from "target" or "annotation", i.e. Union[features.Label, features.Mask, features.BoundingBox]. We could rename what we currently call inpt to feature. That would free the term "input", and we could use it for ConvertInputDtype, PermuteInputDimensions, ... Of course torch.Tensor's and PIL.Image.Image's are not features._Feature's, but they serve the same purpose. We could even go for feature_like as a name to avoid confusion. WDYT?

Regarding the way forward: do we actually need both transforms or is one of them sufficient? Given that our references currently only need to swap two dimensions

"""Convert tensor from (B, C, H, W) to (C, B, H, W)"""

wouldn't TransposeDimensions be sufficient here? Although it can only swap two dimensions at a time, it has one major upside: you don't have to know the whole shape. For example, image.transpose(-1, -2) is valid for arbitrary leading batch dimensions, whereas image.permute(...) forces me to know the total number of dimensions even though I only care about two of them.
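The difference can be illustrated at the level of shapes alone. The following is a minimal sketch in plain Python, not the code from this PR: `transposed_shape` and `permuted_shape` are hypothetical helpers that mimic only the shape bookkeeping of `torch.Tensor.transpose` and `torch.Tensor.permute` (the sketch raises `ValueError` where PyTorch raises its own error).

```python
from typing import Sequence, Tuple


def _normalize(dim: int, ndim: int) -> int:
    """Map a possibly negative dimension index into the range [0, ndim)."""
    if not -ndim <= dim < ndim:
        raise IndexError(f"dimension {dim} is out of range for {ndim} dimensions")
    return dim % ndim


def transposed_shape(shape: Sequence[int], dim0: int, dim1: int) -> Tuple[int, ...]:
    """Shape after swapping two dimensions; the other dimensions are untouched."""
    ndim = len(shape)
    i, j = _normalize(dim0, ndim), _normalize(dim1, ndim)
    out = list(shape)
    out[i], out[j] = out[j], out[i]
    return tuple(out)


def permuted_shape(shape: Sequence[int], dims: Sequence[int]) -> Tuple[int, ...]:
    """Shape after a full permutation; every dimension must be listed exactly once."""
    ndim = len(shape)
    normalized = [_normalize(d, ndim) for d in dims]
    if sorted(normalized) != list(range(ndim)):
        raise ValueError(f"dims {tuple(dims)} is not a permutation of {ndim} dimensions")
    return tuple(shape[d] for d in normalized)


image = (3, 224, 256)            # (C, H, W)
batch = (8, 3, 224, 256)         # (B, C, H, W)
clips = (2, 8, 3, 224, 256)      # (B, T, C, H, W)

# The same transpose call works regardless of how many leading dimensions exist.
print(transposed_shape(image, -1, -2))   # (3, 256, 224)
print(transposed_shape(batch, -1, -2))   # (8, 3, 256, 224)
print(transposed_shape(clips, -1, -2))   # (2, 8, 3, 256, 224)

# A permutation written for 4 dimensions is rejected for a 5-dimensional input.
print(permuted_shape(batch, (0, 1, 3, 2)))  # (8, 3, 256, 224)
try:
    permuted_shape(clips, (0, 1, 3, 2))
except ValueError as exc:
    print("permute failed:", exc)
```

The permutation has to be rewritten for every input rank, while the transpose with negative indices stays the same.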

@datumbox
Contributor Author

datumbox commented Oct 21, 2022

@pmeier Let's take the naming discussion outside of this PR, because these transforms are needed ASAP to reach feature parity for internal use-cases (ClassyVision). I confirm we need PermuteDimensions because videos are often read in one of the following formats: "THWC", "TCHW", "CTHW". TransposeDimensions is optional but advised, as it will allow us to handle arbitrary batch sizes.
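To make the layout conversions concrete, here is a small sketch in plain Python. The helper `layout_permutation` is hypothetical and not part of this PR; it only derives the dimension order that one would pass to a permute call in order to convert between the layouts named above.

```python
from typing import Tuple


def layout_permutation(src: str, dst: str) -> Tuple[int, ...]:
    """Return the dimension order that rearranges layout `src` into layout `dst`.

    Each layout is a string with one unique letter per dimension, e.g. "THWC".
    """
    if len(set(src)) != len(src) or sorted(src) != sorted(dst):
        raise ValueError(f"{src!r} and {dst!r} must contain the same unique axes")
    # For each axis of the destination, look up where it lives in the source.
    return tuple(src.index(axis) for axis in dst)


print(layout_permutation("THWC", "CTHW"))  # (3, 0, 1, 2)
print(layout_permutation("THWC", "TCHW"))  # (0, 3, 1, 2)
print(layout_permutation("TCHW", "CTHW"))  # (1, 0, 2, 3)

# Applying the order to a shape: 16 frames of 112x112 RGB stored as THWC.
thwc_shape = (16, 112, 112, 3)
order = layout_permutation("THWC", "CTHW")
print(tuple(thwc_shape[d] for d in order))  # (3, 16, 112, 112)
```

Only the "TCHW" to "CTHW" case is a single swap of two dimensions; the conversions starting from "THWC" move three dimensions at once, which is why a pairwise transpose alone does not cover them in one step.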

I'll move the _get_defaultdict into utils as you suggested, let me know if there are any other blocking comments.

Collaborator

@pmeier pmeier left a comment

Stamping

@datumbox datumbox merged commit f88ab12 into pytorch:main Oct 21, 2022
@datumbox datumbox deleted the prototype/permute_transpose branch October 21, 2022 11:41
@datumbox datumbox linked an issue Oct 21, 2022 that may be closed by this pull request
facebook-github-bot pushed a commit that referenced this pull request Oct 21, 2022

Summary:
* Add PermuteDimensions and TransposeDimensions transforms

* Strip Subclass info.

* Apply changes from code review.

Reviewed By: YosuaMichael

Differential Revision: D40588167

fbshipit-source-id: 9b661958738317728dc66823292fbf9758055e6b
Successfully merging this pull request may close these issues.

[prototype] Add temporal Jittering and Permute transforms and kernels
3 participants