Make transforms work on video tensor inputs or batch of images #2583
Comments
The representation |
Yes, I think we should assume that the two last dimensions are H and W. We will be leveraging in the future
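To illustrate the "last two dimensions are H and W" assumption, here is a minimal sketch, not the torchvision implementation. The helper names center_crop and hflip are hypothetical and written only for this example. They index from the end of the shape, so the same code accepts a single image, a batch of images, or a video clip.

```python
import torch


def center_crop(x: torch.Tensor, size: int) -> torch.Tensor:
    """Crop the last two dims (assumed to be H, W) to size x size.

    Hypothetical helper for illustration; all leading dims are left untouched.
    """
    h, w = x.shape[-2:]
    top = (h - size) // 2
    left = (w - size) // 2
    return x[..., top:top + size, left:left + size]


def hflip(x: torch.Tensor) -> torch.Tensor:
    """Flip along the last dim (assumed to be W). Hypothetical helper."""
    return x.flip(-1)


image = torch.rand(3, 32, 48)      # (C, H, W)
batch = torch.rand(8, 3, 32, 48)   # (B, C, H, W)
video = torch.rand(16, 3, 32, 48)  # (T, C, H, W)

for t in (image, batch, video):
    out = hflip(center_crop(t, 24))
    print(tuple(t.shape), "->", tuple(out.shape))
```

Because only the trailing dimensions are touched, the order of the leading dimensions (T before C, or C before T) does not matter for purely spatial transforms like these.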
My first reaction would be that we should assume that the channels are the Another option (which I would leave out for now) is to allow for a adding @takatosp1 @tullie for thoughts |
I feel like the annoying part about this is that the toTensorVideo transforms ffmpeg output |
Everything under |
Your logic for (T, C, H, W) makes sense, particularly for transforms. Problem is - any video model with 3D convolutions is likely going to want T, H, W in the last 3 dims. I guess it depends if you're prioritizing the best format for writing transforms or for writing common video models. |
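The trade-off between the two layouts can be sketched as follows. This is only an illustration under stated assumptions: to_cthw is a hypothetical helper, and the clip and layer sizes are arbitrary. It relies on the documented nn.Conv3d input layout (N, C, D, H, W), where the depth dimension D plays the role of T.

```python
import torch
from torch import nn


def to_cthw(clip_tchw: torch.Tensor) -> torch.Tensor:
    """Convert a (T, C, H, W) clip to (C, T, H, W). Hypothetical helper."""
    return clip_tchw.permute(1, 0, 2, 3)


# A clip in the transform-friendly layout: each clip[t] is an ordinary (C, H, W) image.
clip_tchw = torch.rand(16, 3, 32, 32)

# 3D convolutions expect channels first, then T, H, W as the last three dims.
clip_cthw = to_cthw(clip_tchw)

conv = nn.Conv3d(in_channels=3, out_channels=4, kernel_size=3, padding=1)
out = conv(clip_cthw.unsqueeze(0))  # add a batch dim: (N, C, T, H, W)

print(tuple(clip_cthw.shape))  # (3, 16, 32, 32)
print(tuple(out.shape))        # (1, 4, 16, 32, 32)
```

The permute is a cheap view change, so one option is to keep (T, C, H, W) through the transforms and convert once, just before the model.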
I think it depends on what are the following "operations" after the transforms. |
@vfdev-5 I believe this can be closed now? |
Ok, let's wait until we update the reference examples then before closing this one |
Closing the issue, as #2935 has been merged. |
🚀 Feature
Following #2292 (comment) and discussion with @fmassa and @bjuncek, this feature request is to improve the transforms module to support video inputs as torch.Tensor of shape (C, T, H, W), where C is the number of image channels (e.g. 3 for RGB), T is the number of frames, and H, W are the image dimensions.
Points to discuss:
(C, T, H, W)?
(T, C, H, W)?