
[NOMERGE] Review Features prototype #5379


Conversation

datumbox (Contributor) commented Feb 4, 2022

Adding comments to the features classes.

facebook-github-bot commented Feb 4, 2022

💊 CI failures summary and remediations

As of commit 72d7077 (more details on the Dr. CI page):


  • 2/2 failures introduced in this PR

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build unittest_prototype (1/1)

Step: "Run tests"

    raise TypeError(f"{cls} is not a generic class")
TypeError: <class 'torch.utils.data.datapipes.iter.grouping.ShardingFilterIterDataPipe'> is not a generic class
------ generated xml file: /home/circleci/project/test-results/junit.xml -------
=========================== short test summary info ============================
ERROR test/test_prototype_builtin_datasets.py - TypeError: <class 'torch.util...
ERROR test/test_prototype_datasets_api.py - TypeError: <class 'torch.utils.da...
ERROR test/test_prototype_datasets_utils.py - TypeError: <class 'torch.utils....
ERROR test/test_prototype_models.py - TypeError: <class 'torch.utils.data.dat...
ERROR test/test_prototype_transforms_functional.py - TypeError: <class 'torch...
!!!!!!!!!!!!!!!!!!! Interrupted: 5 errors during collection !!!!!!!!!!!!!!!!!!!!
============================== 5 errors in 0.26s ===============================


Exited with code exit status 2


1 failure not recognized by patterns:

Job Step Action
CircleCI lint_python_and_config Lint Python code and config files

This comment was automatically generated by Dr. CI.

Please report bugs/suggestions to the (internal) Dr. CI Users group.


@datumbox datumbox marked this pull request as draft February 4, 2022 17:19
@@ -4,3 +4,5 @@
from ._image import ColorSpace, Image
from ._label import Label, OneHotLabel
from ._segmentation_mask import SegmentationMask

# We put lots of effort into Video this half. We will need to figure out video tensors in this prototype as well.
Collaborator

The question we need to answer is whether there are video transforms that operate on the temporal information. That would require a Video feature that always has at least 4 dimensions, like (time, channels, height, width). If this is not required, we can simply treat a video as a batch of images and thus use the Image feature for it. cc @bjuncek

Contributor Author

It depends. Some video-specific transforms can potentially use the temporal dimension, but those we currently support don't. So the API shouldn't limit us from supporting such transforms, but the low-level kernels must support videos (I believe that's already the case for the vast majority of them; worth confirming).

Note that videos will have 5 dimensions if you include the batch dimension. This is why many low-level tensor kernels currently implement the transforms using negative indexes (-1, -2, -3) for C, H, W. Improving video support is on TorchVision's current roadmap, so that's something we need to factor in.
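To make the negative-indexing point concrete, here is a minimal sketch (not the actual torchvision kernel) of how a low-level op written against the trailing dimensions works unchanged on images, videos, and batched videos:

```python
import torch

def horizontal_flip(t: torch.Tensor) -> torch.Tensor:
    # Flip the width axis by indexing from the right, so the same
    # kernel handles [C, H, W], [T, C, H, W], and [B, T, C, H, W] alike.
    return t.flip(-1)

image = torch.rand(3, 4, 4)        # [C, H, W]
video = torch.rand(8, 3, 4, 4)     # [T, C, H, W]
batch = torch.rand(2, 8, 3, 4, 4)  # [B, T, C, H, W]
```

Because only dim -1 is touched, all leading dimensions (temporal, batch, or both) pass through untouched.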

cc @vfdev-5 for visibility on which transforms need to be adjusted to support videos.

Collaborator

I can link issue #2583, which was an attempt to make the transforms support video data.

Contributor

This would require a Video feature that always has at least 4 dimensions

In principle, it's not outlandish to expect video tensors to have (at least) 4 dimensions, with the fixed expectation that C, H, W are the last three, in order to avoid having to re-implement things with negative indexes.

It depends. Some video-specific transforms can potentially use the temporal dimension, but those we currently support don't. So the API shouldn't limit us from supporting such transforms, but the low-level kernels must support videos

I'm not sure I follow. There are video transforms (temporal jittering, subsampling, ...) that we should be able to support. In principle, I wouldn't expect torchvision to support every possible scenario, but it would be helpful if basic transforms for video worked on an arbitrarily-dimensional (i.e. [..., C, H, W]) tensor. That way, if a user wants to implement some obscure temporal transform, they can implement something operating on the T dimension (so, -4) in a way that follows the common API for a Video feature (or a batched Image feature, whichever is easiest). Would the current implementation make that troublesome?
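A user-side temporal transform of the kind described above might look like the following sketch (the function name is hypothetical, not part of torchvision), assuming the [..., T, C, H, W] convention so that T always sits at dim -4:

```python
import torch

def temporal_subsample(video: torch.Tensor, step: int = 2) -> torch.Tensor:
    # Under the [..., T, C, H, W] convention T is dim -4, so slicing
    # there leaves any leading batch dimensions untouched.
    return video[..., ::step, :, :, :]

clip = torch.rand(8, 3, 4, 4)          # [T, C, H, W]
batched = torch.rand(2, 8, 3, 4, 4)    # [B, T, C, H, W]
```

The same call works for both a single clip and a batch of clips, which is exactly the property being asked for.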

Contributor Author

@bjuncek I think we agree on the approach. Storing videos as [..., T, C, H, W] allows us to implement most of the basic transforms with video support. We should investigate which of the current transforms support this and which don't. Also, how about the batch dimension, i.e. [B, T, C, H, W]? Any thoughts on handling that in the new API?

I'm not sure I follow.

I just wanted to bring up that it's not a given that ALL transforms will operate independently of the temporal dimension. So the new API might eventually need to support a transform that operates on (T, C, H, W) jointly. Just mentioning this so that we don't limit ourselves on the API side.

Collaborator

Having a Video feature with [..., T, C, H, W] will give us the most flexibility. All transforms that don't need the temporal information can simply use the corresponding image kernel that expects [..., C, H, W]. At the same time, if we later have transforms that need the temporal information, they are automatically supported.
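One wrinkle worth noting: some image kernels require exactly [N, C, H, W] (e.g. torch.nn.functional.interpolate). A common pattern, sketched below with a hypothetical wrapper, is to collapse all leading dimensions into the batch axis, call the image kernel, and restore the shape afterwards, which is what would let a Video feature reuse image kernels automatically:

```python
import torch
import torch.nn.functional as F

def resize(t: torch.Tensor, size) -> torch.Tensor:
    # interpolate wants exactly [N, C, H, W]; collapse any leading
    # dims ([T] or [B, T]) into N and restore them afterwards.
    lead = t.shape[:-3]
    flat = t.reshape(-1, *t.shape[-3:])
    out = F.interpolate(flat, size=size, mode="bilinear", align_corners=False)
    return out.reshape(*lead, *out.shape[-3:])

resize(torch.rand(2, 8, 3, 16, 16), (8, 8)).shape  # → torch.Size([2, 8, 3, 8, 8])
```

With this pattern, the image kernel never needs to know whether it was handed an image, a video, or a batch of videos.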
