add gallery for transforms v2 #7331
@@ -0,0 +1,109 @@

"""
==================================
Getting started with transforms v2
==================================

Most computer vision tasks are not supported out of the box by ``torchvision.transforms`` v1, since it only supports
images. ``torchvision.transforms.v2`` enables jointly transforming images, videos, bounding boxes, and masks. This
example showcases the core functionality of the new ``torchvision.transforms.v2`` API.
"""

import pathlib

import torch
import torchvision


def load_data():
    from torchvision.io import read_image
    from torchvision import datapoints
    from torchvision.ops import masks_to_boxes

    assets_directory = pathlib.Path("assets")

    path = assets_directory / "FudanPed00054.png"
    image = datapoints.Image(read_image(str(path)))
    merged_masks = read_image(str(assets_directory / "FudanPed00054_mask.png"))

    # The merged mask encodes each instance with a distinct integer; index 0 is the background.
    labels = torch.unique(merged_masks)[1:]

    # Split the merged mask into one boolean mask per instance.
    masks = datapoints.Mask(merged_masks == labels.view(-1, 1, 1))

    bounding_boxes = datapoints.BoundingBox(
        masks_to_boxes(masks), format=datapoints.BoundingBoxFormat.XYXY, spatial_size=image.shape[-2:]
    )

    return path, image, bounding_boxes, masks, labels


########################################################################################################################
# The :mod:`torchvision.transforms.v2` API supports images, videos, bounding boxes, and instance and segmentation
# masks. Thus, it offers native support for many computer vision tasks, like image and video classification, object
# detection, or instance and semantic segmentation. Still, the interface is the same, making
# :mod:`torchvision.transforms.v2` a drop-in replacement for the existing :mod:`torchvision.transforms` API, aka v1.

# We are using BETA APIs, so we deactivate the associated warning, thereby acknowledging that
# some APIs may change slightly in the future.
torchvision.disable_beta_transforms_warning()
import torchvision.transforms.v2 as transforms

transform = transforms.Compose(
    [
        transforms.ColorJitter(contrast=0.5),
        transforms.RandomRotation(30),
        transforms.CenterCrop(480),
    ]
)
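
########################################################################################################################
# As a quick sketch of the drop-in claim above, the very same pipeline definition is also valid with the v1 namespace.
# This v1 mirror is shown only for comparison and is not used in the rest of the example.

from torchvision import transforms as transforms_v1  # the existing v1 API

v1_transform = transforms_v1.Compose(
    [
        transforms_v1.ColorJitter(contrast=0.5),
        transforms_v1.RandomRotation(30),
        transforms_v1.CenterCrop(480),
    ]
)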

########################################################################################################################
# :mod:`torchvision.transforms.v2` natively supports jointly transforming multiple inputs while making sure that
# potential random behavior is consistent across all inputs. However, it doesn't enforce a specific input structure or
# order.

path, image, bounding_boxes, masks, labels = load_data()

torch.manual_seed(0)
new_image = transform(image)  # Image Classification
new_image, new_bounding_boxes, new_labels = transform(image, bounding_boxes, labels)  # Object Detection
new_image, new_bounding_boxes, new_masks, new_labels = transform(
    image, bounding_boxes, masks, labels
)  # Instance Segmentation
new_image, new_target = transform((image, {"boxes": bounding_boxes, "labels": labels}))  # Arbitrary Structure
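
########################################################################################################################
# Since no input structure is enforced, samples can also be arbitrarily nested. A minimal sketch, assuming a
# dictionary layout with made-up key names (the v2 API does not require any specific keys):

nested_sample = {"image": image, "target": {"boxes": bounding_boxes, "masks": masks, "labels": labels}}
new_nested_sample = transform(nested_sample)  # the nested structure is preserved in the output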

########################################################################################################################
# Under the hood, :mod:`torchvision.transforms.v2` relies on :mod:`torchvision.datapoints` for the dispatch to the
# appropriate function for the input data: :ref:`sphx_glr_auto_examples_plot_datapoints.py`. Note, however, that as a
# regular user, you likely don't have to touch this yourself (see the short illustration after the next snippet and
# :ref:`sphx_glr_auto_examples_plot_transforms_v2_e2e.py`).
#
# All "foreign" types like :class:`str`'s or :class:`pathlib.Path`'s are passed through, allowing you to store extra
# information directly with the sample:

sample = {"path": path, "image": image}
new_sample = transform(sample)

assert new_sample["path"] is sample["path"]
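
########################################################################################################################
# A small sketch of the dispatch mentioned above: the datapoint subclasses are preserved by the transforms, which is
# what lets each transform pick the appropriate kernel for each input.

print(type(new_image), type(new_bounding_boxes), type(new_masks))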

########################################################################################################################
# As stated above, :mod:`torchvision.transforms.v2` is a drop-in replacement for :mod:`torchvision.transforms` and thus
# also supports transforming plain :class:`torch.Tensor`'s as image or video if applicable. This is achieved with a
# simple heuristic:
#
# * If we find an explicit image or video (:class:`torchvision.datapoints.Image`, :class:`torchvision.datapoints.Video`,
#   or :class:`PIL.Image.Image`) in the input, all other plain tensors are passed through.
# * If there is no explicit image or video, only the first plain :class:`torch.Tensor` will be transformed as image or
#   video, while all others will be passed through.

plain_tensor_image = torch.rand(image.shape)

print(image.shape, plain_tensor_image.shape)

# Passing a plain tensor together with an explicit image will not transform the former.
plain_tensor_image, image = transform(plain_tensor_image, image)

print(image.shape, plain_tensor_image.shape)

# Passing a plain tensor without an explicit image will transform the former.
plain_tensor_image, _ = transform(plain_tensor_image, bounding_boxes)

print(image.shape, plain_tensor_image.shape)

Comment on lines +87 to +109

Comment: I wonder whether we should actually document the heuristic, since it's a potentially controversial part of the contract right now. I feel like it'd be fine to just document that "a single plain tensor is treated as an image for full BC" and leave out the rest. But no strong opinion.

Comment: Basically, because this is an introductory example, I kinda fear that users will think "oh boy, I have to understand all this, and this looks complicated", when in reality 99% of users don't have to worry about this behaviour at all.

Comment: Let's merge, and we'll try to think of a better way to still document this.