[proto] Reduce number of calls of __torch_function__ #6681
Conversation
# this way we return the result without passing into __torch_function__
@property
def shape(self):
    return self.as_subclass(torch.Tensor).shape
Have we tried maintaining the original reference of the input tensor data prior to it getting wrapped in the subclassed tensor? @ezyang proposed this idea offline as a means to speed things up. He just cautioned to keep this private, because if a user gets hold of the original reference and does `original.resize_()`, the change will be visible only on that tensor and not on the wrapped one.
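A minimal, self-contained sketch of that caveat (the `Image` class and `_original` attribute here are stand-ins for illustration, not the actual torchvision implementation):

```python
import torch


class Image(torch.Tensor):
    """Toy subclass that keeps a private reference to the pre-wrap tensor."""

    def __new__(cls, data):
        data = torch.as_tensor(data)
        wrapped = data.as_subclass(cls)
        wrapped._original = data  # private ref; users should not rely on it
        return wrapped


x = torch.rand(3, 16, 16)
img = Image(x)

# Resizing through the private reference only updates that tensor's metadata,
# not the wrapped subclass instance:
img._original.resize_(3, 8, 8)
print(img._original.shape)  # torch.Size([3, 8, 8])
print(img.shape)            # still torch.Size([3, 16, 16])
```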
Is this really faster than going into torch function? You have to do a tensor allocation here!
IIUC, we are only using `.as_subclass(torch.Tensor)` here to avoid `__torch_function__`, correct? If yes, why not simply use the context manager that disables it?
vision/torchvision/prototype/features/_feature.py
Lines 102 to 103 in 07ae61b

with DisableTorchFunction():
    output = func(*args, **kwargs or dict())
like

with DisableTorchFunction():
    return self.shape
Better yet, instead of adding properties for all these attributes manually, why not override `__getattribute__` to do this for us?
_NO_TORCH_FUNCTION = {
    "shape",
}

def __getattribute__(self, item):
    with DisableTorchFunction() if item in type(self)._NO_TORCH_FUNCTION else contextlib.nullcontext():
        return object.__getattribute__(self, item)
Test:

import unittest.mock

import torch
from torchvision.prototype import features

image = features.Image(torch.rand(3, 16, 16))

with unittest.mock.patch(
    "torchvision.prototype.features._Feature.__torch_function__",
    side_effect=AssertionError,
):
    image.shape
I have not benchmarked this. If it is as fast or faster than your suggestion, I would prefer this for several reasons:

- We don't need to add properties manually, which is harder to maintain.
- Subclasses can extend `_NO_TORCH_FUNCTION` (or whatever we call it) to also avoid `__torch_function__` calls on the metadata they are adding, i.e. `Image.color_space`.
- Since we don't need to unwrap anymore, we can also avoid the issues @datumbox mentioned in #6681 (comment).
# Question: Is it safe to assume data to be a tensor ?
out = data.as_subclass(Image)
No. `features.Label(0)` is a good counter example.
This code is inside `Image`; `Label` won't pass through here IMO.
Yes, it is to disable torch function. I can benchmark, but I assume that using the context manager will be slower, since we would be calling more functions (e.g. the enter/exit of the context manager). Let's see.
Here are some time measurements for different options:

import torch
from torch._C import DisableTorchFunction


class ATensor(torch.Tensor):
    def __new__(
        cls,
        data,
        *,
        dtype=None,
        device=None,
        requires_grad: bool = False,
    ):
        data = torch.as_tensor(  # type: ignore[return-value]
            data,
            dtype=dtype,  # type: ignore[arg-type]
            device=device,  # type: ignore[arg-type]
        )
        output = data.as_subclass(cls).requires_grad_(requires_grad)
        output._data_tensor = data
        return output

    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        if kwargs is None:
            kwargs = {}
        return super().__torch_function__(func, types, args, kwargs)

    @property
    def shape2(self):
        return self.as_subclass(torch.Tensor).shape

    @property
    def shape3(self):
        return self._data_tensor.shape

    @property
    def shape4(self):
        with DisableTorchFunction():
            return self.shape


a = ATensor(torch.tensor([0, 1, 2]))
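The timing harness itself isn't shown in the comment; a minimal sketch of how the accessors could be compared with timeit (my assumption, not the original benchmark):

```python
import timeit

# Compare plain `.shape` (dispatched through __torch_function__) against the three alternatives.
for prop in ("shape", "shape2", "shape3", "shape4"):
    total = timeit.timeit(lambda: getattr(a, prop), number=100_000)
    print(f"{prop}: {total / 100_000 * 1e6:.2f} us per access")
```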
The fastest is using a ref to the original torch tensor, and the second fastest is
@vfdev-5 Sounds good. Let's proceed with that approach. Looking forward to seeing what impact this has on the overall performance.
Overall impact of such changes is about ~0.1 ms vs main. Using a ref is a bit faster than using `DisableTorchFunction`.

However, there are a few blockers/issues in the implementation (not yet pushed here)... WIP
By the way, the core team is open to optimizing the relevant APIs to make them faster. We didn't really do any benchmarking on them. One easy win is to have subclasses preregister the set of
@ezyang yes, this could be helpful!
@ezyang Thanks for offering to help. Absolutely, we would love to work with you to make the solution faster. Perhaps we can start with a quick review of our practices (I believe this PR is a good place to start), to let us know whether we make improper use of some of the subclassing features and whether there are any low-hanging fruits we can improve. Happy to chat more about dedicating some time to improving the solution.
The lines edited by this PR don't look obviously wrong, but I haven't looked at the whole thing. @vfdev-5 are you open to implementing some of the core optimizations? We're pretty low bandwidth on our side, but I can definitely find time to give design and review.
@ezyang we are right now very tight on the schedule for v2, but yes, I am open to implementing some core optimizations a bit later, once we are done with the Python optims. Thanks for proposing; let me ping you once we have time to put effort there.
After merging #6718, perf improvements are less important in terms of time measurements (~0.01 ms); however, we can still reduce the number of `__torch_function__` calls.

Main:
@@ -16,7 +16,9 @@
 class EncodedData(_Feature):
     @classmethod
     def _wrap(cls: Type[D], tensor: torch.Tensor) -> D:
-        return tensor.as_subclass(cls)
+        output = tensor.as_subclass(cls)
+        output._tensor = tensor
Originally you had concerns about adopting this approach in order to avoid issues with in-place operators. Assuming we keep this reference private, are we safe from in-place operators? That is, do the shapes of `img` and `img._tensor` match after the following?

x = torch.randn((10, 1, 1, 1))
img = Image(x)
img.resize_((10, 1, 10, 10))
No, we are not safe with `resize_`-like ops and the shapes won't match...
This is a drawback of using a tensor ref...
Then that's probably something we would like to avoid, right? Shall we proceed with your original proposal (#6681 (comment)) to use `DisableTorchFunction`? My understanding is that this would be safe for in-place ops with little performance penalty?
Yes, it would be better to avoid that. I'll check DisableTorchFunction again, but seeing that the gain is very tiny and within benchmarking noise, maybe we can search for other optims now.
> No, we are not safe with `resize_`-like ops and the shapes won't match...
> This is a drawback of using a tensor ref...
IIUC, this should not be that much of an issue. We are already special-casing in-place ops:

vision/torchvision/prototype/features/_feature.py
Lines 106 to 111 in 11a2eed

# Inplace `func`'s, canonically identified with a trailing underscore in their name like `.add_(...)`,
# will retain the input type. Thus, we need to unwrap here.
if isinstance(output, cls):
    return output.as_subclass(torch.Tensor)

return output
Can't we simply correct the `_tensor` reference there? Like

if isinstance(output, cls):
    output._tensor = output = output.as_subclass(torch.Tensor)

If that is too much syntax sugar, the more verbose variant would be

if isinstance(output, cls):
    tensor = output.as_subclass(torch.Tensor)
    output._tensor = tensor
    return tensor
I'm open to it if we can guarantee it's safe. If we go ahead with keeping a reference, is it possible to do a bit of refactoring to use a `__`-prefixed (name-mangled) attribute and make it much harder for users to shoot themselves in the foot? I suspect the issue here is the subclassing of our `_Feature`, but I'm open to ideas.

@vfdev-5 WDYT of Philip's proposal?
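For illustration, a rough sketch of the `__` (name-mangling) idea, with a hypothetical `_raw_tensor` accessor; this is an assumption of what such a refactoring could look like, not the actual `_Feature` implementation:

```python
import torch


class _Feature(torch.Tensor):
    def __new__(cls, data):
        data = torch.as_tensor(data)
        wrapped = data.as_subclass(cls)
        # `__tensor` is mangled to `_Feature__tensor`, so users are unlikely
        # to stumble on it by accident (it is still reachable, just harder).
        wrapped.__tensor = data
        return wrapped

    @property
    def _raw_tensor(self):
        # Inside this class body the name resolves to `self._Feature__tensor`.
        return self.__tensor
```

Note that code written in subclasses such as `Image` that refers to `self.__tensor` in their own bodies would mangle to `_Image__tensor` instead, which is likely the subclassing concern mentioned above.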
I'll check that later, but as commented previously, I do not see this update bringing much perf gain, just a very tiny speed-up. Let me come back to this PR later. I'll put it as draft now.
Merge branch 'main' of github.com:pytorch/vision into proto-perf-feature-improvements
Force-pushed from 30c1ae9 to 3799ce7
With this change:

- 58336  0.147  0.000  0.513  0.000  /vision/torchvision/prototype/features/_feature.py:62(__torch_function__)
+ 16747  0.070  0.000  0.376  0.000  /vision/torchvision/prototype/features/_feature.py:64(__torch_function__)

Time diffs for classification pipelines, one example:

[ Classification AA=ra RE=1.0 transforms measurements ]
                        |   v2
1 threads: -----------------------
  Tensor Image data     |  3.803
- Feature Image data    |  3.986
+ Feature Image data    |  3.931
This reverts commit 38f8e21.
Force-pushed from 8348759 to 99c3594
Results for:

[ Classification AA=augmix RE=1.0 transforms measurements ]
                        |   v2
1 threads: -----------------------
- Tensor Image data     |  7.846
- Feature Image data    |  7.916
+ Tensor Image data     |  7.798   <---- quicker than the same test above, i.e. noise
+ Feature Image data    |  7.875
One nit inline. Otherwise LGTM, thanks Victor!

One question though, for my own understanding: we don't need this for the feature-specific attributes like `Image.color_space`, since they are not part of `__torch_function__` anyway, right?
Co-authored-by: Philip Meier <[email protected]>
Yes, it seems like additional attributes like `color_space` are not routed to `__torch_function__`.
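For completeness, a quick check along the lines of the mock test earlier in this thread (assuming the prototype `Image` exposes `color_space` as a plain attribute on construction):

```python
import unittest.mock

import torch
from torchvision.prototype import features

image = features.Image(torch.rand(3, 16, 16))

# If metadata attributes bypass __torch_function__, this access must not hit the mock.
with unittest.mock.patch(
    "torchvision.prototype.features._Feature.__torch_function__",
    side_effect=AssertionError,
):
    image.color_space
```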
Hey @vfdev-5! You merged this PR, but no labels were added. The list of valid labels is available at https://github.com/pytorch/vision/blob/main/.github/process_commit.py
Summary:

* [proto] Reduce number of calls of __torch_function__
* Use DisableTorchFunction and super
* Use self._tensor
* Fixes mypy and color space handling
* revert Image.new_like
* WIP
* Perf opt with ref to tensor and properties
* Removed requires_grad property
* Use _tensor ref
* Revert "Use _tensor ref" (reverts commit 38f8e21)
* Update torchvision/prototype/features/_feature.py

Reviewed By: NicolasHug

Differential Revision: D40427451

fbshipit-source-id: f241b2d48823d3612943b79887da2c8d1f482160

Co-authored-by: Philip Meier <[email protected]>
Co-authored-by: Philip Meier <[email protected]>
An attempt to improve API v2 performance by reducing the number of `_Feature.__torch_function__` calls.

cProfiling of classification AA(imagenet) + RandomErasing (code):
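The profiling script is only linked ("code") and not inlined; a hypothetical sketch of such a cProfile run, with assumed transform parameters and input size rather than the exact linked pipeline:

```python
import cProfile

import torch
from torchvision.prototype import features, transforms

# Assumed stand-in for the benchmarked pipeline: AutoAugment (ImageNet policy) + RandomErasing(p=1.0).
pipeline = transforms.Compose(
    [
        transforms.AutoAugment(),
        transforms.RandomErasing(p=1.0),
    ]
)
image = features.Image(torch.randint(0, 256, (3, 224, 224), dtype=torch.uint8))

with cProfile.Profile() as profiler:
    for _ in range(1000):
        pipeline(image)
profiler.print_stats("cumulative")
```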
Main:
This PR:
Time measurements on all classification data-augmentation pipelines. Improvements are ~1 ms:
cc @pmeier @datumbox