-
Notifications
You must be signed in to change notification settings - Fork 7.1k
API usage logging within TorchVision #5052
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Thanks for the write up @kazhang. I agree with the direction of the proposal. Personally I would favour As discussed at #5038 (comment), adopting a straightforward, easy to implement and review solution will help us ensure that logging will be added on new endpoints and models. Already we've missed it in a couple of places (see #5044, admittedly I was involved on the reviews and missed this). It might be good to include a few clarifications on how the proposal is implemented in practice, so that open-source contributors have clearer understanding of the policy. For example:
Concerning the outstanding work, my understanding is that:
@kazhang would you be OK to open a ticket for |
In terms of format, I personally prefer
So that the API is consistent across module and function(as of today, module uses |
@kazhang After reviewing some of the PRs, I noticed that:
First of all we should note that some of the inconsistency preexisted your PRs/proposal. Moreover your proposal to log each method under the flat namespace does simplify things and at first glance looks very appealing. It also avoids recording very long fully qualified names such as cc @NicolasHug |
Regarding Python, I think we could try to record the up-most subpackage (with an For example we explicitly expose Similarly, since |
For Python, recording the most compact valid name path of the method/class is reasonable. If that was the case, would you make _log_api_usage_once("models.detection", "RetinaNet")
# VS
_log_api_usage_once("torchvision.models.detection.RetinaNet") Also how about the C++ side? Using the fully qualified name such as It's probably worth clarifying these details fully prior making more changes to the code to minimize throwaway work. Let me know your thoughts, thanks! |
Agree with this if we're sure to only ever log functions that we register through the dispatcher. Otherwise, we still need a convention for logging the non-registered functions, in which case I would go with this:
It's a bit cumbersome I agree. But it also seems to be the most straightforward, dumb simple, and name-clash free strategy. We can do both BTW: use the first strategy for registered functions, and use the other one as a fall-back |
@NicolasHug @datumbox
Because the model inherit from this class won't be tracked properly, instead, we should use
Now, if we want to record the up-most subpackage, we can do
But if there is another module inheriting from RetinaNet but not under |
I want to propose this:
For Python methods, we call:
Pros:
Cons:
@NicolasHug @datumbox thoughts? |
Thanks to both of you for their input. As @kazhang mentioned, some models (like Quantized) use inheritance and hardcoding the module name or the class name won't work. So I agree the following should work for classes:
For methods, we can use the same approach but instead of
As far as I can tell, this works, the tests pass and JIT doesn't complain but you should verify prior replacing everything. For C++, I'm OK using the fully qualified name: What I like about this proposal, is that it's easy to write and easy to verify. Kai's list of pros checks all of the requirements that we have, so I think this should be OK. |
I'm not sure I fully understand the inheritance problem. IIUC if B inherits from A, and if we log in both But then logging |
@NicolasHug I don't think there is an inheritance problem. I think that Kai just wanted to warn against hardcoding class names in favour of using The corner case that Kai is talking about is related to how Quantized models are structured; in a nutshell the Quantized models inherit from the non-quantized. We could put the logger in both constructors if we think that this simplifies our policy but as long as we don't hardcode strings, the metrics will be fine without dividing by 2. Based on the documentation, "the callback fires only once for a given process for each of the APIs". Plus the calls should be further deduped later in the stats pipelines, so I think we should be OK. |
ah, thanks for the clarification! my apologies for the confusions If everyone agree, let's move ahead with logging the fully qualified name:
|
I also want to clarify another point, whether we should track modules that inherit from torchvision but don't live in torchvision, for example, classy vision could inherit from the I personally prefer not to log that kind of events because we can't really make use of those events: our pipeline currently only looks at events start with "torchvision". Therefore I recommend to add a explicit filter inside |
For the completeness, I also considered a simplified logging API
The API call would be as simple as
However, TorchScript doesn't like the idea
see #5095 |
In 3c97c20e3bfa34, I found out that TorchScript doesn't allow to access
In this case, I would rather we use single string as the param for
For class, it could be
|
It would be very nice to be able to have simple API as the one you proposed below:
I modified slightly your code at #5095:
JIT tests seem to pass. |
Thanks @datumbox , basically we would need to check if is torchscripting for every function that needs to be complaint with TorchScript. Maybe it's better than writing the full qualified name like #5096
For methods
For methods that need to be complaint with TorchScript:
|
@kazhang Thanks for the changes. Given that on the #5007 we didn't add calls for transforms such as |
Uh oh!
There was an error while loading. Please reload this page.
Goal
To understand TorchVision usage within an organization(e.g. Meta).
The events give insights into torchvision usage with regards to individual callsites, workflows etc. The organization could also learn the trending APIs, which could be used to guide component development/deprecation etc.
Policy
__init__
ofRegNet
, but not on ones of submodules(e.g.ResBottleneckBlock
)torchvision.io
, the logging must be added both on the Python and the C++ (using the csrc submodule as mentioned) side.torchvision.ops
, the calls should be added both on the main class of the operator (eg StochasticDepth) and on its functional equivalent (eg stochastic_depth) if available.torchvision.transforms
, the calls should be placed on the constructors of the Transform classes, the Auto-Augment classes and the functional methods.torchvision.datasets
, the calls are placed once on the constructor of VisionDataset so we don't need to add them individually on each dataset.torchvision.utils
, call should be added to the top of each public method.Event Format
Full qualified name of the component is logged as the event. For example:
torchvision.models.resnet.ResNet
Note: for events from C++ APIs, “.csrc” should be added after torchvision, for example:
torchvision.csrc.ops.nms.nms
Usage Log API
C10_LOG_API_USAGE_ONCE()
Above APIs are lightweight. By default, they are just no-op. It’s guaranteed that the same event is only recorded once within a process. Please note that 8 GPUs will still lead to 8 events, so the events should be dedup by a unique identifier like workflow job_id.
Implementation
Also considered
@log_api_usage
in use decorator to log API usage #4976__module__
:_log_api_usage(nms.__module__, "nms")
_log_api_usage(MODULE, "nms")
_log_api_usage_once(self)
__qualname__
for class, string for functionThe text was updated successfully, but these errors were encountered: