Cache return types of called functions #390
Conversation
The alternative to how this commit addresses the issue (storing a pointer per function call) would be to store it per type, via a static member variable of a template class. Any thoughts on that approach?
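Concretely, something like the following — just a rough sketch, where `python_type_info`, `slow_lookup`, and `return_type_cache` are made-up stand-ins for the real registered-type machinery, not pybind11 names — with the caveat (discussed below) that this naive form never revalidates the cached entry:

```cpp
#include <typeindex>
#include <typeinfo>
#include <unordered_map>

struct python_type_info { /* stand-in for pybind11's registered type record */ };

// Stand-in for the registered_types_cpp hash lookup.
python_type_info *slow_lookup(std::type_index idx) {
    static std::unordered_map<std::type_index, python_type_info> registry;
    return &registry[idx];
}

// One cache slot per declared return type T, shared by every bound function
// that returns a T*.
template <typename T>
struct return_type_cache {
    static python_type_info *cached;
};
template <typename T>
python_type_info *return_type_cache<T>::cached = nullptr;

template <typename T>
python_type_info *lookup(const T *ptr) {
    if (!return_type_cache<T>::cached)   // first call: pay for the hash lookup once
        return_type_cache<T>::cached = slow_lookup(std::type_index(typeid(*ptr)));
    return return_type_cache<T>::cached; // later calls: no re-check of the run-time type
}
```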
The current inheritance testing isn't sufficient to detect a cache failure; the test added here breaks PR pybind#390, which caches the run-time-determined return type the first time a function is called, then reuses that cached type even though the run-time type could be different for a future call.
That won't work, and actually, neither will this PR as currently written. The issue is that the actual return type is not always the same for a given function; currently, because we're using typeid on the pointer to be cast, we're actually doing a run-time polymorphic type lookup. When calling

```cpp
m.def("return_class_1", []() -> BaseClass* { return new DerivedClass1(); });
m.def("return_class_2", []() -> BaseClass* { return new DerivedClass2(); });
```

we pick those up correctly as new instances of DerivedClass1/DerivedClass2, not BaseClass*, but at compile time there is no difference between those functions. This PR is broken because it caches per-function, which means the problem doesn't show up in the above, but would show up in:

```cpp
m.def("return_class_n", [](bool one) -> BaseClass* {
    if (one) return new DerivedClass1();
    else return new DerivedClass2();
});
```

i.e. if a function returns different types, we fail to notice (because the cache will contain whatever the first run-time type was). This is obviously a testing defect; I've submitted PR #409 to detect it (separately from this PR, because it's a useful test whether or not this PR proceeds).
I think, however, that there's an easy enough solution for this PR: instead of just using the cache whenever we find it, we can use the cached value only if we find it and its type agrees with the current value's type. In practice, this should work almost as well when something is called in a tight loop. It would take a function that continually oscillates in the actual type returned to defeat the cache, so that, for example, the following would never benefit from the cache:

```cpp
static int counter = 0;
m.def("f1", []() -> BaseClass* { if (counter++ % 2 == 0) return new Derived1(); else return new Derived2(); });
```

This approach can be combined with the templated-static variable approach; it just reduces the granularity of the cache to any functions with the same return type, which for most cases is probably fine. It means that the following code will additionally be cache-defeating:

```cpp
m.def("f1", []() -> BaseClass* { return new Derived1(); });
m.def("f2", []() -> BaseClass* { return new Derived2(); });
```

combined with alternating calls from Python to f1 and f2 (each call would then see a different run-time type than the one cached by the previous call).
Each time a function is called, we currently do a type lookup in registered_types_cpp to figure out the python type info associated with the return value. This lookup isn't free, particularly for often-called functions, but will usually yield an identical result each time the function is called. This commit adds a per-return-type cache into the return type lookup for generic types (i.e. those inheriting from type_caster_generic) to avoid the hash lookup. The cache is invalidated whenever the runtime type changes (to avoid the problem PR pybind#409 tests for).
(force-pushed from 33cdacc to 89565a7)
Updated to use local static, return-type-level caching. The .so size premium on the tests with this version is down to 0.4% (g++ 6). The overhead reduction looks like this: [benchmark table not preserved]
Can you explain the rows in the benchmark output, i.e. what the different rows are measuring? For this PR, I think I'd like to wait to see how it plays out in conjunction with the other planned optimizations before merging, as the performance improvements are fairly small and probably only noticeable in very simple getter-style functions.
Indeed, the gains are not likely to be noticeable outside tight-loop code with simple returns. That said, they are still there; but a 20ns gain per function call isn't likely to be significant when not invoking functions millions of times. The code actually invoked is the latest version of this PR.
Shall I close this for now? It probably belongs as part of the other iterator optimization changes in #376.
Ok, let's do that. @aldanor can still get the diffs from here when he has some time to work on the ticket.
Each time a function is called, we currently do a type lookup in registered_types_cpp to figure out the python type info associated with the return value. This hash lookup adds a small but noticeable overhead for functions called in a tight python loop (one of the two main issues reported in #376). It can, however, be easily cached, since the lookup is going to return an identical value each time (pybind11 doesn't support de-registering types).
This commit adds an (optional) cache parameter to the lookup for generic types (i.e. those inheriting from type_caster_generic) so that these lookups can be cached.
Of course, adding a cache variable isn't free either: there is a small (0.93%) increase in the test .so size (on linux/g++6), but it seems worthwhile for a noticeable overhead reduction.
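To sketch what the optional cache parameter could look like at the call site — illustrative names and signature only, not the exact code in the diff:

```cpp
#include <typeindex>
#include <typeinfo>
#include <unordered_map>

struct python_type_info { /* stand-in for pybind11's registered type record */ };

struct type_info_cache {
    const std::type_info *type = nullptr;  // run-time type the entry was cached for
    python_type_info *info = nullptr;
};

// Stand-in for the generic-type lookup; callers without a cache slot simply
// omit the second argument and get the plain hash lookup.
python_type_info *get_type_info(const std::type_info &rtti,
                                type_info_cache *cache = nullptr) {
    if (cache && cache->type && *cache->type == rtti)
        return cache->info;                                     // cache hit, type unchanged
    static std::unordered_map<std::type_index, python_type_info> registry;
    python_type_info *info = &registry[std::type_index(rtti)];  // the usual hash lookup
    if (cache) { cache->type = &rtti; cache->info = info; }     // refresh the caller's slot
    return info;
}
```

A caster for a given declared return type can then keep a function-local static type_info_cache and pass its address in, which is essentially the "local static, return-type-level caching" mentioned in the updated version above.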