gh-133395: add option for extension modules to specialize BINARY_OP/SUBSCR, apply to arrays #133396
Conversation
Is the _PyBinopSpecializationDescr supposed to be extensible?
If so we'll need a means for extensions to free the structs they have allocated. Or is the intention that they are statically allocated?
If they aren't extensible, maybe the VM should allocate them and clean them up?
I assumed it's extensible and the extension manages it. The alternative is that it has a
Are there any microbenchmarks that show this improves performance?
Co-authored-by: Bénédikt Tran <[email protected]>
Misc/NEWS.d/next/Core_and_Builtins/2025-05-04-19-32-47.gh-issue-133395.VhWWEP.rst
Co-authored-by: Michael Droettboom <[email protected]>
```c
}
break;
}
/* _GUARD_BINARY_OP_EXTEND is not a viable micro-op for tier 2 because it uses the 'this_instr' variable */
```
FYI, this change made it so that any traces containing BINARY_OP_EXTEND can't be JIT-compiled. Is there a way to rewrite this to be tier-2 friendly? Mutating inline caches doesn't work there.
We probably don't have to NULL the cache, just change the opcode to BINARY_OP and hope nobody tries to use the pointer in the cache anymore. Will make it harder to debug if they do.
Also, based on the benchmarking results that were posted, it looks like the
This seems to cause the refleaks unit test to fail for the test_datetime module. The leak seems to be related to the
The crash should have been fixed. I'll check the leak. Re tier 2 - maybe there's a way but we can just deopt them when jitting if not.
The following code is enough to reproduce the issue:

```python
import array
import gc

def func():
    a = array.array('B', b'hello world')
    # call array_binop_specialize() 3 times, once per loop
    print("specialize")
    for _ in range(10): a[1]
    print("specialize")
    for _ in range(10): a[1]
    print("specialize")
    for _ in range(10): a[1]

func()
# array_subscr_free() is not called
del func
gc.collect()
```

The specialization calls PyMem_Malloc() in array_binop_specialize(), but nothing calls PyMem_Free(): array_subscr_free() is not called.
It seems that
I don't think that's the issue. Free needs to happen when the code object is freed, and currently it's not.
…ze BINARY_OP/SUBSCR, apply to arrays (python#133396)" This reverts commit 082dbf7.
I'm going to revert this so it doesn't delay the release today. It's not terribly important for this to be in 3.14.
@iritkatriel I just ran JIT benchmarks on the head of this branch. Those three benchmarks are still broken, and it makes the JIT 1% slower (with individual benchmarks slowing down up to 25%, and one benchmark using 8% more memory). So we should make sure to benchmark the JIT when this lands again, or find a new approach (I don't love having strong references in the inline caches, even just pointers to owned memory).
Benchmark results: https://github.com/faster-cpython/benchmarking-public/blob/main/results/bm-20250502-3.14.0a7%2B-ec344eb/bm-20250502-azure-x86_64-iritkatriel-array_subscr-3.14.0a7%2B-ec344eb-pystats-vs-base.md
Specialization stats reflect the change, no overall improvement in perf.