Enable free-threading support #472
The significant change here is the use of a thread local instead of a volatile global for the switching_thread_state global (which is otherwise protected by the GIL). There's some overhead to using a thread local, so only do this in the free-threaded build.

The only other two bits of shared mutable data are G_TOTAL_MAIN_GREENLETS and ThreadState::clocks_used_during_gc. Modify the latter to use a std::atomic with relaxed memory order, which should be good enough, and performance probably matters for those updates.

For G_TOTAL_MAIN_GREENLETS, switch to a std::atomic without changing the inc/dec operations (which means they use sequential consistency), because they're rare enough that performance doesn't really matter.

Also mark the main extension modules and the two test extensions as supporting free-threading (without switching to multi-phase init). The GIL will still temporarily be enabled during module import, but that probably won't matter (modules are usually imported before starting threads). If it does, switching to multi-phase init is always an option.

The existing test suite covers threads extensively enough that no extra tests are necessary. There is an intermittent failure (<0.2% of runs) that shows up when running the test suite in a tight loop, but this happens in regular Python builds (and before 3.14) too. ThreadSanitizer can't be used on greenlet, from what I can tell because it gets confused by the stack switching. This is the case for GILful Python builds as well.
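
For readers who want to see the shape of the change, here is a minimal standalone sketch of the three patterns described above. It is not the code from this PR: every name ending in `_sketch`, the `SKETCH_FREE_THREADED` macro, and the `run_sketch_threads` helper are hypothetical stand-ins. The assumption is that a real build would test CPython's `Py_GIL_DISABLED` macro rather than a local one.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical stand-in for greenlet's per-thread state object.
struct ThreadStateSketch {
    int id;
};

// Assumption: a real build would check Py_GIL_DISABLED (defined by
// free-threaded CPython) instead of this local, hypothetical macro.
#ifndef SKETCH_FREE_THREADED
#define SKETCH_FREE_THREADED 1
#endif

#if SKETCH_FREE_THREADED
// No GIL serializes switches, so every thread gets its own slot.
static thread_local ThreadStateSketch* switching_thread_state_sketch = nullptr;
#else
// With the GIL, one volatile global avoids the thread-local access overhead.
static ThreadStateSketch* volatile switching_thread_state_sketch = nullptr;
#endif

// Rarely updated: plain ++/-- on std::atomic are sequentially consistent.
static std::atomic<std::uint64_t> total_main_greenlets_sketch{0};

// Frequently updated statistic: relaxed ordering keeps the update atomic
// without ordering any surrounding memory accesses.
static std::atomic<std::uint64_t> clocks_used_during_gc_sketch{0};

inline void add_gc_clocks_sketch(std::uint64_t ticks) {
    clocks_used_during_gc_sketch.fetch_add(ticks, std::memory_order_relaxed);
}

// Starts n_threads workers that each touch all three pieces of state,
// then returns the accumulated clock count.
std::uint64_t run_sketch_threads(int n_threads, int iterations) {
    std::vector<std::thread> workers;
    for (int i = 0; i < n_threads; ++i) {
        workers.emplace_back([i, iterations] {
            ThreadStateSketch mine{i};
            switching_thread_state_sketch = &mine;  // this thread's slot
            ++total_main_greenlets_sketch;          // seq_cst increment
            for (int k = 0; k < iterations; ++k) {
                add_gc_clocks_sketch(1);            // relaxed add
            }
            switching_thread_state_sketch = nullptr;
            --total_main_greenlets_sketch;          // seq_cst decrement
        });
    }
    for (auto& worker : workers) {
        worker.join();
    }
    return clocks_used_during_gc_sketch.load(std::memory_order_relaxed);
}
```

Calling `run_sketch_threads(4, 1000)` from a test harness exercises all three patterns at once. Relaxed ordering is enough for the clock counter because nothing else is published through it. The greenlet count keeps the default ordering simply because its updates are too rare for the cost to matter.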

Thanks, this is a great start! I've picked it up to take it across the finish line. There are still some things that clearly need to be done (e.g., actually enabling the allocator the correct way) and, unfortunately, some hard interpreter crashes to finish debugging.

Feel free to let me know if you want any help debugging the crashes. I've seen one on CI that I haven't been able to locally reproduce, but I do have some experience with both free-threading and the CPython internals.

The fun one right now is a crash clearing module globals on shutdown because an object ( A more complete backtrace with debugging symbols (cpython commit af586d8d2601b5fe52277ba7bf5d9e1ff93ffbb6, built with assertions enabled): Usually something like this means I'm not switching something correctly but I just started debugging and haven't found it yet.

Running only this doctest from sphinx causes the crash:

==================================
 Garbage Collection and greenlets
==================================

.. doctest::

   >>> from greenlet import getcurrent, greenlet, GreenletExit

.. doctest::

   >>> import gc
   >>> glet = greenlet(gc.collect)
   >>> _ = glet.switch()

Still debugging... I've tried various things with the

The interpreter is much more complex than it used to be, even from 3.13, and I don't yet fully understand all the new interactions. The issue is definitely some difference between the generic No threads are involved, so I believe we can take any of the cross-thread reference counting stuff off the table. I can edit Both bytecodes make use of I've about exhausted the amount of time I have to spend on this, and I'm kind of stumped as to where to look next. Since I do have a workaround, I may go ahead and merge the changes (these plus my other fixes from branch
Add support for free-threaded Python (PEP 703).