Skip to content

Race between grow_thread_array and _Py_qsbr_reserve under free threading #129732

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
hawkinsp opened this issue Feb 6, 2025 · 0 comments · Fixed by #129738
Closed

Race between grow_thread_array and _Py_qsbr_reserve under free threading #129732

hawkinsp opened this issue Feb 6, 2025 · 0 comments · Fixed by #129738
Labels
topic-free-threading type-bug An unexpected behavior, bug, or error

Comments

@hawkinsp
Copy link
Contributor

hawkinsp commented Feb 6, 2025

Bug report

Bug description:

I don't have a succinct reproducer for this bug yet, but I saw the following race in JAX CI:

jax-ml/jax#26359

WARNING: ThreadSanitizer: data race (pid=208275)
  Write of size 8 at 0x555555d43b60 by thread T121:
    #0 grow_thread_array /__w/jax/jax/cpython/Python/qsbr.c:101:19 (python3.13+0x4a3905) (BuildId: 8f8869b5f3143bd14dda26aa2bf37336b4902370)
    #1 _Py_qsbr_reserve /__w/jax/jax/cpython/Python/qsbr.c:203:13 (python3.13+0x4a3905)
    #2 new_threadstate /__w/jax/jax/cpython/Python/pystate.c:1569:27 (python3.13+0x497df2) (BuildId: 8f8869b5f3143bd14dda26aa2bf37336b4902370)
    #3 PyGILState_Ensure /__w/jax/jax/cpython/Python/pystate.c:2766:16 (python3.13+0x49af78) (BuildId: 8f8869b5f3143bd14dda26aa2bf37336b4902370)
    #4 nanobind::gil_scoped_acquire::gil_scoped_acquire() /proc/self/cwd/external/nanobind/include/nanobind/nb_misc.h:15:43 (xla_extension.so+0xa4fe551) (BuildId: 32eac14928efa68545d22a6013f16aa63a686fef)
    #5 xla::CpuCallback::PrepareAndCall(void*, void**) /proc/self/cwd/external/xla/xla/python/callback.cc:67:26 (xla_extension.so+0xa4fe551)
    #6 xla::XlaPythonCpuCallback(void*, void**, XlaCustomCallStatus_*) /proc/self/cwd/external/xla/xla/python/callback.cc:177:22 (xla_extension.so+0xa500c9a) (BuildId: 32eac14928efa68545d22a6013f16aa63a686fef)
...

  Previous read of size 8 at 0x555555d43b60 by thread T124:
    #0 _Py_qsbr_reserve /__w/jax/jax/cpython/Python/qsbr.c:216:47 (python3.13+0x4a3ad7) (BuildId: 8f8869b5f3143bd14dda26aa2bf37336b4902370)
    #1 new_threadstate /__w/jax/jax/cpython/Python/pystate.c:1569:27 (python3.13+0x497df2) (BuildId: 8f8869b5f3143bd14dda26aa2bf37336b4902370)
    #2 PyGILState_Ensure /__w/jax/jax/cpython/Python/pystate.c:2766:16 (python3.13+0x49af78) (BuildId: 8f8869b5f3143bd14dda26aa2bf37336b4902370)
    #3 nanobind::gil_scoped_acquire::gil_scoped_acquire() /proc/self/cwd/external/nanobind/include/nanobind/nb_misc.h:15:43 (xla_extension.so+0xa4fe551) (BuildId: 32eac14928efa68545d22a6013f16aa63a686fef)
    #4 xla::CpuCallback::PrepareAndCall(void*, void**) /proc/self/cwd/external/xla/xla/python/callback.cc:67:26 (xla_extension.so+0xa4fe551)
    #5 xla::XlaPythonCpuCallback(void*, void**, XlaCustomCallStatus_*) /proc/self/cwd/external/xla/xla/python/callback.cc:177:22 (xla_extension.so+0xa500c9a) (BuildId: 32eac14928efa68545d22a6013f16aa63a686fef)
...

I think what's happening here is that two threads that were not created by Python are calling PyGILState_Ensure concurrently, so they can call into CPython APIs.

This appears to be an unlocked access on shared->array and it would probably be sufficient to move that read under the mutex in _Py_qsbr_reserve.

CPython versions tested on:

3.13

Operating systems tested on:

Linux

Linked PRs

@hawkinsp hawkinsp added the type-bug An unexpected behavior, bug, or error label Feb 6, 2025
hawkinsp added a commit to hawkinsp/cpython that referenced this issue Feb 6, 2025
The read of shared->array should happen under the lock.

Fixes python#129732
hawkinsp added a commit to hawkinsp/cpython that referenced this issue Feb 6, 2025
…-threading.

The read of shared->array should happen under the lock to avoid a race.

Fixes python#129732
hawkinsp added a commit to hawkinsp/cpython that referenced this issue Feb 6, 2025
…-threading.

The read of shared->array should happen under the lock to avoid a race.

Fixes python#129732
hawkinsp added a commit to hawkinsp/cpython that referenced this issue Feb 6, 2025
…-threading.

The read of shared->array should happen under the lock to avoid a race.

Fixes python#129732
colesbury pushed a commit that referenced this issue Feb 6, 2025
…ing (gh-129738)

The read of `shared->array` should happen under the lock to avoid a race.
colesbury pushed a commit to colesbury/cpython that referenced this issue Feb 6, 2025
…r free-threading (pythongh-129738)

The read of `shared->array` should happen under the lock to avoid a race.
(cherry picked from commit b4ff8b2)

Co-authored-by: Peter Hawkins <[email protected]>
colesbury added a commit that referenced this issue Feb 6, 2025
…-threading (gh-129738) (gh-129747)

The read of `shared->array` should happen under the lock to avoid a race.
(cherry picked from commit b4ff8b2)

Co-authored-by: Peter Hawkins <[email protected]>
srinivasreddy pushed a commit to srinivasreddy/cpython that referenced this issue Feb 7, 2025
…threading (pythongh-129738)

The read of `shared->array` should happen under the lock to avoid a race.
cmaloney pushed a commit to cmaloney/cpython that referenced this issue Feb 8, 2025
…threading (pythongh-129738)

The read of `shared->array` should happen under the lock to avoid a race.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic-free-threading type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants