Data race on block->next in mi_block_set_nextx #129748

Open
colesbury opened this issue Feb 6, 2025 · 11 comments
Assignees
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) topic-free-threading type-bug An unexpected behavior, bug, or error

Comments

@colesbury
Contributor

colesbury commented Feb 6, 2025

Bug report

I've seen this in non-debug TSan builds. The TSan report looks like:

  Write of size 8 at 0x7fffc4043690 by thread T2692:
    #0 mi_block_set_nextx /raid/sgross/cpython/./Include/internal/mimalloc/mimalloc/internal.h:652:15 (python+0x2ce71e) (BuildId: 2d15b5a5260b454c4f23bd5e53d32d43bfb806c4)
    #1 _mi_free_block_mt /raid/sgross/cpython/Objects/mimalloc/alloc.c:467:9 (python+0x2ce71e)
    #2 _mi_free_block /raid/sgross/cpython/Objects/mimalloc/alloc.c:506:5 (python+0x2a8b9a) (BuildId: 2d15b5a5260b454c4f23bd5e53d32d43bfb806c4)
    #3 _mi_free_generic /raid/sgross/cpython/Objects/mimalloc/alloc.c:524:3 (python+0x2a8b9a)
    #4 mi_free /raid/sgross/cpython/Objects/mimalloc/alloc.c (python+0x2c765b) (BuildId: 2d15b5a5260b454c4f23bd5e53d32d43bfb806c4)
    #5 _PyObject_MiFree /raid/sgross/cpython/Objects/obmalloc.c:284:5 (python+0x2c765b)
...

  Previous atomic read of size 8 at 0x7fffc4043690 by thread T2690:
    #0 _Py_atomic_load_uintptr_relaxed /raid/sgross/cpython/./Include/cpython/pyatomic_gcc.h:375:10 (python+0x4d0341) (BuildId: 2d15b5a5260b454c4f23bd5e53d32d43bfb806c4)
    #1 _Py_IsOwnedByCurrentThread /raid/sgross/cpython/./Include/object.h:252:12 (python+0x4d0341)
    #2 _Py_TryIncrefFast /raid/sgross/cpython/./Include/internal/pycore_object.h:560:9 (python+0x4d0341)
    #3 _Py_TryIncrefCompare /raid/sgross/cpython/./Include/internal/pycore_object.h:599:9 (python+0x4d0341)
    #4 PyMember_GetOne /raid/sgross/cpython/Python/structmember.c:99:18 (python+0x4d0054) (BuildId: 2d15b5a5260b454c4f23bd5e53d32d43bfb806c4)
    #5 member_get /raid/sgross/cpython/Objects/descrobject.c:179:12 (python+0x2056aa) (BuildId: 2d15b5a5260b454c4f23bd5e53d32d43bfb806c4)
...

SUMMARY: ThreadSanitizer: data race /raid/sgross/cpython/./Include/internal/mimalloc/mimalloc/internal.h:652:15 in mi_block_set_nextx

This happens when we call _Py_TryIncrefCompare() or _Py_TryXGetRef or similar on an object that may be concurrently freed. Perhaps surprisingly, this is a supported operation. See https://peps.python.org/pep-0703/#mimalloc-changes-for-optimistic-list-and-dict-access.

The problem is that mi_block_set_nextx uses a plain store rather than a relaxed atomic store. This is a data race because the mimalloc freelist pointer may overlap the ob_tid field: the freelist pointer is at the beginning of the freed memory block, and ob_tid is the first field in PyObject.
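To make the overlap concrete, here is a minimal sketch. The `sketch_*` types are simplified stand-ins invented for illustration, not the real PyObject or mi_block_t definitions; the only property carried over from the description above is that both fields sit at offset 0 of the same memory block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the start of the free-threaded object header.
   Only the first field matters here; the rest of the header is omitted. */
typedef struct {
    uintptr_t ob_tid;   /* read with a relaxed atomic load by other threads */
} sketch_object_head;

/* Hypothetical stand-in for a freed allocator block: one freelist link. */
typedef struct {
    uintptr_t next;     /* written with a plain store when the block is freed */
} sketch_block;

/* The same memory is viewed as an object while live and as a block once freed. */
typedef union {
    sketch_object_head as_object;
    sketch_block as_block;
} sketch_memory;

static_assert(offsetof(sketch_object_head, ob_tid) == 0, "ob_tid is the first field");
static_assert(offsetof(sketch_block, next) == 0, "freelist link is at the block start");

/* Returns 1 if the two fields start at the same address within one block. */
static inline int fields_overlap(void) {
    sketch_memory m;
    return (void *)&m.as_object.ob_tid == (void *)&m.as_block.next;
}
```

Because both fields start at the same address, a plain write to the link by the freeing thread and an atomic read of ob_tid by another thread touch the same word, which is the pattern TSan reports.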

You won't see this data race if:

  • The object uses Py_TPFLAGS_MANAGED_DICT. In that case, the managed dictionary pointer comes before ob_tid, at the beginning of the allocation. That is fine because, unlike ob_tid, the managed dictionary pointer is never accessed concurrently with freeing the object.
  • CPython is built with --with-pydebug. The debug allocator sticks two extra words at the beginning of each allocation, so the freelist pointers will overlap with those (this is also fine).

Here are two options:

  • Use relaxed stores in mimalloc, such as in mi_block_set_nextx. There are about six of these assignments, which is not terrible to change, but I don't love the idea of modifications to mimalloc that don't make sense to upstream, and these only make sense in the context of free threaded CPython.
  • Reorder PyObject in the free threading build so that ob_type is the first field. This avoids any overlap with ob_tid. It's annoying to break ABI or change the PyObject header though.
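As a rough illustration of the first option, the sketch below writes the freelist link with a relaxed atomic store. It uses C11 `<stdatomic.h>` and hypothetical `sketch_*` names; declaring the field `_Atomic` is an assumption made to keep the example self-contained, and an actual change would presumably go through CPython's own atomic wrappers instead.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical block type; the field is _Atomic only for this sketch. */
typedef struct sketch_block {
    _Atomic uintptr_t next;
} sketch_block;

/* Relaxed store: no ordering guarantees, but no longer a data race
   against concurrent relaxed atomic loads of the same word. */
static inline void sketch_block_set_next(sketch_block *block, uintptr_t next) {
    atomic_store_explicit(&block->next, next, memory_order_relaxed);
}

/* Relaxed load, mirroring how the reading thread accesses the word. */
static inline uintptr_t sketch_block_get_next(sketch_block *block) {
    return atomic_load_explicit(&block->next, memory_order_relaxed);
}
```

A relaxed store typically compiles to the same instruction as a plain store on common targets, so the change is about making the access well defined rather than adding synchronization.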

cc @mpage @Yhg1s

@colesbury colesbury added topic-free-threading type-bug An unexpected behavior, bug, or error labels Feb 6, 2025
@hawkinsp
Contributor

hawkinsp commented Feb 6, 2025

I think we saw this one too: https://gist.github.com/hawkinsp/948bc90fe8942f69db78924ffbb8a4eb but never figured out how to trigger it.

@Yhg1s
Member

Yhg1s commented Feb 6, 2025

Couldn't we also just add an offsetting dummy item to the mi_block_t type, before the mi_encoded_t member? We have to avoid clashing with other offsets we read from the object struct in different build modes, but we have to be careful of that if we were to move the ob_tid field as well, and padding the mi_block_t struct means only one change to mimalloc. (That doesn't solve the other objection to modifying mimalloc this way, of course.)

@Yhg1s
Member

Yhg1s commented Feb 7, 2025

(Oh, unless blocks in mimalloc are allocation-sized, in which case that would break on pointer-sized allocations.)
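The size concern can be sketched as follows. `sketch_padded_block` is a hypothetical layout for the padding idea, not mimalloc's actual mi_block_t, and the example assumes `uintptr_t` is the same size as a pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical padded block: a dummy word pushes the link past offset 0. */
typedef struct {
    uintptr_t pad;    /* would overlap the first word of the freed object */
    uintptr_t next;   /* freelist link, now one word into the block */
} sketch_padded_block;

static_assert(offsetof(sketch_padded_block, next) == sizeof(uintptr_t),
              "link moved one word into the block");
static_assert(sizeof(sketch_padded_block) > sizeof(void *),
              "padded block no longer fits in one pointer");

/* Returns 1 if an allocation of alloc_size bytes can hold the padded block. */
static inline int fits_in_allocation(size_t alloc_size) {
    return alloc_size >= sizeof(sketch_padded_block);
}
```

Under these assumptions, a pointer-sized allocation has no room for the link once it is moved, which is the breakage described above.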

@picnixz picnixz added the interpreter-core (Objects, Python, Grammar, and Parser dirs) label Feb 7, 2025
@Yhg1s
Member

Yhg1s commented Feb 7, 2025

Thinking about this a bit more: besides the (correct) TSan warning, realistically the problem here is that the read can lead to garbage being read as the ob_tid field. We're only comparing it against the current thread, so it feels very unlikely that the read will lead to a valid ob_tid, let alone the current thread's ob_tid. Moving around mimalloc's freelist pointer runs the same risk for other fields (e.g. we're loading ob_ref_local as part of the same lookup). Is it really worth changing the ABI for this?

@colesbury
Contributor Author

It's not garbage in the ob_tid field in the sense that it's not uninitialized memory or a completely arbitrary value. The freelist pointers are distinct from other pointers, including ob_tid. We rely on the same principle in the free threaded GC when we reuse ob_tid for worklists.

ob_type is the only other field in PyObject that would be safe to overlap with the freelist pointer. That's because we don't read ob_type during _Py_TryIncrefCompare() and other similar functions.

Moving around mimalloc's freelist pointer runs the same risk for other fields...

Yeah, I don't think that moving the mimalloc freelist pointer will work well for multiple reasons. Some allocations may be pointer-sized, like you said. Pre-headers (managed dictionary/weakrefs) also make this more complicated.

Reordering the fields in PyObject so that ob_type is first is much simpler.
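A minimal sketch of what the reordering achieves, using a hypothetical, heavily simplified header. Only the three fields named in this thread are included, their types are assumptions, and `uintptr_t` is assumed to be pointer-sized.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reordered header; other fields are omitted. */
typedef struct {
    void *ob_type;          /* stand-in for the type pointer, now first */
    uintptr_t ob_tid;       /* no longer at offset 0 */
    uint32_t ob_ref_local;
} sketch_reordered_head;

static_assert(offsetof(sketch_reordered_head, ob_type) == 0,
              "ob_type takes the slot shared with the freelist link");
static_assert(offsetof(sketch_reordered_head, ob_tid) >= sizeof(uintptr_t),
              "ob_tid starts after the one-word freelist link");

/* Returns 1 if a non-empty field at the given offset overlaps a one-word
   freelist link stored at offset 0 of the block. */
static inline int overlaps_freelist_link(size_t offset, size_t size) {
    return size > 0 && offset < sizeof(uintptr_t);
}
```

With this layout, only ob_type shares memory with the link, and ob_type is not read on the try-incref path, so the fields that are read concurrently no longer alias the allocator's write.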

Is it really worth changing the ABI for this?

I'm not sure. ABI-breaking changes during alphas seem pretty normal. But using relaxed atomic writes in mimalloc isn't so bad either.

@Yhg1s
Member

Yhg1s commented Feb 7, 2025

It's not garbage in the ob_tid field in the sense that it's not uninitialized memory or a completely arbitrary value.

I meant that in the current situation, with the non-atomic writes, the atomic reads can produce a value that's different from both the old value and the pointer written to it by mimalloc. Isn't that true? (I forget whether it's technically undefined or not.)

@duaneg
Contributor

duaneg commented Mar 3, 2025

If it is useful, this reliably reproduces the race on my machine: repro test case

@corona10
Member

@colesbury @Yhg1s Have you reached a decision on this issue? IMO, it should be decided before LS if at all possible. If it is delayed, I think we should consider modifying the mimalloc implementation.

@corona10
Member

cc @hugovk

@colesbury
Contributor Author

I'm still not sure what to do. Modifying the mimalloc implementation is probably the easiest.

@hawkinsp
Contributor

The LRU cache test added in #133787 seems to trigger this race with high probability.

6 participants