-
-
Notifications
You must be signed in to change notification settings - Fork 31.9k
Data race on block->next
in mi_block_set_nextx
#129748
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
I think we saw this one too: https://gist.github.com/hawkinsp/948bc90fe8942f69db78924ffbb8a4eb but never figured out how to trigger it. |
Couldn't we also just add an offsetting dummy item to the mi_block_t type, before the mi_encoded_t member? We have to avoid clashing with other offsets we read from the object struct in different build modes, but we have to be careful of that if we were to move the ob_tid field as well, and padding the mi_block_t struct means only one change to mimalloc. (That doesn't solve the other objection to modifying mimalloc this way, of course.) |
(Oh, unless blocks in mimalloc are allocation-sized, in which case that would break on pointer-sized allocations.) |
Thinking about this a bit more: besides the (correct) TSan warning, realistically the problem here is that the read can lead to garbage being read as the ob_tid field. We're only comparing it against the current thread, so it feels very unlikely that the read will lead to a valid ob_tid, let alone the current thread's ob_tid. Moving around mimalloc's freelist pointer runs the same risk for other fields (e.g. we're loading ob_ref_local as part of the same lookup). Is it really worth changing the ABI for this? |
It's not garbage in the
Yeah, I don't think that moving the mimalloc freelist pointer will work well for multiple reasons. Some allocations may be pointer-sized, like you said. Pre-headers (managed dictionary/wekarefs) also make this more complicated. Reordering the fields in
I'm not sure. ABI breaking changes during alphas seems pretty normal. But using relaxed atomic writes in mimalloc isn't so bad either. |
I meant that in the current situation, with the non-atomic writes, the atomic reads can produce a value that's different from both the old value and the pointer written to it by mimalloc, isn't that true? (I forget if it's technically undefined or not.) |
If it is useful, this reliably reproduces this on my machine: repro test case |
@colesbury @Yhg1s Do you have a decision for this issue? IMO, it should be decided before LS as much as possible. If it is delayed, I think that we should consider modifying the mimalloc implementation. |
cc @hugovk |
I'm still not sure what to do. Modifying the mimalloc implementation is probably the easiest. |
The LRU cache test added in #133787 seems to trigger this race with a high probability. |
Bug report
I've seen this in non-debug TSan builds. The TSAN report looks like:
This happens when we call
_Py_TryIncrefCompare()
or_Py_TryXGetRef
or similar on an object that may be concurrently freed. Perhaps surprisingly, this is a supported operation. See https://peps.python.org/pep-0703/#mimalloc-changes-for-optimistic-list-and-dict-access.The problem is
mi_block_set_nextx
doesn't use a relaxed store, so this is a data race because the mimalloc freelist pointer may overlap theob_tid
field. The mimalloc freelist pointer is at the beginning of the freed memory block andob_tid
is the first field inPyObject
.You won't see this data race if:
Py_TPFLAGS_MANAGED_DICT
. In that case the beginning the managed dictionary pointer comes beforeob_tid
. That is fine because, unlikeob_tid
, the managed dictionary pointer is never accessed concurrently with freeing the object.--with-pydebug
. The debug allocator sticks two extra words at the beginning of each allocation, so the freelist pointers will overlap with those (this is also fine).Here are two options:
mi_block_set_nextx
. There's about six of these assignments -- not terrible to change -- but I don't love the idea of modifications to mimalloc that don't make sense to upstream, and these only make sense in the context of free threaded CPython.PyObject
in the free threading build so thatob_type
is the first field. This avoids any overlap withob_tid
. It's annoying to break ABI or change the PyObject header though.cc @mpage @Yhg1s
The text was updated successfully, but these errors were encountered: