Skip to content

Commit 02cd3ce

Browse files
[3.13] gh-116510: Fix a Crash Due to Shared Immortal Interned Strings (gh-124865) (gh-125709) (GH-125204)
* gh-116510: Fix a Crash Due to Shared Immortal Interned Strings (gh-124865) Fix a crash caused by immortal interned strings being shared between sub-interpreters that use basic single-phase init. In that case, the string can be used by an interpreter that outlives the interpreter that created and interned it. For interpreters that share obmalloc state, also share the interned dict with the main interpreter. This is an un-revert of gh-124646 that then addresses the Py_TRACE_REFS failures identified by gh-124785. (cherry picked from commit f2cb399) Co-authored-by: Eric Snow <[email protected]> * [3.13] gh-125286: Share the Main Refchain With Legacy Interpreters (gh-125709) They used to be shared, before 3.12. Returning to sharing them resolves a failure on Py_TRACE_REFS builds. --------- Co-authored-by: Eric Snow <[email protected]>
1 parent 4f1440f commit 02cd3ce

File tree

8 files changed

+148
-23
lines changed

8 files changed

+148
-23
lines changed

Doc/library/sys.rst

+29
Original file line numberDiff line numberDiff line change
@@ -920,6 +920,35 @@ always available.
920920
It is not guaranteed to exist in all implementations of Python.
921921

922922

923+
.. function:: getobjects(limit[, type])
924+
925+
This function only exists if CPython was built using the
926+
specialized configure option :option:`--with-trace-refs`.
927+
It is intended only for debugging garbage-collection issues.
928+
929+
Return a list of up to *limit* dynamically allocated Python objects.
930+
If *type* is given, only objects of that exact type (not subtypes)
931+
are included.
932+
933+
Objects from the list are not safe to use.
934+
Specifically, the result will include objects from all interpreters that
935+
share their object allocator state (that is, ones created with
936+
:c:member:`PyInterpreterConfig.use_main_obmalloc` set to 1
937+
or using :c:func:`Py_NewInterpreter`, and the
938+
:ref:`main interpreter <sub-interpreter-support>`).
939+
Mixing objects from different interpreters may lead to crashes
940+
or other unexpected behavior.
941+
942+
.. impl-detail::
943+
944+
This function should be used for specialized purposes only.
945+
It is not guaranteed to exist in all implementations of Python.
946+
947+
.. versionchanged:: next
948+
949+
The result may include objects from other interpreters.
950+
951+
923952
.. function:: getprofile()
924953

925954
.. index::

Doc/using/configure.rst

+1-1
Original file line numberDiff line numberDiff line change
@@ -721,7 +721,7 @@ Debug options
721721
Effects:
722722

723723
* Define the ``Py_TRACE_REFS`` macro.
724-
* Add :func:`!sys.getobjects` function.
724+
* Add :func:`sys.getobjects` function.
725725
* Add :envvar:`PYTHONDUMPREFS` environment variable.
726726

727727
The :envvar:`PYTHONDUMPREFS` environment variable can be used to dump

Doc/whatsnew/3.13.rst

+11
Original file line numberDiff line numberDiff line change
@@ -2706,3 +2706,14 @@ Regression Test Changes
27062706
option. If used, it specifies a module that should be imported early
27072707
in the lifecycle of the interpreter, before ``site.py`` is executed.
27082708
(Contributed by Łukasz Langa in :gh:`110769`.)
2709+
2710+
2711+
Notable changes in 3.13.1
2712+
=========================
2713+
2714+
sys
2715+
---
2716+
2717+
* The previously undocumented special function :func:`sys.getobjects`,
2718+
which only exists in specialized builds of Python, may now return objects
2719+
from other interpreters than the one it's called in.
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,5 @@
1+
Fix a crash caused by immortal interned strings being shared between
2+
sub-interpreters that use basic single-phase init. In that case, the string
3+
can be used by an interpreter that outlives the interpreter that created and
4+
interned it. For interpreters that share obmalloc state, also share the
5+
interned dict with the main interpreter.

Objects/object.c

+44-12
Original file line numberDiff line numberDiff line change
@@ -169,6 +169,48 @@ _PyDebug_PrintTotalRefs(void) {
169169
#define REFCHAIN(interp) interp->object_state.refchain
170170
#define REFCHAIN_VALUE ((void*)(uintptr_t)1)
171171

172+
static inline int
173+
has_own_refchain(PyInterpreterState *interp)
174+
{
175+
if (interp->feature_flags & Py_RTFLAGS_USE_MAIN_OBMALLOC) {
176+
return (_Py_IsMainInterpreter(interp)
177+
|| _PyInterpreterState_Main() == NULL);
178+
}
179+
return 1;
180+
}
181+
182+
static int
183+
refchain_init(PyInterpreterState *interp)
184+
{
185+
if (!has_own_refchain(interp)) {
186+
// Legacy subinterpreters share a refchain with the main interpreter.
187+
REFCHAIN(interp) = REFCHAIN(_PyInterpreterState_Main());
188+
return 0;
189+
}
190+
_Py_hashtable_allocator_t alloc = {
191+
// Don't use default PyMem_Malloc() and PyMem_Free() which
192+
// require the caller to hold the GIL.
193+
.malloc = PyMem_RawMalloc,
194+
.free = PyMem_RawFree,
195+
};
196+
REFCHAIN(interp) = _Py_hashtable_new_full(
197+
_Py_hashtable_hash_ptr, _Py_hashtable_compare_direct,
198+
NULL, NULL, &alloc);
199+
if (REFCHAIN(interp) == NULL) {
200+
return -1;
201+
}
202+
return 0;
203+
}
204+
205+
static void
206+
refchain_fini(PyInterpreterState *interp)
207+
{
208+
if (has_own_refchain(interp) && REFCHAIN(interp) != NULL) {
209+
_Py_hashtable_destroy(REFCHAIN(interp));
210+
}
211+
REFCHAIN(interp) = NULL;
212+
}
213+
172214
bool
173215
_PyRefchain_IsTraced(PyInterpreterState *interp, PyObject *obj)
174216
{
@@ -2171,16 +2213,7 @@ PyStatus
21712213
_PyObject_InitState(PyInterpreterState *interp)
21722214
{
21732215
#ifdef Py_TRACE_REFS
2174-
_Py_hashtable_allocator_t alloc = {
2175-
// Don't use default PyMem_Malloc() and PyMem_Free() which
2176-
// require the caller to hold the GIL.
2177-
.malloc = PyMem_RawMalloc,
2178-
.free = PyMem_RawFree,
2179-
};
2180-
REFCHAIN(interp) = _Py_hashtable_new_full(
2181-
_Py_hashtable_hash_ptr, _Py_hashtable_compare_direct,
2182-
NULL, NULL, &alloc);
2183-
if (REFCHAIN(interp) == NULL) {
2216+
if (refchain_init(interp) < 0) {
21842217
return _PyStatus_NO_MEMORY();
21852218
}
21862219
#endif
@@ -2191,8 +2224,7 @@ void
21912224
_PyObject_FiniState(PyInterpreterState *interp)
21922225
{
21932226
#ifdef Py_TRACE_REFS
2194-
_Py_hashtable_destroy(REFCHAIN(interp));
2195-
REFCHAIN(interp) = NULL;
2227+
refchain_fini(interp);
21962228
#endif
21972229
}
21982230

Objects/unicodeobject.c

+42-6
Original file line numberDiff line numberDiff line change
@@ -277,13 +277,37 @@ hashtable_unicode_compare(const void *key1, const void *key2)
277277
}
278278
}
279279

280+
/* Return true if this interpreter should share the main interpreter's
281+
intern_dict. That's important for interpreters which load basic
282+
single-phase init extension modules (m_size == -1). There could be interned
283+
immortal strings that are shared between interpreters, due to the
284+
PyDict_Update(mdict, m_copy) call in import_find_extension().
285+
286+
It's not safe to deallocate those strings until all interpreters that
287+
potentially use them are freed. By storing them in the main interpreter, we
288+
ensure they get freed after all other interpreters are freed.
289+
*/
290+
static bool
291+
has_shared_intern_dict(PyInterpreterState *interp)
292+
{
293+
PyInterpreterState *main_interp = _PyInterpreterState_Main();
294+
return interp != main_interp && interp->feature_flags & Py_RTFLAGS_USE_MAIN_OBMALLOC;
295+
}
296+
280297
static int
281298
init_interned_dict(PyInterpreterState *interp)
282299
{
283300
assert(get_interned_dict(interp) == NULL);
284-
PyObject *interned = interned = PyDict_New();
285-
if (interned == NULL) {
286-
return -1;
301+
PyObject *interned;
302+
if (has_shared_intern_dict(interp)) {
303+
interned = get_interned_dict(_PyInterpreterState_Main());
304+
Py_INCREF(interned);
305+
}
306+
else {
307+
interned = PyDict_New();
308+
if (interned == NULL) {
309+
return -1;
310+
}
287311
}
288312
_Py_INTERP_CACHED_OBJECT(interp, interned_strings) = interned;
289313
return 0;
@@ -294,7 +318,10 @@ clear_interned_dict(PyInterpreterState *interp)
294318
{
295319
PyObject *interned = get_interned_dict(interp);
296320
if (interned != NULL) {
297-
PyDict_Clear(interned);
321+
if (!has_shared_intern_dict(interp)) {
322+
// only clear if the dict belongs to this interpreter
323+
PyDict_Clear(interned);
324+
}
298325
Py_DECREF(interned);
299326
_Py_INTERP_CACHED_OBJECT(interp, interned_strings) = NULL;
300327
}
@@ -15306,6 +15333,13 @@ _PyUnicode_ClearInterned(PyInterpreterState *interp)
1530615333
}
1530715334
assert(PyDict_CheckExact(interned));
1530815335

15336+
if (has_shared_intern_dict(interp)) {
15337+
// the dict doesn't belong to this interpreter, skip the debug
15338+
// checks on it and just clear the pointer to it
15339+
clear_interned_dict(interp);
15340+
return;
15341+
}
15342+
1530915343
#ifdef INTERNED_STATS
1531015344
fprintf(stderr, "releasing %zd interned strings\n",
1531115345
PyDict_GET_SIZE(interned));
@@ -15827,8 +15861,10 @@ _PyUnicode_Fini(PyInterpreterState *interp)
1582715861
{
1582815862
struct _Py_unicode_state *state = &interp->unicode;
1582915863

15830-
// _PyUnicode_ClearInterned() must be called before _PyUnicode_Fini()
15831-
assert(get_interned_dict(interp) == NULL);
15864+
if (!has_shared_intern_dict(interp)) {
15865+
// _PyUnicode_ClearInterned() must be called before _PyUnicode_Fini()
15866+
assert(get_interned_dict(interp) == NULL);
15867+
}
1583215868

1583315869
_PyUnicode_FiniEncodings(&state->fs_codec);
1583415870

Python/pylifecycle.c

+14
Original file line numberDiff line numberDiff line change
@@ -670,6 +670,13 @@ pycore_create_interpreter(_PyRuntimeState *runtime,
670670
return status;
671671
}
672672

673+
// This could be done in init_interpreter() (in pystate.c) if it
674+
// didn't depend on interp->feature_flags being set already.
675+
status = _PyObject_InitState(interp);
676+
if (_PyStatus_EXCEPTION(status)) {
677+
return status;
678+
}
679+
673680
// initialize the interp->obmalloc state. This must be done after
674681
// the settings are loaded (so that feature_flags are set) but before
675682
// any calls are made to obmalloc functions.
@@ -2290,6 +2297,13 @@ new_interpreter(PyThreadState **tstate_p,
22902297
goto error;
22912298
}
22922299

2300+
// This could be done in init_interpreter() (in pystate.c) if it
2301+
// didn't depend on interp->feature_flags being set already.
2302+
status = _PyObject_InitState(interp);
2303+
if (_PyStatus_EXCEPTION(status)) {
2304+
return status;
2305+
}
2306+
22932307
// initialize the interp->obmalloc state. This must be done after
22942308
// the settings are loaded (so that feature_flags are set) but before
22952309
// any calls are made to obmalloc functions.

Python/pystate.c

+2-4
Original file line numberDiff line numberDiff line change
@@ -632,10 +632,8 @@ init_interpreter(PyInterpreterState *interp,
632632
assert(next != NULL || (interp == runtime->interpreters.main));
633633
interp->next = next;
634634

635-
PyStatus status = _PyObject_InitState(interp);
636-
if (_PyStatus_EXCEPTION(status)) {
637-
return status;
638-
}
635+
// We would call _PyObject_InitState() at this point
636+
// if interp->feature_flags were alredy set.
639637

640638
_PyEval_InitState(interp);
641639
_PyGC_InitState(&interp->gc);

0 commit comments

Comments
 (0)