From ccea6000526e8a75aaec5ab1ab278b4aa8d01723 Mon Sep 17 00:00:00 2001
From: Neil Schemenauer
Date: Wed, 11 Jun 2025 15:52:54 -0700
Subject: [PATCH 1/8] Add InternalDocs/qsbr.md.

---
 InternalDocs/README.md |   2 +
 InternalDocs/qsbr.md   | 129 +++++++++++++++++++++++++++++++++++++++++
 Python/qsbr.c          |   2 +-
 3 files changed, 132 insertions(+), 1 deletion(-)
 create mode 100644 InternalDocs/qsbr.md

diff --git a/InternalDocs/README.md b/InternalDocs/README.md
index 4502902307cd5c..5131ba6f43bd2e 100644
--- a/InternalDocs/README.md
+++ b/InternalDocs/README.md
@@ -41,3 +41,5 @@ Program Execution
 - [Garbage Collector Design](garbage_collector.md)
 
 - [Exception Handling](exception_handling.md)
+
+- [Quiescent-State Based Reclamation (QSBR)](qsbr.md)
diff --git a/InternalDocs/qsbr.md b/InternalDocs/qsbr.md
new file mode 100644
index 00000000000000..8c1d424918ca46
--- /dev/null
+++ b/InternalDocs/qsbr.md
@@ -0,0 +1,129 @@
+# Quiescent-State Based Reclamation (QSBR)
+
+## Introduction
+
+When implementing lock-free data structures, a key challenge is determining
+when it is safe to free memory that has been logically removed from a
+structure. Freeing memory too early can lead to use-after-free bugs if another
+thread is still accessing it. Freeing it too late results in excessive memory
+consumption.
+
+Safe memory reclamation (SMR) schemes address this by delaying the free
+operation until all concurrent read accesses are guaranteed to have completed.
+Quiescent-State Based Reclamation (QSBR) is an SMR scheme used in Python's
+free-threaded build to manage the lifecycle of shared memory.
+
+QSBR requires threads to periodically report that they are in a quiescent
+state. A thread is in a quiescent state if it holds no references to shared
+objects that might be reclaimed. Think of it as a checkpoint where a thread
+signals, "I am not in the middle of any operation that relies on a shared
+resource." In Python, the eval_breaker provides a natural and convenient place
+for threads to report this state.
+
+
+## Use in Free-Threaded Python
+
+While CPython's memory management is dominated by reference counting and a
+tracing garbage collector, these mechanisms are not suitable for all data
+structures. For example, the backing array of a list object is not individually
+reference-counted but may have a shorter lifetime than the PyListObject that
+contains it. We could delay reclamation until the next GC run, but we want
+reclamation to be prompt and to run the GC less frequently in the free-threaded
+build, as it requires pausing all threads.
+
+Many operations in the free-threaded build are protected by locks. However, for
+performance-critical code, we want to allow reads to happen concurrently with
+updates. For instance, we want to avoid locking during most list read accesses.
+If a list is resized while another thread is reading it, QSBR provides the
+mechanism to determine when it is safe to free the list's old backing array.
+
+Specific use cases for QSBR include:
+
+* Dictionary keys (PyDictKeysObject) and list arrays (ob_item): When a
+dictionary or list that may be shared between threads is resized, we use QSBR
+to delay freeing the old keys or array until it's safe. For dicts and lists
+that are not shared, their storage can be freed immediately upon resize.
+
+* Mimalloc mi_page_t: Non-locking dictionary and list accesses require
+cooperation from the memory allocator. If an object is freed and its memory is
+reused, we must ensure the new object's reference count field is at the same
+memory location. In practice, this means when a mimalloc page (mi_page_t)
+becomes empty, we don't immediately allow it to be reused for allocations of a
+different size class. QSBR is used to determine when it's safe to repurpose the
+page or return its memory to the OS.
+
+
+## Implementation Details
+
+
+### Core Implementation
+
+The proposal to add QSBR to Python is contained in Github issue 115103 [1].
+Many details of that proposal have been copied here, so they can be kept
+up-to-date with the actual implementation.
+
+Python's QSBR implementation is based on FreeBSD's "Global Unbounded
+Sequences." [2, 3, 4]. It relies on a few key counters:
+
+* Global Write Sequence (`wr_seq`): A per-interpreter counter, `wr_seq`, is started
+at 1 and incremented by 2 each time it is advanced. This ensures its value is
+always odd, which can be used to distinguish it from other state values. When
+an object needs to be reclaimed, `wr_seq` is advanced, and the object is tagged
+with this new sequence number.
+
+* Per-Thread Read Sequence: Each thread has a local read sequence counter. When
+a thread reaches a quiescent state (e.g., at the eval_breaker), it copies the
+current global `wr_seq` to its local counter.
+
+* Global Read Sequence (`rd_seq`): This per-interpreter value stores the minimum
+of all per-thread read sequence counters (excluding detached threads). It is
+updated by a "polling" operation.
+
+To free an object, the following steps are taken:
+
+1. Advance the global `wr_seq`.
+
+2. Add the object's pointer to a deferred-free list, tagging it with the new
+   `wr_seq` value as its qsbr_goal.
+
+Periodically, a polling mechanism processes this deferred-free list:
+
+1. The minimum read sequence value across all active threads is calculated and
+   stored as the global `rd_seq`.
+
+2. For each item on the deferred-free list, if its qsbr_goal is less than the
+   new `rd_seq`, its memory is freed, and it is removed from the list. Otherwise,
+   it remains on the list for a future attempt.
+
+
+### Deferred Advance Optimization
+
+To reduce memory contention from frequent updates to the global `wr_seq`, its
+advancement is sometimes deferred. Instead of incrementing `wr_seq` on every
+reclamation request, each thread tracks its number of deferrals locally. Once
+the deferral count reaches a limit (QSBR_DEFERRED_LIMIT, currently 10), the
+thread advances the global `wr_seq` and resets its local count.
+
+When an object is added to the deferred-free list, its qsbr_goal is set to
+`wr_seq` + 2. By setting the goal to the next sequence value, we ensure it's safe
+to defer the global counter advancement. This optimization improves runtime
+speed but may increase peak memory usage by slightly delaying when memory can
+be reclaimed.
+
+
+## Limitations
+
+Determining the `rd_seq` requires scanning over all thread states. This operation
+could become a bottleneck in applications with a very large number of threads
+(e.g., >1,000). Future work may address this with more advanced mechanisms,
+such as a tree-based structure or incremental scanning. For now, the
+implementation prioritizes simplicity, with plans for refinement if
+multi-threaded benchmarks reveal performance issues.
+
+
+## References
+
+1. https://github.com/python/cpython/issues/115103
+2. https://youtu.be/ZXUIFj4nRjk?t=694
+3. https://people.kernel.org/joelfernandes/gus-vs-rcu
+4. http://bxr.su/FreeBSD/sys/kern/subr_smr.c#44
diff --git a/Python/qsbr.c b/Python/qsbr.c
index bf34fb2523dfc8..afa03776c26b6f 100644
--- a/Python/qsbr.c
+++ b/Python/qsbr.c
@@ -1,6 +1,6 @@
 /*
  * Implementation of safe memory reclamation scheme using
- * quiescent states.
+ * quiescent states. See InternalDocs/qsbr.md.
  *
  * This is derived from the "GUS" safe memory reclamation technique
  * in FreeBSD written by Jeffrey Roberson. It is heavily modified. Any bugs
From def94947a65f52e7d0ff448e524d69a0955840a9 Mon Sep 17 00:00:00 2001
From: Neil Schemenauer
Date: Fri, 13 Jun 2025 10:04:50 -0700
Subject: [PATCH 2/8] Update InternalDocs/qsbr.md

Co-authored-by: Kumar Aditya
---
 InternalDocs/qsbr.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/InternalDocs/qsbr.md b/InternalDocs/qsbr.md
index 8c1d424918ca46..e8de7b51528f67 100644
--- a/InternalDocs/qsbr.md
+++ b/InternalDocs/qsbr.md
@@ -39,7 +39,7 @@ mechanism to determine when it is safe to free the list's old backing array.
 
 Specific use cases for QSBR include:
 
-* Dictionary keys (PyDictKeysObject) and list arrays (ob_item): When a
+* Dictionary keys (PyDictKeysObject) and list arrays (_PyListArray): When a
 dictionary or list that may be shared between threads is resized, we use QSBR
 to delay freeing the old keys or array until it's safe. For dicts and lists
 that are not shared, their storage can be freed immediately upon resize.
From b1926091a1d9ba0505a6df79c48bcd774e082c28 Mon Sep 17 00:00:00 2001
From: Neil Schemenauer
Date: Fri, 13 Jun 2025 10:05:00 -0700
Subject: [PATCH 3/8] Update InternalDocs/qsbr.md

Co-authored-by: Pieter Eendebak
---
 InternalDocs/qsbr.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/InternalDocs/qsbr.md b/InternalDocs/qsbr.md
index e8de7b51528f67..e62a98aca2c354 100644
--- a/InternalDocs/qsbr.md
+++ b/InternalDocs/qsbr.md
@@ -10,7 +10,7 @@ consumption.
 
 Safe memory reclamation (SMR) schemes address this by delaying the free
 operation until all concurrent read accesses are guaranteed to have completed.
-Quiescent-State Based Reclamation (QSBR) is an SMR scheme used in Python's
+Quiescent-State Based Reclamation (QSBR) is a SMR scheme used in Python's
 free-threaded build to manage the lifecycle of shared memory.
 
 QSBR requires threads to periodically report that they are in a quiescent
From 19c0efa4a0e0a631ed10954729eb512ed1394810 Mon Sep 17 00:00:00 2001
From: Neil Schemenauer
Date: Fri, 20 Jun 2025 12:11:22 -0700
Subject: [PATCH 4/8] Update InternalDocs/qsbr.md

Co-authored-by: Kumar Aditya
---
 InternalDocs/qsbr.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/InternalDocs/qsbr.md b/InternalDocs/qsbr.md
index e62a98aca2c354..c401c0f6fb3e02 100644
--- a/InternalDocs/qsbr.md
+++ b/InternalDocs/qsbr.md
@@ -58,7 +58,7 @@ page or return its memory to the OS.
 
 ### Core Implementation
 
-The proposal to add QSBR to Python is contained in Github issue 115103 [1].
+The proposal to add QSBR to Python is contained in Github issue 115103 [^1].
 Many details of that proposal have been copied here, so they can be kept
 up-to-date with the actual implementation.
 
From c80ac373d5b6c2688f2b7af786eac88bf8c15ca2 Mon Sep 17 00:00:00 2001
From: Neil Schemenauer
Date: Fri, 20 Jun 2025 12:11:32 -0700
Subject: [PATCH 5/8] Update InternalDocs/qsbr.md

Co-authored-by: Kumar Aditya
---
 InternalDocs/qsbr.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/InternalDocs/qsbr.md b/InternalDocs/qsbr.md
index c401c0f6fb3e02..222b308eb4490f 100644
--- a/InternalDocs/qsbr.md
+++ b/InternalDocs/qsbr.md
@@ -123,7 +123,7 @@ multi-threaded benchmarks reveal performance issues.
 
 ## References
 
-1. https://github.com/python/cpython/issues/115103
+[^1]: https://github.com/python/cpython/issues/115103
 2. https://youtu.be/ZXUIFj4nRjk?t=694
 3. https://people.kernel.org/joelfernandes/gus-vs-rcu
 4. http://bxr.su/FreeBSD/sys/kern/subr_smr.c#44
From 25b714093e489e31de1db2980cf8ed0a3a140834 Mon Sep 17 00:00:00 2001
From: Neil Schemenauer
Date: Fri, 20 Jun 2025 12:18:36 -0700
Subject: [PATCH 6/8] Fix goal compare condition, format tweaks.
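
The corrected polling rule can be illustrated with a small, single-threaded C model. This is a hypothetical sketch, not code from `Python/qsbr.c`: the `toy_*` names, the fixed-size per-thread array, and the use of 0 to mark a detached thread are assumptions made for illustration. Once every attached thread has copied the write sequence that an item was tagged with, `rd_seq` equals that goal exactly, so a strict `goal < rd_seq` test would keep the item waiting until the write sequence advances again, while `goal <= rd_seq` frees it on that poll.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the counters described in InternalDocs/qsbr.md.
 * wr_seq starts at 1 and advances by 2, so live sequence values are odd. */
#define TOY_QSBR_INITIAL 1
#define TOY_QSBR_INCR 2
#define TOY_QSBR_OFFLINE 0 /* assumed marker for a detached thread */
#define TOY_MAX_THREADS 8

struct toy_qsbr {
    uint64_t wr_seq;                      /* global write sequence */
    uint64_t rd_seq;                      /* min over attached threads */
    uint64_t thread_seq[TOY_MAX_THREADS]; /* per-thread read sequences */
    size_t nthreads;
};

/* Start with every thread attached and caught up. */
static void toy_init(struct toy_qsbr *q, size_t nthreads)
{
    assert(nthreads <= TOY_MAX_THREADS);
    q->wr_seq = TOY_QSBR_INITIAL;
    q->rd_seq = TOY_QSBR_INITIAL;
    q->nthreads = nthreads;
    for (size_t i = 0; i < nthreads; i++) {
        q->thread_seq[i] = TOY_QSBR_INITIAL;
    }
}

/* Advance wr_seq; the new value becomes the goal for a retired object. */
static uint64_t toy_advance(struct toy_qsbr *q)
{
    q->wr_seq += TOY_QSBR_INCR;
    return q->wr_seq;
}

/* A thread at a quiescent state copies the current wr_seq. */
static void toy_quiescent(struct toy_qsbr *q, size_t tid)
{
    assert(tid < q->nthreads);
    q->thread_seq[tid] = q->wr_seq;
}

/* Polling: rd_seq becomes the minimum over attached threads. */
static uint64_t toy_poll(struct toy_qsbr *q)
{
    uint64_t min_seq = q->wr_seq;
    for (size_t i = 0; i < q->nthreads; i++) {
        uint64_t s = q->thread_seq[i];
        if (s != TOY_QSBR_OFFLINE && s < min_seq) {
            min_seq = s;
        }
    }
    q->rd_seq = min_seq;
    return min_seq;
}

/* The corrected rule: free once goal <= rd_seq (not goal < rd_seq). */
static bool toy_goal_reached(const struct toy_qsbr *q, uint64_t goal)
{
    return goal <= q->rd_seq;
}
```

In this model, a caller would tag a retired pointer with the value returned by `toy_advance()` and free it after a later `toy_poll()` for which `toy_goal_reached()` returns true.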
---
 InternalDocs/qsbr.md | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/InternalDocs/qsbr.md b/InternalDocs/qsbr.md
index 222b308eb4490f..47d05856665af4 100644
--- a/InternalDocs/qsbr.md
+++ b/InternalDocs/qsbr.md
@@ -39,15 +39,15 @@ mechanism to determine when it is safe to free the list's old backing array.
 
 Specific use cases for QSBR include:
 
-* Dictionary keys (PyDictKeysObject) and list arrays (_PyListArray): When a
+* Dictionary keys (PyDictKeysObject) and list arrays (`_PyListArray`): When a
 dictionary or list that may be shared between threads is resized, we use QSBR
 to delay freeing the old keys or array until it's safe. For dicts and lists
 that are not shared, their storage can be freed immediately upon resize.
 
-* Mimalloc mi_page_t: Non-locking dictionary and list accesses require
+* Mimalloc `mi_page_t`: Non-locking dictionary and list accesses require
 cooperation from the memory allocator. If an object is freed and its memory is
 reused, we must ensure the new object's reference count field is at the same
-memory location. In practice, this means when a mimalloc page (mi_page_t)
+memory location. In practice, this means when a mimalloc page (`mi_page_t`)
 becomes empty, we don't immediately allow it to be reused for allocations of a
 different size class. QSBR is used to determine when it's safe to repurpose the
 page or return its memory to the OS.
@@ -91,9 +91,9 @@ Periodically, a polling mechanism processes this deferred-free list:
 1. The minimum read sequence value across all active threads is calculated and
    stored as the global `rd_seq`.
 
-2. For each item on the deferred-free list, if its qsbr_goal is less than the
-   new `rd_seq`, its memory is freed, and it is removed from the list. Otherwise,
-   it remains on the list for a future attempt.
+2. For each item on the deferred-free list, if its qsbr_goal is less than or
+   equal to the new `rd_seq`, its memory is freed, and it is removed from the
+   list. Otherwise, it remains on the list for a future attempt.
 
 
 ### Deferred Advance Optimization
From f23c1e3be7308b8663bba912250498e645d920f5 Mon Sep 17 00:00:00 2001
From: Neil Schemenauer
Date: Mon, 23 Jun 2025 14:31:25 -0700
Subject: [PATCH 7/8] Formatting adjustment based on review.

---
 InternalDocs/qsbr.md | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/InternalDocs/qsbr.md b/InternalDocs/qsbr.md
index 47d05856665af4..71f6e9221351cc 100644
--- a/InternalDocs/qsbr.md
+++ b/InternalDocs/qsbr.md
@@ -26,7 +26,7 @@ for threads to report this state.
 While CPython's memory management is dominated by reference counting and a
 tracing garbage collector, these mechanisms are not suitable for all data
 structures. For example, the backing array of a list object is not individually
-reference-counted but may have a shorter lifetime than the PyListObject that
+reference-counted but may have a shorter lifetime than the `PyListObject` that
 contains it. We could delay reclamation until the next GC run, but we want
 reclamation to be prompt and to run the GC less frequently in the free-threaded
 build, as it requires pausing all threads.
@@ -39,7 +39,7 @@ mechanism to determine when it is safe to free the list's old backing array.
 
 Specific use cases for QSBR include:
 
-* Dictionary keys (PyDictKeysObject) and list arrays (`_PyListArray`): When a
+* Dictionary keys (`PyDictKeysObject`) and list arrays (`_PyListArray`): When a
 dictionary or list that may be shared between threads is resized, we use QSBR
 to delay freeing the old keys or array until it's safe. For dicts and lists
 that are not shared, their storage can be freed immediately upon resize.
@@ -58,12 +58,13 @@ page or return its memory to the OS.
 
 ### Core Implementation
 
-The proposal to add QSBR to Python is contained in Github issue 115103 [^1].
+The proposal to add QSBR to Python is contained in [Github issue 115103]
+(https://github.com/python/cpython/issues/115103).
 Many details of that proposal have been copied here, so they can be kept
 up-to-date with the actual implementation.
 
 Python's QSBR implementation is based on FreeBSD's "Global Unbounded
-Sequences." [2, 3, 4]. It relies on a few key counters:
+Sequences." [^1, ^2, ^3]. It relies on a few key counters:
 
 * Global Write Sequence (`wr_seq`): A per-interpreter counter, `wr_seq`, is started
 at 1 and incremented by 2 each time it is advanced. This ensures its value is
@@ -92,7 +93,7 @@ Periodically, a polling mechanism processes this deferred-free list:
    stored as the global `rd_seq`.
 
 2. For each item on the deferred-free list, if its qsbr_goal is less than or
-   equal to the new `rd_seq`, its memory is freed, and it is removed from the 
+   equal to the new `rd_seq`, its memory is freed, and it is removed from the
    list. Otherwise, it remains on the list for a future attempt.
 
 
@@ -123,7 +124,6 @@ multi-threaded benchmarks reveal performance issues.
 
 ## References
 
-[^1]: https://github.com/python/cpython/issues/115103
-2. https://youtu.be/ZXUIFj4nRjk?t=694
-3. https://people.kernel.org/joelfernandes/gus-vs-rcu
-4. http://bxr.su/FreeBSD/sys/kern/subr_smr.c#44
+[^1]: https://youtu.be/ZXUIFj4nRjk?t=694
+[^2]: https://people.kernel.org/joelfernandes/gus-vs-rcu
+[^3]: http://bxr.su/FreeBSD/sys/kern/subr_smr.c#44
From 7b08578e7973b924a85362002f9c85e48719557c Mon Sep 17 00:00:00 2001
From: Neil Schemenauer
Date: Mon, 23 Jun 2025 14:41:14 -0700
Subject: [PATCH 8/8] More format changes.

---
 InternalDocs/qsbr.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/InternalDocs/qsbr.md b/InternalDocs/qsbr.md
index 71f6e9221351cc..1c4a79a7b44436 100644
--- a/InternalDocs/qsbr.md
+++ b/InternalDocs/qsbr.md
@@ -58,13 +58,13 @@ page or return its memory to the OS.
 
 ### Core Implementation
 
-The proposal to add QSBR to Python is contained in [Github issue 115103]
-(https://github.com/python/cpython/issues/115103).
+The proposal to add QSBR to Python is contained in
+[Github issue 115103](https://github.com/python/cpython/issues/115103).
 Many details of that proposal have been copied here, so they can be kept
 up-to-date with the actual implementation.
 
 Python's QSBR implementation is based on FreeBSD's "Global Unbounded
-Sequences." [^1, ^2, ^3]. It relies on a few key counters:
+Sequences." [^1][^2][^3]. It relies on a few key counters:
 
 * Global Write Sequence (`wr_seq`): A per-interpreter counter, `wr_seq`, is started
 at 1 and incremented by 2 each time it is advanced. This ensures its value is
 always odd, which