@@ -1087,7 +1087,8 @@ deduce_unreachable(PyGC_Head *base, PyGC_Head *unreachable) {
      * everything else (in base) to unreachable.
      * NOTE: This used to move the reachable objects into a reachable
      * set instead. But most things usually turn out to be reachable,
-     * so it's more efficient to move the unreachable things.
+     * so it's more efficient to move the unreachable things. See note
+     * [REACHABLE OR UNREACHABLE?] at the file end.
      */
     gc_list_init(unreachable);
     move_unreachable(base, unreachable);  // gc_prev is pointer again
@@ -2183,3 +2184,39 @@ PyObject_GC_Del(void *op)
     }
     PyObject_FREE(g);
 }
+
+/* ------------------------------------------------------------------------
+Notes
+
+[REACHABLE OR UNREACHABLE?]
+
+It "sounds slick" to move the unreachable objects, until you think about
+it - the reason it pays isn't actually obvious.
+
+Suppose we create objects A, B, C in that order. They appear in the young
+generation in the same order. If B points to A, and C to B, and C is
+reachable from outside, then the adjusted refcounts will be 0, 0, and 1
+respectively.
+
+When move_unreachable finds A, A is moved to the unreachable list. The
+same for B when it's first encountered. Then C is traversed, and B is moved
+_back_ to the reachable list. B is eventually traversed, and then A is
+moved back to the reachable list.
+
+So instead of not moving at all, the reachable objects B and A are moved
+twice each. Why is this a win? A straightforward algorithm to move the
+reachable objects instead would move A, B, and C once each.
+
+The key is that this dance leaves the objects in order C, B, A - it's
+reversed from the original order. On all _subsequent_ scans, none of
+them will move. Since most objects aren't in cycles, this can save an
+unbounded number of moves across an unbounded number of later collections.
+It can cost more only the first time the chain is scanned.
+
+Drawback: move_unreachable is also used to find out what's still trash
+after finalizers may resurrect objects. In _that_ case most unreachable
+objects will remain unreachable, so it would be more efficient to move
+the reachable objects instead. But this is a one-time cost, probably not
+worth complicating the code to speed just a little.
+------------------------------------------------------------------------ */
+
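The move counts claimed in the note can be checked with a small stand-alone model. The sketch below is not the code from this commit: `ObjList`, `simulate_scan`, `points_to`, and `external_refs` are hypothetical names, plain arrays stand in for the collector's linked lists, and each object is assumed to hold at most one reference. It only mimics the scan order the note describes: an object with an adjusted refcount of 0 is moved to the unreachable list, and an object found reachable later is moved back to the tail of the young list.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_OBJS 16

/* Toy ordered list of object ids (hypothetical; stands in for a GC list). */
typedef struct {
    int order[MAX_OBJS];
    size_t len;
} ObjList;

static void list_append(ObjList *l, int id)
{
    assert(l->len < MAX_OBJS);
    l->order[l->len++] = id;
}

static void list_remove_at(ObjList *l, size_t pos)
{
    assert(pos < l->len);
    for (size_t i = pos + 1; i < l->len; i++)
        l->order[i - 1] = l->order[i];
    l->len--;
}

/* Position of id in l, or l->len if it is not present. */
static size_t list_find(const ObjList *l, int id)
{
    size_t i = 0;
    while (i < l->len && l->order[i] != id)
        i++;
    return i;
}

/*
 * One scan over `young` in the style the note describes.
 * points_to[id] is the single object that `id` references, or -1.
 * external_refs[id] is the adjusted refcount (references from outside).
 * Returns the number of list moves; `young` is left in its new order.
 * Assumption: every referenced object is itself in `young`.
 */
int simulate_scan(ObjList *young, const int *points_to,
                  const int *external_refs)
{
    ObjList unreachable = { .len = 0 };
    int refs[MAX_OBJS] = { 0 };
    int moves = 0;

    for (size_t i = 0; i < young->len; i++)
        refs[young->order[i]] = external_refs[young->order[i]];

    size_t pos = 0;
    while (pos < young->len) {
        int id = young->order[pos];
        if (refs[id] == 0) {
            /* Tentatively unreachable: move it out of young. */
            list_remove_at(young, pos);
            list_append(&unreachable, id);
            moves++;
            continue;  /* the next object has slid into `pos` */
        }
        /* Reachable: whatever it references is reachable too. */
        int target = points_to[id];
        if (target >= 0) {
            size_t u = list_find(&unreachable, target);
            if (u < unreachable.len) {
                /* Moved out too early: move back to the tail of young. */
                list_remove_at(&unreachable, u);
                list_append(young, target);
                moves++;
            }
            if (refs[target] == 0)
                refs[target] = 1;
        }
        pos++;
    }
    return moves;
}
```

With A, B, C as ids 0, 1, 2, `points_to = {-1, 0, 1}` and `external_refs = {0, 0, 1}`, the first call makes 4 moves (A and B out, then B and A back) and leaves the list in the order C, B, A. A second call on that reordered list makes 0 moves, which is the saving the note attributes to later collections.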