Commit d9d3993

Years overdue, explain why unreachable objects are moved. (GH-17030)
1 parent 8d4fef4 commit d9d3993

1 file changed: +38 -1 lines changed

Modules/gcmodule.c

@@ -1087,7 +1087,8 @@ deduce_unreachable(PyGC_Head *base, PyGC_Head *unreachable) {
      * everything else (in base) to unreachable.
      * NOTE: This used to move the reachable objects into a reachable
      * set instead. But most things usually turn out to be reachable,
-     * so it's more efficient to move the unreachable things.
+     * so it's more efficient to move the unreachable things. See note
+     * [REACHABLE OR UNREACHABLE?] at the file end.
      */
     gc_list_init(unreachable);
     move_unreachable(base, unreachable); // gc_prev is pointer again
@@ -2183,3 +2184,39 @@ PyObject_GC_Del(void *op)
     }
     PyObject_FREE(g);
 }
+
+/* ------------------------------------------------------------------------
+Notes
+
+[REACHABLE OR UNREACHABLE?]
+
+It "sounds slick" to move the unreachable objects, until you think about
+it - the reason it pays isn't actually obvious.
+
+Suppose we create objects A, B, C in that order. They appear in the young
+generation in the same order. If B points to A, and C to B, and C is
+reachable from outside, then the adjusted refcounts will be 0, 0, and 1
+respectively.
+
+When move_unreachable finds A, A is moved to the unreachable list. The
+same for B when it's first encountered. Then C is traversed, B is moved
+_back_ to the reachable list. B is eventually traversed, and then A is
+moved back to the reachable list.
+
+So instead of not moving at all, the reachable objects B and A are moved
+twice each. Why is this a win? A straightforward algorithm to move the
+reachable objects instead would move A, B, and C once each.
+
+The key is that this dance leaves the objects in order C, B, A - it's
+reversed from the original order. On all _subsequent_ scans, none of
+them will move. Since most objects aren't in cycles, this can save an
+unbounded number of moves across an unbounded number of later collections.
+It can cost more only the first time the chain is scanned.
+
+Drawback: move_unreachable is also used to find out what's still trash
+after finalizers may resurrect objects. In _that_ case most unreachable
+objects will remain unreachable, so it would be more efficient to move
+the reachable objects instead. But this is a one-time cost, probably not
+worth complicating the code to speed just a little.
+------------------------------------------------------------------------ */
+
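
The move counts described in the note can be checked with a small standalone model. The following is only a sketch, not the code in gcmodule.c: it assumes each object holds at most one outgoing reference, stands in for the linked lists with arrays, and treats the adjusted refcount as simply "is referenced from outside". The names Heap, make_chain, scan_moving_unreachable, and scan_moving_reachable are hypothetical and exist only for this illustration.

```c
#define MAX_OBJS 16

/* Hypothetical toy heap: object ids index the arrays below. */
typedef struct {
    int n;                    /* number of objects currently in the list */
    int order[MAX_OBJS];      /* list order, as object ids */
    int referent[MAX_OBJS];   /* the one object this one points to, or -1 */
    int external[MAX_OBJS];   /* 1 if referenced from outside the list */
} Heap;

enum { IN_YOUNG, TENTATIVELY_UNREACHABLE };

/* Objects 0, 1, 2 stand for A, B, C, created in that order. */
Heap make_chain(void)
{
    Heap h = { .n = 3,
               .order    = {0, 1, 2},
               .referent = {-1, 0, 1},    /* B -> A, C -> B */
               .external = {0, 0, 1} };   /* only C is reachable from outside */
    return h;
}

/* Models the "move the unreachable objects" strategy. Returns the number
   of list moves and leaves the surviving objects in h->order. */
int scan_moving_unreachable(Heap *h)
{
    int queue[2 * MAX_OBJS];  /* working list; moved-back objects are appended */
    int gc_refs[MAX_OBJS];
    int state[MAX_OBJS];
    int len = h->n, moves = 0, kept = 0;

    /* Ids not in the list are treated as reachable from elsewhere. */
    for (int id = 0; id < MAX_OBJS; id++) {
        gc_refs[id] = 1;
        state[id] = IN_YOUNG;
    }
    for (int i = 0; i < h->n; i++) {
        int obj = h->order[i];
        queue[i] = obj;
        gc_refs[obj] = h->external[obj];  /* adjusted refcount in this model */
    }

    for (int i = 0; i < len; i++) {
        int obj = queue[i];
        if (gc_refs[obj] == 0) {
            /* Looks unreachable so far: move it to the unreachable list. */
            state[obj] = TENTATIVELY_UNREACHABLE;
            moves++;
            continue;
        }
        /* Reachable: it stays in place, and its referent becomes reachable. */
        h->order[kept++] = obj;
        int ref = h->referent[obj];
        if (ref < 0)
            continue;
        if (state[ref] == TENTATIVELY_UNREACHABLE) {
            /* Move it back, to the end of the list being scanned. */
            state[ref] = IN_YOUNG;
            gc_refs[ref] = 1;
            queue[len++] = ref;
            moves++;
        }
        else if (gc_refs[ref] == 0) {
            /* Not scanned yet: just mark it, no move needed. */
            gc_refs[ref] = 1;
        }
    }
    h->n = kept;  /* anything still tentatively unreachable is trash */
    return moves;
}

/* Models a straightforward "move the reachable objects" strategy:
   every reachable object is moved exactly once per scan. */
int scan_moving_reachable(const Heap *h)
{
    int reached[MAX_OBJS] = {0};
    int moves = 0;

    for (int i = 0; i < h->n; i++) {
        int obj = h->order[i];
        if (!h->external[obj])
            continue;
        /* Follow the chain from each externally referenced object. */
        while (obj >= 0 && !reached[obj]) {
            reached[obj] = 1;
            moves++;
            obj = h->referent[obj];
        }
    }
    return moves;
}
```

In this model, the first call to scan_moving_unreachable on make_chain() costs 4 moves and leaves the order C, B, A; every later call costs 0, while scan_moving_reachable costs 3 on every scan. Clearing C's external reference reproduces the drawback case: 3 moves for scan_moving_unreachable against 0 for scan_moving_reachable.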
