gh-129210: GC optimisations for case where no objects being collected have finalizers #132488

Status: Open · wants to merge 7 commits into main

Conversation

@duaneg commented Apr 14, 2025

If no objects being collected have finalizers, they will not be able to be resurrected. This lets us avoid some repeated work we would otherwise have to do.

To handle objects potentially resurrected by finalizers it is necessary to
repeat the reachability check on the objects to be collected. If no
finalizers are run, this check can be skipped: in that case it is not
possible for an object to have been resurrected.
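
(As a minimal Python-level sketch of the hazard this re-check guards against, not part of this PR's C changes: a finalizer can store a fresh reference to a dying object, and the collector must then spare it.)

```python
import gc

resurrected = []

class Phoenix:
    def __del__(self):
        # The finalizer stores a new reference to the dying object,
        # "resurrecting" it: the GC must detect this and not free it.
        resurrected.append(self)

# Build an unreachable reference cycle so only the cyclic GC can reclaim it.
obj = Phoenix()
obj.ref = obj
del obj

gc.collect()
print(len(resurrected))  # 1 -- the object survived its own collection
```

If nothing in the garbage has a finalizer this situation cannot arise, which is what makes skipping the re-check safe.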

In my environment and testing this gives a ~25-30% performance improvement on
the gh-129210 micro-benchmark.
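
(For a rough sense of the effect, rather than the actual gh-129210 repro, a toy measurement along the following lines contrasts the two regimes; the class names and cycle count are illustrative only.)

```python
import gc
import time

class Plain:
    pass

class Finalized:
    def __del__(self):
        pass

def timed_collect(cls, n=100_000):
    # Create n self-referential cycles, then time one full collection.
    for _ in range(n):
        obj = cls()
        obj.ref = obj
    start = time.perf_counter()
    gc.collect()
    return time.perf_counter() - start

gc.disable()  # avoid automatic collections skewing the measurement
print("no finalizers:", timed_collect(Plain))
print("finalizers:   ", timed_collect(Finalized))
```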
bedevere-app (bot) commented Apr 14, 2025

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

@duaneg (author) commented Apr 14, 2025

Given this is just an optimisation for the incremental GC, which is being introduced in v3.14, I don't imagine it requires a blurb, but I'm happy to add one if that would be useful.

I've also had a quick look at the free-threading GC, and at first glance similar optimisations appear possible there too. I'm investigating that now and, if it pans out, will add commits for it to this PR.

duaneg changed the title from "hc-129210: GC optimisations for case where no objects being collected have finalizers" to "gh-12920: GC optimisations for case where no objects being collected have finalizers" on Apr 14, 2025
Avoid an unnecessary list initialisation and merge: a very marginal
optimisation which yields a couple of percent speedup on a few
benchmarks.

The marginal performance improvement may not be worth the marginal additional
code complexity.

duaneg changed the title from "gh-12920: GC optimisations for case where no objects being collected have finalizers" to "gh-129210: GC optimisations for case where no objects being collected have finalizers" on Apr 14, 2025
ZeroIntensity added the performance (Performance or resource usage) label on Apr 14, 2025
@ZeroIntensity (member) left a comment

Are there any benchmarks? (Also, please don't force push and make the bot spam; we squash merge at the end, and you need a blurb entry anyway.)

@duaneg (author) commented Apr 14, 2025

> Are there any benchmarks?

The percentage improvements I mentioned in the issue discussion and commit logs were taken from the benchmark's output and perf stat. They were for the repro example, with GC enabled and a few other tweaks, running on an optimised build, and have been relatively consistent in my testing. They are not rigorous at all, though.

I've been running pyperformance benchmarks, but they take a long time for the full suite and I'm not happy with the reliability. A lot of them are being reported as "unstable". I've just finished benchmarking this latest version compared to main and, as far as "significant" differences go, it is mostly a wash: 13 faster, 12 slower, mostly marginal differences. However, there was a massive 31% regression in unpack_sequence, that changed to a 3% improvement when I re-ran that particular benchmark only with --min-time 2. This does not give me confidence in the results :-(
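
For reproducing a single number in isolation, pyperf's Python API (the engine underneath pyperformance) can time one operation on its own; this is a toy sketch with an illustrative timed body, not the actual repro:

```python
import pyperf

runner = pyperf.Runner()
runner.timeit(
    "collect_one_cycle",
    stmt="o = C(); o.ref = o; del o; gc.collect()",
    setup="import gc\nclass C: pass",
)
```

Run as a normal script, pyperf spawns isolated worker processes and reports the spread across runs, which makes instability easier to spot than a single suite-level comparison.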

Any tips on how to speed this up and/or make it more reliable would be welcome.

> (Also, please don't force push and make the bot spam; we squash merge at the end, and you need a blurb entry anyway.)

Sorry! A habit from other organisations with other practices; I'll try to resist it in the future.

@duaneg (author) commented Apr 15, 2025

I've just finished running pyperformance --rigorous --min-time 2 with its default suite, minus a few benchmarks that are failing for unrelated reasons.

Out of 102 tests, 20 are "significantly" faster and 7 slower:
### 2to3 ###
Mean +- std dev: 197 ms +- 1 ms -> 192 ms +- 1 ms: 1.03x faster
Significant (t=58.77)
--
### async_tree_eager_io_tg ###
Mean +- std dev: 508 ms +- 6 ms -> 497 ms +- 6 ms: 1.02x faster
Significant (t=13.52)
--
### async_tree_eager_tg ###
Mean +- std dev: 175 ms +- 2 ms -> 171 ms +- 1 ms: 1.03x faster
Significant (t=22.28)
--
### async_tree_io_tg ###
Mean +- std dev: 511 ms +- 7 ms -> 501 ms +- 9 ms: 1.02x faster
Significant (t=10.46)
--
### asyncio_tcp ###
Mean +- std dev: 272 ms +- 5 ms -> 266 ms +- 6 ms: 1.02x faster
Significant (t=8.03)
--
### coroutines ###
Mean +- std dev: 16.5 ms +- 0.4 ms -> 17.3 ms +- 0.4 ms: 1.05x slower
Significant (t=-15.60)
--
### coverage ###
Mean +- std dev: 80.1 ms +- 1.3 ms -> 73.6 ms +- 1.3 ms: 1.09x faster
Significant (t=38.51)
--
### create_gc_cycles ###
Mean +- std dev: 1.43 ms +- 0.02 ms -> 1.39 ms +- 0.02 ms: 1.03x faster
Significant (t=22.63)
--
### deltablue ###
Mean +- std dev: 2.33 ms +- 0.05 ms -> 2.28 ms +- 0.05 ms: 1.02x faster
Significant (t=7.98)
--
### docutils ###
Mean +- std dev: 1.96 sec +- 0.01 sec -> 1.91 sec +- 0.01 sec: 1.02x faster
Significant (t=25.21)
--
### html5lib ###
Mean +- std dev: 56.4 ms +- 0.9 ms -> 54.5 ms +- 0.8 ms: 1.04x faster
Significant (t=17.40)
--
### logging_silent ###
Mean +- std dev: 73.1 ns +- 1.2 ns -> 75.9 ns +- 9.0 ns: 1.04x slower
Significant (t=-3.38)
--
### meteor_contest ###
Mean +- std dev: 73.6 ms +- 0.7 ms -> 72.2 ms +- 0.8 ms: 1.02x faster
Significant (t=15.82)
--
### nqueens ###
Mean +- std dev: 61.7 ms +- 0.8 ms -> 63.3 ms +- 0.7 ms: 1.03x slower
Significant (t=-15.58)
--
### pickle_pure_python ###
Mean +- std dev: 248 us +- 5 us -> 242 us +- 4 us: 1.02x faster
Significant (t=10.16)
--
### raytrace ###
Mean +- std dev: 213 ms +- 3 ms -> 207 ms +- 10 ms: 1.03x faster
Significant (t=5.70)
--
### regex_effbot ###
Mean +- std dev: 2.11 ms +- 0.06 ms -> 2.15 ms +- 0.15 ms: 1.02x slower
Significant (t=-3.10)
--
### richards_super ###
Mean +- std dev: 36.9 ms +- 3.1 ms -> 36.1 ms +- 0.6 ms: 1.02x faster
Significant (t=2.83)
--
### scimark_monte_carlo ###
Mean +- std dev: 49.5 ms +- 0.8 ms -> 47.7 ms +- 0.8 ms: 1.04x faster
Significant (t=17.27)
--
### spectral_norm ###
Mean +- std dev: 77.3 ms +- 2.2 ms -> 72.2 ms +- 0.9 ms: 1.07x faster
Significant (t=23.55)
--
### sqlglot_v2_parse ###
Mean +- std dev: 971 us +- 11 us -> 936 us +- 14 us: 1.04x faster
Significant (t=20.56)
--
### sqlglot_v2_transpile ###
Mean +- std dev: 1.20 ms +- 0.01 ms -> 1.16 ms +- 0.02 ms: 1.03x faster
Significant (t=22.28)
--
### subparsers ###
Mean +- std dev: 15.2 ms +- 0.2 ms -> 14.8 ms +- 0.1 ms: 1.03x faster
Significant (t=23.29)
--
### unpack_sequence ###
Mean +- std dev: 25.0 ns +- 0.6 ns -> 26.8 ns +- 0.3 ns: 1.07x slower
Significant (t=-31.46)
--
### unpickle_list ###
Mean +- std dev: 3.52 us +- 0.05 us -> 3.44 us +- 0.06 us: 1.02x faster
Significant (t=10.27)
--
### xml_etree_iterparse ###
Mean +- std dev: 63.8 ms +- 0.7 ms -> 65.1 ms +- 0.4 ms: 1.02x slower
Significant (t=-17.91)
--
### xml_etree_parse ###
Mean +- std dev: 92.7 ms +- 2.1 ms -> 94.9 ms +- 0.7 ms: 1.02x slower
Significant (t=-10.59)

However, I still don't have much confidence in the reliability of these results. Many are unstable: unpack_sequence regressed 7% here, improved 3% when I benchmarked it earlier, and regressed 31% before that (all comparing the same two versions). When I profile it, GC doesn't appear in the perf report.

duaneg added 2 commits April 16, 2025 12:02
…zers run

As with the regular GIL garbage collector, skip the second reachability check
if no objects to-be-collected have finalizers, and hence none can be
resurrected.

In (so far very limited) testing this gives a ~10% overall performance
improvement to the motivating benchmark's runtime and a ~35% improvement to its
reported time.

The pyperformance gc_traversal and gc_collect benchmarks show no significant
difference and a 6% improvement, respectively.
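
(For readers following the logic, the control flow being optimised, in both collectors, looks roughly like this pure-Python model; the names here are made up, and CPython's real implementation is C with far more bookkeeping.)

```python
class Obj:
    # Hypothetical stand-in for an object in the unreachable set.
    def __init__(self, finalizer=None):
        self.finalizer = finalizer
        self.resurrected = False

def collect(unreachable):
    finalizable = [o for o in unreachable if o.finalizer is not None]
    if finalizable:
        for o in finalizable:
            o.finalizer(o)
        # A finalizer may have stored a reference to any dying object,
        # so which objects are still garbage must be recomputed.
        unreachable = [o for o in unreachable if not o.resurrected]
    # If nothing had a finalizer, no resurrection was possible and the
    # second pass above is skipped entirely -- the optimisation here.
    return unreachable  # these objects are actually freed

def resurrect(obj):
    obj.resurrected = True

doomed = [Obj(), Obj(finalizer=resurrect), Obj()]
print(len(collect(doomed)))  # 2 -- the resurrected object is spared
```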
@JelleZijlstra (member) left a comment

Makes sense but I'm not familiar enough with the GC to approve.

duaneg and others added 3 commits April 28, 2025 15:49
Co-authored-by: Jelle Zijlstra <[email protected]>
Labels: awaiting review, performance (Performance or resource usage)

3 participants