gh-129210: GC optimisations for case where no objects being collected have finalizers #132488

Status: Open · wants to merge 7 commits into main

Conversation

@duaneg commented Apr 14, 2025

If no objects being collected have finalizers, they will not be able to be resurrected. This lets us avoid some repeated work we would otherwise have to do.

To handle objects potentially resurrected by finalizers it is necessary to
repeat the reachability check on the objects to be collected. If no
finalizers are run, this check can be skipped: in that case it is not
possible for an object to have been resurrected.
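
(As a minimal Python-level sketch of the hazard this re-check guards against, not part of this PR's C changes: a finalizer can store a fresh reference to a dying object, and the collector must then spare it.)

```python
import gc

resurrected = []

class Phoenix:
    def __del__(self):
        # The finalizer stores a new reference to the dying object,
        # "resurrecting" it: the GC must detect this and not free it.
        resurrected.append(self)

# Build an unreachable reference cycle so only the cyclic GC can reclaim it.
obj = Phoenix()
obj.ref = obj
del obj

gc.collect()
print(len(resurrected))  # 1 -- the object survived its own collection
```

If nothing in the garbage has a finalizer this situation cannot arise, which is what makes skipping the re-check safe.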

In my environment and testing this gives a ~25-30% performance improvement on
the gh-129210 micro-benchmark.
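
(For a rough sense of the effect, rather than the actual gh-129210 repro, a toy measurement along the following lines contrasts the two regimes; the class names and cycle count are illustrative only.)

```python
import gc
import time

class Plain:
    pass

class Finalized:
    def __del__(self):
        pass

def timed_collect(cls, n=100_000):
    # Create n self-referential cycles, then time one full collection.
    for _ in range(n):
        obj = cls()
        obj.ref = obj
    start = time.perf_counter()
    gc.collect()
    return time.perf_counter() - start

gc.disable()  # avoid automatic collections skewing the measurement
print("no finalizers:", timed_collect(Plain))
print("finalizers:   ", timed_collect(Finalized))
```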
bedevere-app (bot) commented Apr 14, 2025

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

@duaneg (author) commented Apr 14, 2025

Given this is just an optimisation for the incremental GC, which is being introduced in v3.14, I don't imagine it requires a blurb, but I'm happy to add one if that would be useful.

I've also had a quick look at the free-threading GC, and at first glance similar optimisations appear possible there too. I'm investigating that now and, if it pans out, will add commits for it to this PR.

duaneg changed the title from "hc-129210: GC optimisations for case where no objects being collected have finalizers" to "gh-12920: GC optimisations for case where no objects being collected have finalizers" on Apr 14, 2025
Avoid an unnecessary list initialisation and merge: a very marginal
optimisation which yields a couple of percent speedup on a few
benchmarks.

The marginal performance improvement may not be worth the marginal additional
code complexity.

duaneg changed the title from "gh-12920: GC optimisations for case where no objects being collected have finalizers" to "gh-129210: GC optimisations for case where no objects being collected have finalizers" on Apr 14, 2025
ZeroIntensity added the performance (Performance or resource usage) label on Apr 14, 2025
@ZeroIntensity (member) left a comment

Are there any benchmarks? (Also, please don't force push and make the bot spam; we squash merge at the end, and you need a blurb entry anyway.)

@duaneg (author) commented Apr 14, 2025

> Are there any benchmarks?

The percentage improvements I mentioned in the issue discussion and commit logs were taken from the benchmark's output and perf stat. They were for the repro example, with GC enabled and a few other tweaks, running on an optimised build, and have been relatively consistent in my testing. They are not rigorous at all, though.

I've been running pyperformance benchmarks, but they take a long time for the full suite and I'm not happy with the reliability. A lot of them are being reported as "unstable". I've just finished benchmarking this latest version compared to main and, as far as "significant" differences go, it is mostly a wash: 13 faster, 12 slower, mostly marginal differences. However, there was a massive 31% regression in unpack_sequence, that changed to a 3% improvement when I re-ran that particular benchmark only with --min-time 2. This does not give me confidence in the results :-(
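
For reproducing a single number in isolation, pyperf's Python API (the engine underneath pyperformance) can time one operation on its own; this is a toy sketch with an illustrative timed body, not the actual repro:

```python
import pyperf

runner = pyperf.Runner()
runner.timeit(
    "collect_one_cycle",
    stmt="o = C(); o.ref = o; del o; gc.collect()",
    setup="import gc\nclass C: pass",
)
```

Run as a normal script, pyperf spawns isolated worker processes and reports the spread across runs, which makes instability easier to spot than a single suite-level comparison.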

Any tips on how to speed this up and/or make it more reliable would be welcome.

> (Also, please don't force push and make the bot spam; we squash merge at the end, and you need a blurb entry anyway.)

Sorry! A habit from other organisations with other practices; I'll try to resist it in the future.

@duaneg (author) commented Apr 15, 2025

I've just finished running pyperformance --rigorous --min-time 2 with its default suite, minus a few benchmarks that are failing for unrelated reasons.

Out of 102 tests, 20 are "significantly" faster and 7 slower:
### 2to3 ###
Mean +- std dev: 197 ms +- 1 ms -> 192 ms +- 1 ms: 1.03x faster
Significant (t=58.77)
--
### async_tree_eager_io_tg ###
Mean +- std dev: 508 ms +- 6 ms -> 497 ms +- 6 ms: 1.02x faster
Significant (t=13.52)
--
### async_tree_eager_tg ###
Mean +- std dev: 175 ms +- 2 ms -> 171 ms +- 1 ms: 1.03x faster
Significant (t=22.28)
--
### async_tree_io_tg ###
Mean +- std dev: 511 ms +- 7 ms -> 501 ms +- 9 ms: 1.02x faster
Significant (t=10.46)
--
### asyncio_tcp ###
Mean +- std dev: 272 ms +- 5 ms -> 266 ms +- 6 ms: 1.02x faster
Significant (t=8.03)
--
### coroutines ###
Mean +- std dev: 16.5 ms +- 0.4 ms -> 17.3 ms +- 0.4 ms: 1.05x slower
Significant (t=-15.60)
--
### coverage ###
Mean +- std dev: 80.1 ms +- 1.3 ms -> 73.6 ms +- 1.3 ms: 1.09x faster
Significant (t=38.51)
--
### create_gc_cycles ###
Mean +- std dev: 1.43 ms +- 0.02 ms -> 1.39 ms +- 0.02 ms: 1.03x faster
Significant (t=22.63)
--
### deltablue ###
Mean +- std dev: 2.33 ms +- 0.05 ms -> 2.28 ms +- 0.05 ms: 1.02x faster
Significant (t=7.98)
--
### docutils ###
Mean +- std dev: 1.96 sec +- 0.01 sec -> 1.91 sec +- 0.01 sec: 1.02x faster
Significant (t=25.21)
--
### html5lib ###
Mean +- std dev: 56.4 ms +- 0.9 ms -> 54.5 ms +- 0.8 ms: 1.04x faster
Significant (t=17.40)
--
### logging_silent ###
Mean +- std dev: 73.1 ns +- 1.2 ns -> 75.9 ns +- 9.0 ns: 1.04x slower
Significant (t=-3.38)
--
### meteor_contest ###
Mean +- std dev: 73.6 ms +- 0.7 ms -> 72.2 ms +- 0.8 ms: 1.02x faster
Significant (t=15.82)
--
### nqueens ###
Mean +- std dev: 61.7 ms +- 0.8 ms -> 63.3 ms +- 0.7 ms: 1.03x slower
Significant (t=-15.58)
--
### pickle_pure_python ###
Mean +- std dev: 248 us +- 5 us -> 242 us +- 4 us: 1.02x faster
Significant (t=10.16)
--
### raytrace ###
Mean +- std dev: 213 ms +- 3 ms -> 207 ms +- 10 ms: 1.03x faster
Significant (t=5.70)
--
### regex_effbot ###
Mean +- std dev: 2.11 ms +- 0.06 ms -> 2.15 ms +- 0.15 ms: 1.02x slower
Significant (t=-3.10)
--
### richards_super ###
Mean +- std dev: 36.9 ms +- 3.1 ms -> 36.1 ms +- 0.6 ms: 1.02x faster
Significant (t=2.83)
--
### scimark_monte_carlo ###
Mean +- std dev: 49.5 ms +- 0.8 ms -> 47.7 ms +- 0.8 ms: 1.04x faster
Significant (t=17.27)
--
### spectral_norm ###
Mean +- std dev: 77.3 ms +- 2.2 ms -> 72.2 ms +- 0.9 ms: 1.07x faster
Significant (t=23.55)
--
### sqlglot_v2_parse ###
Mean +- std dev: 971 us +- 11 us -> 936 us +- 14 us: 1.04x faster
Significant (t=20.56)
--
### sqlglot_v2_transpile ###
Mean +- std dev: 1.20 ms +- 0.01 ms -> 1.16 ms +- 0.02 ms: 1.03x faster
Significant (t=22.28)
--
### subparsers ###
Mean +- std dev: 15.2 ms +- 0.2 ms -> 14.8 ms +- 0.1 ms: 1.03x faster
Significant (t=23.29)
--
### unpack_sequence ###
Mean +- std dev: 25.0 ns +- 0.6 ns -> 26.8 ns +- 0.3 ns: 1.07x slower
Significant (t=-31.46)
--
### unpickle_list ###
Mean +- std dev: 3.52 us +- 0.05 us -> 3.44 us +- 0.06 us: 1.02x faster
Significant (t=10.27)
--
### xml_etree_iterparse ###
Mean +- std dev: 63.8 ms +- 0.7 ms -> 65.1 ms +- 0.4 ms: 1.02x slower
Significant (t=-17.91)
--
### xml_etree_parse ###
Mean +- std dev: 92.7 ms +- 2.1 ms -> 94.9 ms +- 0.7 ms: 1.02x slower
Significant (t=-10.59)

However, I still don't have much confidence in the reliability of these results. Many are unstable: unpack_sequence regressed 7% here, improved 3% when I benchmarked it earlier, and regressed 31% before that (all comparing the same two versions). When I profile it, GC doesn't appear in the perf report.

duaneg added 2 commits April 16, 2025 12:02
…zers run

As with the regular GIL garbage collector, skip the second reachability check
if no objects to-be-collected have finalizers, and hence none can be
resurrected.

In (so far very limited) testing this gives a ~10% overall performance
improvement to the motivating benchmark's runtime and a ~35% improvement to its
reported time.

The pyperformance gc_traversal and gc_collect benchmarks show no significant
difference and a 6% improvement, respectively.
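
(For readers following the logic, the control flow being optimised, in both collectors, looks roughly like this pure-Python model; the names here are made up, and CPython's real implementation is C with far more bookkeeping.)

```python
class Obj:
    # Hypothetical stand-in for an object in the unreachable set.
    def __init__(self, finalizer=None):
        self.finalizer = finalizer
        self.resurrected = False

def collect(unreachable):
    finalizable = [o for o in unreachable if o.finalizer is not None]
    if finalizable:
        for o in finalizable:
            o.finalizer(o)
        # A finalizer may have stored a reference to any dying object,
        # so which objects are still garbage must be recomputed.
        unreachable = [o for o in unreachable if not o.resurrected]
    # If nothing had a finalizer, no resurrection was possible and the
    # second pass above is skipped entirely -- the optimisation here.
    return unreachable  # these objects are actually freed

def resurrect(obj):
    obj.resurrected = True

doomed = [Obj(), Obj(finalizer=resurrect), Obj()]
print(len(collect(doomed)))  # 2 -- the resurrected object is spared
```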
@JelleZijlstra (member) left a comment

Makes sense but I'm not familiar enough with the GC to approve.

duaneg and others added 3 commits April 28, 2025 15:49
Co-authored-by: Jelle Zijlstra <[email protected]>
Labels: awaiting review, performance (Performance or resource usage)

3 participants