Understand the outlier benchmarks on 3.14 (main) vs. 3.13.0 #726


Open
5 of 25 tasks
mdboom opened this issue Apr 24, 2025 · 11 comments
@mdboom
Contributor

mdboom commented Apr 24, 2025

As suggested in the last sync meeting, we should understand why some of the benchmarks regressed and progressed. There are a few possible outcomes for each:

  1. The benchmark is poorly designed
  2. There are low-hanging fixes in CPython to reduce the regression
  3. We are reasonably comfortable with the regression given improvements elsewhere

I think as a first pass, we should just try to classify along these lines, and then fix CPython (where possible) first, and fix benchmarks with a lower priority.

For the progressions, it may just be a source of WHATSNEW content.

Let's crowdsource this where possible, reporting back to the checklist below.

Using the last weekly as a guide, the statistically significant regressions are below. For longitudinal details, see the plot of benchmark performance over time below.

  • subparsers, many_optionals (argparse)
  • python_startup / python_startup_no_site
  • json_dumps / json_loads
  • mako
  • nbody
  • coroutines
  • typing_runtime_protocols
  • fannkuch
  • deltablue
  • shortest_path (networkx)
  • pickle_pure_python

The most statistically significant progressions are:

  • mdp (tuple hash caching provided a major speedup)
  • deepcopy / deepcopy_memo
  • go
  • regex / regex_effbot / regex_v8
  • float
  • pylint
  • spectral_norm
  • richards / richards_super
  • xml_etree_parse
  • dulwich_log
  • tomli_loads
  • genshi_text
  • 2to3
  • async stuff
@mdboom
Contributor Author

mdboom commented Apr 24, 2025

mdp: With the introduction of tuple hash caching, this benchmark sped up 2x. It uses trees of namedtuples as dictionary keys for a graph-like data structure.
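To illustrate why tuple hash caching helps this workload, here is a minimal sketch of the pattern described above. The `Node` type is hypothetical; the actual mdp benchmark defines its own data structures:

```python
from collections import namedtuple

# Hypothetical node type mirroring the described structure (an assumption;
# the real mdp benchmark defines its own types).
Node = namedtuple("Node", ["label", "children"])

# A small tree of namedtuples; leaves hold an empty children tuple.
leaf_a = Node("a", ())
leaf_b = Node("b", ())
root = Node("root", (leaf_a, leaf_b))

# namedtuples are hashable, so trees of them can serve as dict keys.
# Hashing a key walks the whole tree recursively; caching the tuple hash
# means repeated lookups of the same key object avoid re-walking it.
table = {root: "value"}
print(table[Node("root", (leaf_a, leaf_b))])  # structurally equal key
```

Every dictionary operation on such a key hashes the entire subtree, which is why caching the hash on the tuple object produces an outsized win for this benchmark.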

@eendebakpt

deepcopy: Improved with python/cpython#114266 and python/cpython#128119
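A quick way to sanity-check a deepcopy change locally is a timeit micro-measurement. This sketch uses a made-up nested structure, not the benchmark's actual fixtures:

```python
import copy
import timeit

# A small nested structure, loosely modeled on what a deepcopy benchmark
# might copy (an assumption; the real benchmark uses its own fixtures).
data = {"a": [1, 2, 3], "b": {"c": (4, 5)}, "d": "text"}

# Time copy.deepcopy under both interpreters to compare, e.g. before and
# after the PRs linked above.
elapsed = timeit.timeit(lambda: copy.deepcopy(data), number=10_000)
print(f"deepcopy x10000: {elapsed:.3f}s")
```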

@mdboom
Contributor Author

mdboom commented Apr 24, 2025

By plotting these benchmarks over time, we can see whether the effect is "real" or in the noise. I have noted these above. Some may be "real", but we will need retroactive data from earlier in the 3.14 cycle to find the source of the regression.

[Image: plot of benchmark regression over time]

@eendebakpt

The dulwich_log improvement coincides with the merge of python/cpython#118144 (looking at the Python Speed Center data, since the plots above miss part of the timeline). Seems plausible, as the benchmark looks at data on disk.

@methane

methane commented Apr 30, 2025

The json_dumps regression may be caused by the switch to the public PyUnicodeWriter APIs. (PR)

python/cpython#133186 will fix json_dumps.

The json_loads regression may be caused by the same change. (PR)
However, I cannot reproduce the regression between 3.13 and 3.14 on my machine.
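For anyone trying to reproduce this locally, a minimal sketch that can be run under both 3.13 and 3.14 and compared. The payload here is an assumed toy dataset; the pyperformance json benchmarks use their own larger, fixed data:

```python
import json
import timeit

# Assumed toy payload for illustration; the real json_dumps / json_loads
# benchmarks in pyperformance use larger fixed datasets.
payload = {"name": "example", "values": list(range(100)), "nested": {"ok": True}}
encoded = json.dumps(payload)

# Run this under each interpreter and compare the two timings.
dumps_t = timeit.timeit(lambda: json.dumps(payload), number=10_000)
loads_t = timeit.timeit(lambda: json.loads(encoded), number=10_000)
print(f"dumps: {dumps_t:.3f}s  loads: {loads_t:.3f}s")
```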

@eendebakpt

typing_runtime_protocols could be python/cpython#118202?

@mdboom
Contributor Author

mdboom commented Apr 30, 2025

> The json_dumps regression may be caused by the switch to the public PyUnicodeWriter APIs. (PR)
>
> python/cpython#133186 will fix json_dumps.
>
> The json_loads regression may be caused by the same change. (PR) However, I cannot reproduce the regression between 3.13 and 3.14 on my machine.

My bisecting yesterday just found this same commit as the cause, so I think you are right. Thanks for proposing a solution.

@JelleZijlstra
Contributor

> typing_runtime_protocols could be python/cpython#118202?

That change went into 3.13 before release so I don't think it can explain a difference between 3.13.0 and current main.

@AlexWaygood

typing._ProtocolMeta.__instancecheck__() spends a lot of time in inspect.getattr_static(). getattr_static() is a somewhat complicated function which does a number of unusual things. Most of the unusual things it does are there for correctness reasons, but some of them are performance micro-optimisations that might not be effective anymore following changes to the interpreter between Python 3.13 and 3.14.

I don't have time to help investigate right now what changes might have contributed to the slowdown in that benchmark, but I'd be happy to spend some time on it at PyCon.
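A minimal reproduction of the code path described above, using only the standard library. The `HasClose` protocol and `File` class are made up for illustration:

```python
from typing import Protocol, runtime_checkable

# isinstance() against a @runtime_checkable Protocol goes through
# typing._ProtocolMeta.__instancecheck__, which (per the comment above)
# spends much of its time in inspect.getattr_static().
@runtime_checkable
class HasClose(Protocol):
    def close(self) -> None: ...

class File:
    def close(self) -> None:
        pass

# Each of these checks pays the attribute-lookup cost per protocol member.
print(isinstance(File(), HasClose))    # True
print(isinstance(object(), HasClose))  # False
```

The typing_runtime_protocols benchmark essentially repeats checks like these in a loop, so any per-check overhead in getattr_static() shows up directly.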

@kumaraditya303

asyncio improvement is from python/cpython#107803

@mdboom
Contributor Author

mdboom commented Apr 30, 2025

> I don't have time to help investigate right now what changes might have contributed to the slowdown in that benchmark, but I'd be happy to spend some time on it at PyCon.

Great. It's also completely reasonable to say "this benchmark is a microbenchmark that isn't very indicative of real-world code" and not resolve it. I don't want to make all of those calls personally -- they kind of require domain expertise.
