Skip to content

#127648 introduced a ~12% performance regression in python_startup_no_site benchmark #132952

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
mdboom opened this issue Apr 25, 2025 · 8 comments
Assignees
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage type-bug An unexpected behavior, bug, or error

Comments

@mdboom
Copy link
Contributor

mdboom commented Apr 25, 2025

Bug report

Bug description:

Plotting the Faster CPython team's weekly benchmarks, there is an obvious discontinuity in the python_startup_no_site benchmark:

Image

This is between these two commits:

2025-03-03 0.98 b3c18bf
2025-03-08 0.85 a3990df

Bisecting over this range, it's reproducible that the first bad commit is c6dd2348ca.

CPython versions tested on:

CPython main branch

Operating systems tested on:

Linux

Linked PRs

@mdboom mdboom added the type-bug An unexpected behavior, bug, or error label Apr 25, 2025
@mdboom mdboom self-assigned this Apr 25, 2025
@mdboom mdboom added the performance Performance or resource usage label Apr 25, 2025
@JelleZijlstra
Copy link
Member

cc @srittau. I suppose this is because io is imported at startup no matter what, and importing io is now slightly slower.

@mdboom
Copy link
Contributor Author

mdboom commented Apr 25, 2025

cc @srittau. I suppose this is because io is imported at startup no matter what, and importing io is now slightly slower.

Yes, that's the gist of it. I'm surprised by the size of the regression -- I think that's a testament to how close to the lower limit Python startup already is.

@JelleZijlstra
Copy link
Member

It looks like we need io at startup (in pylifecycle.c) only in order to access io.open and io.TextIOWrapper, both of which come from the private _io module, so we could speed up startup by importing from _io instead: #132957. Not completely sure it's worth it, but it's a very simple change.

@srittau
Copy link
Contributor

srittau commented Apr 25, 2025

That's a huge difference in startup time.

It looks like we need io at startup (in pylifecycle.c) only in order to access io.open and io.TextIOWrapper, both of which come from the private _io module, so we could speed up startup by importing from _io instead: #132957. Not completely sure it's worth it, but it's a very simple change.

This sounds worthwhile to me, independent from this regression. Importing a Python-only module will most likely always incur a fairly large performance penalty.

We could also implement the pseudo-protocols in C if that helps with performance. Finally, we could go back to the original plan to move the protocols to typing, although they are better placed in io.

@picnixz picnixz added the interpreter-core (Objects, Python, Grammar, and Parser dirs) label Apr 25, 2025
@srittau
Copy link
Contributor

srittau commented Apr 28, 2025

@mdboom For my understanding: This is only for the python_startup_no_site benchmark? Is there a significant slowdown for "normal" Python startup benchmark?

@mdboom
Copy link
Contributor Author

mdboom commented Apr 28, 2025

Yes, there is also a slowdown for python_startup, but not so cleanly for a single commit like this, it's more death by a thousand papercuts:

Image

@JelleZijlstra
Copy link
Member

I think this regresssion is primarily from adding the _collections_abc import, and that module gets imported anyway when site.py is used (because os.py imports it). _collections_abc is relatively slow. The startup with site would also get slightly slower from adding two more classes defined at startup, but that's fast enough that it's probably not easily detectable.

@JelleZijlstra
Copy link
Member

If we want to speed this up further, I think this is the most promising parts are these:

import time:       601 |        601 |   time
import time:       209 |        810 | zipimport
import time:        49 |         49 |     _codecs
import time:       535 |        583 |   codecs
import time:       400 |        400 |   encodings.aliases
import time:       608 |       1590 | encodings
import time:       172 |        172 | encodings.utf_8

zipimport seems like it should only be needed if we actually use ZIP imports, which many programs won't. Maybe we can defer importing it until we need it?

For encodings, many programs will only ever need UTF-8, not the whole registry system. It looks like this gets triggered by _PyUnicode_InitEncodings, which first initiates the codec registry (triggering import encodings) and then sets up the default encoding (usually UTF-8). Perhaps there can be a fast path where we only register UTF-8 and defer registering the others until we need them.

This is not really part of this issue since it's not a regression, but leaving this here in case there's interest in speeding up startup further.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

No branches or pull requests

5 participants