Skip to content

gh-118761: Optimise import time for ast #131953

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Apr 2, 2025
Merged

Conversation

AA-Turner
Copy link
Member

@AA-Turner AA-Turner commented Mar 31, 2025

Attempting @JelleZijlstra's suggestion in #118761 (comment).

Running python -Ximporttime -Sc "import ast" ten times with the current and proposed modules, I get 14.1ms for the current module and 1.48ms for the new, for a ~12ms speed up. I've left numbers out of the NEWS entry for now, as I've only tested on a Windows PC -- Linux may have different performance profiles.

Full Times

(using the last line of each output)

Current

import time: self [us] | cumulative | imported package
import time:      1277 |      13904 | ast
import time:      1363 |      14813 | ast
import time:      1343 |      14246 | ast
import time:      1330 |      14280 | ast
import time:      1387 |      13974 | ast
import time:      1373 |      13837 | ast
import time:      1345 |      14456 | ast
import time:      1349 |      14050 | ast
import time:      1322 |      14226 | ast
import time:      1246 |      13444 | ast

Proposed

import time:       455 |       1338 | ast
import time:       457 |       1387 | ast
import time:       476 |       1470 | ast
import time:       459 |       1474 | ast
import time:       493 |       1480 | ast
import time:       499 |       1458 | ast
import time:       448 |       1513 | ast
import time:       492 |       1501 | ast
import time:       487 |       1515 | ast
import time:       602 |       1688 | ast

@eli-schwartz
Copy link
Contributor

$ wc -l /usr/lib/python3.{8,9,1*}/ast.py 
   549 /usr/lib/python3.8/ast.py
  1602 /usr/lib/python3.9/ast.py
  1709 /usr/lib/python3.10/ast.py
  1752 /usr/lib/python3.11/ast.py
  1840 /usr/lib/python3.12/ast.py
  1871 /usr/lib/python3.13/ast.py
  1777 /usr/lib/python3.14/ast.py

And with this PR we are back down to 671 lines. The massive jump is from when unparse was added.

This feels nice all on its own, to avoid so much code that is often not needed.

Copy link
Member

@JelleZijlstra JelleZijlstra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

A few thoughts:

  • An alternative approach could be to make ast a package, with the unparse code in ast/_unparse.py and the rest of the functionality in ast/__init__.py. Pro: Doesn't need a new top-level module; avoids the (probably theoretical) compatibility concern that someone could have their own top-level _ast_unparse module that interferes with the stdlib one. Con: Makes the directory structure more complex.
  • I'm not sure we need to keep getattr support for the private attributes, like the Precedence enum and the various constants. I'd do that only if there's evidence they are being used in third-party code, especially widely used packages. Otherwise we just make our lives harder if we want to refactor the unparse code in the future.

@AA-Turner
Copy link
Member Author

  • I'm not sure we need to keep getattr support for the private attributes, like the Precedence enum and the various constants. I'd do that only if there's evidence they are being used in third-party code, especially widely used packages. Otherwise we just make our lives harder if we want to refactor the unparse code in the future.

Reverted.

  • An alternative approach could be to make ast a package, with the unparse code in ast/_unparse.py and the rest of the functionality in ast/__init__.py. Pro: Doesn't need a new top-level module; avoids the (probably theoretical) compatibility concern that someone could have their own top-level _ast_unparse module that interferes with the stdlib one. Con: Makes the directory structure more complex.

I'd suggest doing that in a follow-up PR, if desirable. I think for now this is the simplest approach. If it provides any reassurance, GitHub has no results for the _ast_unparse.py file, so I think approaching 0% chance of breaking things in that regard.

@AA-Turner AA-Turner merged commit f20f02e into python:main Apr 2, 2025
46 checks passed
@AA-Turner AA-Turner deleted the opt-ast branch April 2, 2025 16:22
@vstinner
Copy link
Member

vstinner commented Apr 2, 2025

Oh, I was reviewing the change while it has been merged :-) Well, it LGTM. I had minor suggestions, but they don't matter anymore.

Nice optimization.

@AA-Turner AA-Turner added the performance Performance or resource usage label Apr 2, 2025
@AA-Turner
Copy link
Member Author

Oh, I was reviewing the change while it has been merged :-) Well, it LGTM. I had minor suggestions, but they don't matter anymore.

Sorry! Happy to open a follow-up PR for any of your suggestions?

@vstinner
Copy link
Member

vstinner commented Apr 2, 2025

Benchmark: 13.1 ms => 18.8 ms (-5.7 ms).

Before:

$ hyperfine --warmup 1 "./python -c 'import ast'"
Benchmark 1: ./python -c 'import ast'
  Time (mean ± σ):      18.8 ms ±   0.6 ms    [User: 16.8 ms, System: 2.0 ms]
  Range (min … max):    17.9 ms …  20.3 ms    138 runs

After:

$ hyperfine --warmup 1 "./python -c 'import ast'"
Benchmark 1: ./python -c 'import ast'
  Time (mean ± σ):      13.1 ms ±   0.5 ms    [User: 11.1 ms, System: 2.0 ms]
  Range (min … max):    12.5 ms …  14.4 ms    206 runs

@vstinner
Copy link
Member

vstinner commented Apr 2, 2025

Sorry! Happy to open a follow-up PR for any of your suggestions?

Well, I don't know if it would better, but an alternative to adding __dir__() and __getattr__() functions is to add such wrapper function:

def unparse(ast_obj):
    import _ast_unparse
    return _ast_unparse.unparse(ast_obj)

@danielhollas
Copy link
Contributor

Well, I don't know if it would better, but an alternative to adding dir() and getattr() functions is to add such wrapper function:

The __dir__ implementation in particular seems a bit fragile. What happens if somebody introduces another stdlib dependency (besides sys) and forgets to exclude it in __dir__ ?

@danielhollas
Copy link
Contributor

Hmm, unless I am missing something, it seems that with this change unparse is not being included in star imports?

python
Python 3.12.7 (main, Oct  1 2024, 00:00:00) [GCC 13.3.1 20240913 (Red Hat 13.3.1-3)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from ast import *
>>> unparse
<function unparse at 0x7f7f8ebc9c60>

❯ ./python
Python 3.14.0a6+ (heads/main:f20f02e6b58, Apr  2 2025, 17:41:39) [GCC 13.3.1 20240913 (Red Hat 13.3.1-3)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from ast import *
>>> unparse
Traceback (most recent call last):
  File "<python-input-1>", line 1, in <module>
    unparse
NameError: name 'unparse' is not defined. Did you mean: 'parse'?

@AA-Turner
Copy link
Member Author

I think (untested) that the __getattr__ approach is faster for the second access of unparse on, as it adds unparse to the module globals, whereas the proposed wrapper executes the (cached) import _ast_unparse each time. I'm happy to update __dir__, though.

@AA-Turner
Copy link
Member Author

I've opened #132024 for follow-ups.

@JelleZijlstra
Copy link
Member

Alternative followup in #132025.

seehwan pushed a commit to seehwan/cpython that referenced this pull request Apr 16, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Performance or resource usage
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants