Skip to content

GH-115802: Don't JIT zero-length jumps #116177

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Mar 4, 2024

Conversation

brandtbucher
Copy link
Member

@brandtbucher brandtbucher commented Mar 1, 2024

Many of our JIT templates end with a jump to the next opcode. This is generally needed to ensure correct control flow... but when the jump is the very last instruction, it's entirely unnecessary. Instead, we can just continue execution into the next instruction.

This removes these jumps for the following platforms (performance results):

  • x86_64-pc-windows-msvc (2% faster)
  • i686-pc-windows-msvc (3% slower)
  • x86_64-apple-darwin (no benchmarks)
  • aarch64-unknown-linux-gnu (no benchmarks)
  • x86_64-unknown-linux-gnu (2% faster)

(aarch64-apple-darwin needs a bit more work, since the final jump is still made up of 3 instructions. That can come next.)

I'm assuming that the 32-bit Windows result is a fluke. If it's indeed the case that leaving the zero-length jumps in is faster, then we could easily remove this later by commenting out that case in remove_jump. If so, maybe it's an alignment thing and we should just pad with nops instead.

Also, this will probably be even more effective once we have hot/cold splitting of some kind. Most of the templates end with error handling or deoptimization code, and still have to jump over it to get to the next instruction.

@brandtbucher brandtbucher added performance Performance or resource usage skip news interpreter-core (Objects, Python, Grammar, and Parser dirs) labels Mar 1, 2024
@brandtbucher brandtbucher requested a review from markshannon March 1, 2024 03:26
@brandtbucher brandtbucher self-assigned this Mar 1, 2024
@bedevere-app bedevere-app bot mentioned this pull request Mar 1, 2024
13 tasks
@markshannon
Copy link
Member

Do you have Windows for benchmarks for x86-64, or Arm?

@brandtbucher
Copy link
Member Author

No ARM benchmarks (we only have a Mac runner, and that’s not supported yet). Windows x86_64 is linked above.

Copy link
Member

@markshannon markshannon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.
The 32 bit Windows results are a bit strange, but I don't think it's worth worrying about for now.

I'm still seeing the final jump in the generated assembly.
How much effort would it be to clean that up?

@brandtbucher
Copy link
Member Author

I'm still seeing the final jump in the generated assembly.
How much effort would it be to clean that up?

You mean the human-readable disassembly? It would be fragile, but maybe we could just blindly remove the last two lines.

But honestly, I've avoided messing with that too much. It's a really finicky format, and we already perform some transformations that aren't fixed up there. So it really should be read as "what the compiler gave us", not "what we're jitting".

@brandtbucher brandtbucher merged commit 981f27d into python:main Mar 4, 2024
woodruffw pushed a commit to woodruffw-forks/cpython that referenced this pull request Mar 4, 2024
adorilson pushed a commit to adorilson/cpython that referenced this pull request Mar 25, 2024
diegorusso pushed a commit to diegorusso/cpython that referenced this pull request Apr 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage skip news topic-JIT
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants