-
-
Notifications
You must be signed in to change notification settings - Fork 32k
GH-115802: Don't JIT zero-length jumps #116177
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Do you have Windows for benchmarks for x86-64, or Arm? |
No ARM benchmarks (we only have a Mac runner, and that’s not supported yet). Windows x86_64 is linked above. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good.
The 32 bit Windows results are a bit strange, but I don't think it's worth worrying about for now.
I'm still seeing the final jump in the generated assembly.
How much effort would it be to clean that up?
You mean the human-readable disassembly? It would be fragile, but maybe we could just blindly remove the last two lines. But honestly, I've avoided messing with that too much. It's a really finicky format, and we already perform some transformations that aren't fixed up there. So it really should be read as "what the compiler gave us", not "what we're jitting". |
Many of our JIT templates end with a jump to the next opcode. This is generally needed to ensure correct control flow... but when the jump is the very last instruction, it's entirely unnecessary. Instead, we can just continue execution into the next instruction.
This removes these jumps for the following platforms (performance results):
x86_64-pc-windows-msvc
(2% faster)i686-pc-windows-msvc
(3% slower)x86_64-apple-darwin
(no benchmarks)aarch64-unknown-linux-gnu
(no benchmarks)x86_64-unknown-linux-gnu
(2% faster)(
aarch64-apple-darwin
needs a bit more work, since the final jump is still made up of 3 instructions. That can come next.)I'm assuming that the 32-bit Windows result is a fluke. If it's indeed the case that leaving the zero-length jumps in is faster, then we could easily remove this later by commenting out that case in
remove_jump
. If so, maybe it's an alignment thing and we should just pad withnop
s instead.Also, this will probably be even more effective once we have hot/cold splitting of some kind. Most of the templates end with error handling or deoptimization code, and still have to jump over it to get to the next instruction.