-
Notifications
You must be signed in to change notification settings - Fork 18k
cmd/compile: redundant moves and stack variables when function using bits.Add64 is inlined #33349
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
The symptom is superficially similar to #32969. Any idea whether the root cause is the same? (CC @randall77 @dr2chase) |
I can't reproduce. I get this when compiling Zig with tip (
|
Yes, that's Zig()'s code. But when it is inlined into AppendZig() causes the problem. Apologies didn't make it clear. |
Ok, that makes more sense. Looks like we need a tweak to the schedule pass. Schedule produces:
There's just a single ADDQ, and we use Select0/Select1 to get the 64-bit result and the carry, respectively. But then the carry isn't used until after the memmove. That means we need to spill the carry and restore it around the memmove call. And because there's no instruction to spill/restore flags (no cheap one, anyway), we reissue the flag-generating instruction after the memmove. After flagalloc, we have:
v42 is now dead, we've recomputed it as v6+v15. We already try to schedule flag users earlier, to make flag lifetimes short. Just pushing ReadFlags ahead of Memory (which is what memmove is) seems to fix this issue. I'll have to think a bit about whether that would break anything else. |
Just prioritizing the flags op is not enough, as it leads to another bad ordering. With that change we do just a single add, but end up spilling both the add result and the result of the SBBQ to the stack before the memmove. Those values are then loaded and xored immediately after the memmove. It would be better to do the xor before spilling so there is only one thing to spill+restore. This will probably require a more general upgrade to the scheduling pass. We need to incorporate information to enable prioritization of instructions which kill more arguments than they generate results (or perhaps even enable subsequent instructions to do so). Right now it is done just by instruction class, which isn't precise enough. |
I did manage to find a similar issue with bits.Len(), as it also relies on flags.
The assembly starts with performing x|1 and a BSR, then goes initialises t, so has to perform another BSRQ to finally compute n.
|
What version of Go are you using (
go version
)?Does this issue reproduce with the latest release?
What operating system and processor architecture are you using (
go env
)?go env
OutputWhat did you do?
https://play.golang.org/p/PWeQpDU4ilg
What did you expect to see?
No allocation of Zig()'s y variable on the stack, and no code duplication. Something along the lines of...
What did you see instead?
Execution of the Zig()'s bits.Add64()/ADDQ twice, with unnecessary use of temporary variable y.
The text was updated successfully, but these errors were encountered: