Missed optimization for `-(foo < 0)` in Clang when `foo` was previously sign-extended, and in LLVM for all cases #57381
Comments
Also, of course, it'd result in smaller WebAssembly output as well: https://godbolt.org/z/sfvnoxG8q (C), https://godbolt.org/z/qPr5Pf5Pf (LLVM)
The shift+*ext patterns were chosen as canonical in IR near the dawn of LLVM. Someday, we should see if changing that to "*ext (x < 0)" (icmp+ext) makes more sense after all these years. Barring that, it looks like a small patch to InstCombine::Negator() would catch the simple case (and that should unlock the larger optimizations).
0 - (zext (i8 X u>> 7) to iN) --> sext (i8 X s>> 7) to iN (https://alive2.llvm.org/ce/z/jzv4Ud). This is part of solving issue #57381.
I was too optimistic in thinking one patch would trigger the other transforms that we want. I haven't looked all the way through to codegen yet, but this is hopefully an improvement:
https://alive2.llvm.org/ce/z/rcbZmx — sibling transform to 275aa24. This pattern is seen in the examples in issue #57381.
I think we have the IR in shape after that last commit. Can you confirm that the assembly is what you expect for all targets/examples?
Another pair of related patches: There's at least one more lshr-of-signbit transform we could do, but it requires changing a conflicting fold to avoid an infinite compiler loop. AFAICT, we now have the ideal IR and codegen on the examples shown here, so I'm going to close the report. Feel free to re-open or file another bug if there are still problems. Thanks for the detailed analysis and excellent code examples!
C source: https://godbolt.org/z/aMTaGdTof (x86-64), https://godbolt.org/z/PM4hbPd3d (32-bit ARM), https://godbolt.org/z/dbaaYaG8n (AArch64)
LLVM output from above (with metadata stripped, generated from the C source): https://godbolt.org/z/6zG5dW4o1 (x86-64)
It's always safe to convert `negate(if (foo < 0) then 1 else 0)` to `signed_shift_right(foo, (bit_width of foo) - 1)` for any two's complement signed integer `foo`, regardless of width, and Clang already leverages this absent sign-extending: https://godbolt.org/z/ad3sxchKd (x86-64), https://godbolt.org/z/qTja8364h (32-bit ARM), https://godbolt.org/z/f74c8d1Y9 (AArch64). LLVM doesn't at all per https://godbolt.org/z/97zh4TEav (hand-written, targeting x86-64), and Clang consistently fails to make this optimization when the operand was previously sign-extended. (It's odd this is implemented in Clang and not LLVM, as it's a trivial peephole optimization that could feasibly be done even in an SSA model.)
Here's the C source from the above link for quick reference (Edit: remove redundant headers from here specifically):
All the transforms seem to be valid according to alive:
- `sub_load_mask_*`: https://alive2.llvm.org/ce/z/isU8PX
- `sub_imm_mask_*`: https://alive2.llvm.org/ce/z/FjWinx
- `sub_reg_mask_*`: https://alive2.llvm.org/ce/z/uSgSMV
- `sub_simple_mask_*`: https://alive2.llvm.org/ce/z/XFxVeb
- `simple_mask_*`: https://alive2.llvm.org/ce/z/jzv4Ud

Also worth noting that ARM (especially its 32-bit variant) would benefit from this noticeably more than x86-64, as the links detail, thanks to its barrel shifter.
On 32-bit ARM:
- `sub_load_mask_*`: 7 instructions to 3 instructions (`ldrsh` followed by the `sub_imm_mask_*` equivalent)
- `sub_imm_mask_*`: 7 instructions to just `rsb rT, rA, #imm` + `and rD, rT, rB asr #31`
- `sub_reg_mask_*`: 7 instructions to just `sub rT, rA, rB asr #15` + `and rD, rT, rB asr #31`
- `sub_simple_mask_*`: 6 instructions to just `and rD, rA, rB asr #15`
- `simple_mask_*`: 5 instructions to just `asr rD, rA asr #15`

On AArch64:
- `sub_load_mask_*`: 7 instructions to 4 instructions
- `sub_imm_mask_*`: 5 instructions to 4 instructions
- `sub_reg_mask_*`: 4 instructions to 3 instructions
- `sub_simple_mask_*`: 3 instructions to just `and rT, rA, #0x8000` + `neg rT, rA asr #15`
- `simple_mask_*`: 2 instructions to just `sbfx rD, rA, #15, #1`