Missed optimization for -(foo < 0) in Clang when foo was previously sign-extended and in LLVM for all cases. #57381

Closed
dead-claudia opened this issue Aug 26, 2022 · 5 comments

Comments

dead-claudia commented Aug 26, 2022

C source: https://godbolt.org/z/aMTaGdTof (x86-64), https://godbolt.org/z/PM4hbPd3d (32-bit ARM), https://godbolt.org/z/dbaaYaG8n (AArch64)
LLVM output from above (with metadata stripped, generated from the C source): https://godbolt.org/z/6zG5dW4o1 (x86-64)

It's always safe to convert negate(if (foo < 0) then 1 else 0) to signed_shift_right(foo, (bit_width of foo) - 1) for any two's complement signed integer foo, regardless of width. Clang already leverages this when no sign extension is involved: https://godbolt.org/z/ad3sxchKd (x86-64), https://godbolt.org/z/qTja8364h (32-bit ARM), https://godbolt.org/z/f74c8d1Y9 (AArch64). LLVM doesn't perform the transform at all per https://godbolt.org/z/97zh4TEav (hand-written IR, targeting x86-64), and Clang consistently misses it when the operand was previously sign-extended.

(It's odd this is implemented in Clang and not LLVM, as it's a trivial peephole optimization that could feasibly be done even in an SSA model.)
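To make the identity concrete, here's a small standalone check (a sketch for illustration only, not taken from the linked examples) that exhaustively verifies -(x < 0) == x >> 31 for every sign-extended 16-bit value. It assumes the usual two's complement arithmetic right shift that Clang and GCC implement for negative signed operands (implementation-defined in ISO C):

#include <assert.h>
#include <limits.h>

/* Check -(x < 0) == x >> (bit_width - 1) for every short value,
   mirroring the sign-extended case in the functions below. */
int main(void) {
    for (int i = SHRT_MIN; i <= SHRT_MAX; i++) {
        int baz = (short)i;          /* sign-extend the 16-bit value to int */
        int mask_cmp = -(baz < 0);   /* 0 or -1 via the comparison */
        int mask_shift = baz >> 31;  /* 0 or -1 via the sign bit (arithmetic shift) */
        assert(mask_cmp == mask_shift);
    }
    return 0;
}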

Here's the C source from the above link for quick reference (Edit: removed the redundant headers from this copy):

#define SOME_CONSTANT 1000

struct foo_t {
    short baz;
};

int sub_load_mask_1(struct foo_t *foo) {
    int baz = foo->baz;
    return (SOME_CONSTANT - baz) & -(baz < 0);
}

int sub_load_mask_2(struct foo_t *foo) {
    int baz = foo->baz;
    return (SOME_CONSTANT - baz) & (baz >> 31);
}

int sub_imm_mask_1(short baz) {
    return (SOME_CONSTANT - baz) & -(baz < 0);
}

int sub_imm_mask_2(short baz) {
    return (SOME_CONSTANT - baz) & (baz >> 31);
}

int sub_reg_mask_1(int k, short baz) {
    return (k - baz) & -(baz < 0);
}

int sub_reg_mask_2(int k, short baz) {
    return (k - baz) & ((int)baz >> 31);
}

int sub_simple_mask_1(int k, short baz) {
    return k & -(baz < 0);
}

int sub_simple_mask_2(int k, short baz) {
    return k & ((int)baz >> 31);
}

int simple_mask_1(int k, short baz) {
    return -(baz < 0);
}

int simple_mask_2(int k, short baz) {
    return ((int)baz >> 31);
}

All the transforms seem to be valid according to Alive2.

Also worth noting that ARM (especially its 32-bit variant) would benefit from this noticeably more than x86-64, as the links detail, thanks to its barrel shifter.

  • ARM 32-bit:
    • sub_load_mask_*: 7 instructions to 3 instructions (ldrsh followed by the sub_imm_mask_* equivalent)
    • sub_imm_mask_*: 7 instructions to just rsb rT, rA, #imm + and rD, rT, rB asr #31
    • sub_reg_mask_*: 7 instructions to just sub rT, rA, rB asr #15 + and rD, rT, rB asr #31
    • sub_simple_mask_*: 6 instructions to just and rD, rA, rB asr #15
    • simple_mask_*: 5 instructions to just asr rD, rA, #15
  • AArch64:
    • sub_load_mask_*: 7 instructions to 4 instructions
    • sub_imm_mask_*: 5 instructions to 4 instructions
    • sub_reg_mask_*: 4 instructions to 3 instructions
    • sub_simple_mask_*: 3 instructions to just and rT, rA, #0x8000 + neg rT, rA asr #15
    • simple_mask_*: 2 instructions to just sbfx rD, rA, #15, #1

dead-claudia (Author) commented

Also, of course, this would result in smaller WebAssembly output: https://godbolt.org/z/sfvnoxG8q (C), https://godbolt.org/z/qPr5Pf5Pf (LLVM)

rotateright (Contributor) commented

The shift+*ext patterns were chosen as canonical in IR near the dawn of LLVM. Someday, we should see if changing that to "*ext (x < 0)" (icmp+ext) makes more sense after all these years. Barring that, it looks like a small patch to InstCombine::Negator() would catch the simple case (and that should unlock the larger optimizations).

rotateright self-assigned this Aug 26, 2022
rotateright added a commit that referenced this issue Aug 27, 2022
0 - (zext (i8 X u>> 7) to iN) --> sext (i8 X s>> 7) to iN

https://alive2.llvm.org/ce/z/jzv4Ud

This is part of solving issue #57381.
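
For reference, the fold in that commit message can be spot-checked in C over all 256 i8 values. This is a standalone sketch, not part of the commit; the C integer promotions do the extension before the shift rather than after, but the result is the same for a 7-bit shift of an 8-bit value, and it assumes the usual two's complement conversion and arithmetic shift behaviour:

#include <assert.h>

/* Spot-check: 0 - (zext (i8 X u>> 7) to i32) == sext (i8 X s>> 7) to i32
   for every 8-bit value. */
int main(void) {
    for (int i = 0; i < 256; i++) {
        unsigned char x = (unsigned char)i;
        int negated_zext = 0 - (int)(x >> 7);    /* left-hand side of the fold */
        int sext_ashr = ((signed char)x) >> 7;   /* right-hand side of the fold */
        assert(negated_zext == sext_ashr);
    }
    return 0;
}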

rotateright (Contributor) commented

I was too optimistic in thinking one patch would trigger the other transforms that we want. I haven't followed it through to codegen yet, but this is hopefully an improvement:
c6e5602

rotateright (Contributor) commented

I think we have the IR in shape after that last commit. Can you confirm that the assembly is what you expect for all targets/examples?

rotateright (Contributor) commented

Another pair of related patches:
246078604c87
6c39a3aae1dc

There's at least one more lshr-of-signbit transform we could do, but it requires changing a conflicting fold to avoid an infinite compiler loop.

AFAICT, we now have the ideal IR and codegen on the examples shown here, so I'm going to close the report.

Feel free to re-open or file another bug if there are still problems. Thanks for the detailed analysis and excellent code examples!
