-
Notifications
You must be signed in to change notification settings - Fork 14.9k
Closed
Labels
llvm:instcombineCovers the InstCombine, InstSimplify and AggressiveInstCombine passesCovers the InstCombine, InstSimplify and AggressiveInstCombine passesmissed-optimization
Description
Compile:
// clang -std=c++20 -march=skylake -O2
#include <bit>
unsigned bit_ceil(unsigned X) {
return std::bit_ceil(X);
}
I get:
%sub.i.i = add i32 %X, -1
%0 = tail call i32 @llvm.ctlz.i32(i32 %sub.i.i, i1 false), !range !5
%sub2.i.i = sub nuw nsw i32 32, %0
%shl.i.i = shl nuw i32 1, %sub2.i.i
%or.cond.inv.i.i = icmp ugt i32 %X, 1
%retval.0.i.i = select i1 %or.cond.inv.i.i, i32 %shl.i.i, i32 1
8d 47 ff lea -0x1(%rdi),%eax
f3 0f bd c0 lzcnt %eax,%eax
f6 d8 neg %al
b9 01 00 00 00 mov $0x1,%ecx
c4 e2 79 f7 c1 shlx %eax,%ecx,%eax
83 ff 02 cmp $0x2,%edi
0f 42 c1 cmovb %ecx,%eax
We could drop cmp
and cmovb
like so:
8d 47 ff lea -0x1(%rdi),%eax
f3 0f bd c0 lzcnt %eax,%eax
f6 d8 neg %al
b9 01 00 00 00 mov $0x1,%ecx
c4 e2 79 f7 c1 shlx %eax,%ecx,%eax
This shorter sequence handles input 0 and 1 correctly:
input 0 1 2
-------------------
lea -1 0 1
lzcnt 0 32 31
neg 0 -32 -31
&0x1f 0 0 1
shlx 1 1 2
Note that shlx
masks the shift count with 0x1f
, which is something that the LLVM IR doesn't know, so this issue probably belongs to the x86 backend.
I've empirically verified the equivalence for all possible values of uint32_t
.
Metadata
Metadata
Assignees
Labels
llvm:instcombineCovers the InstCombine, InstSimplify and AggressiveInstCombine passesCovers the InstCombine, InstSimplify and AggressiveInstCombine passesmissed-optimization