Improve IR for code which finds position of highest bit #43471
Comments
define dso_local i32 @_Z13FIO_highbit64y(i64 %0) local_unnamed_addr #0 {
%2 = tail call i64 @llvm.ctlz.i64(i64 %0, i1 true), !range !2
%3 = trunc i64 %2 to i32
%4 = xor i32 %3, 63
ret i32 %4
}
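The `xor i32 %3, 63` above is how the optimizer spells `63 - ctlz(x)`: a nonzero 64-bit input gives a count in [0, 63], and since 63 is all ones in the low six bits, the subtraction never borrows and equals a bitwise XOR. A minimal C check of that identity (the function name `sub_equals_xor_for_6bit` is made up for illustration):

```c
#include <stdint.h>

/* Returns 1 if 63 - x == (x ^ 63) holds for every x in [0, 63], else 0. */
int sub_equals_xor_for_6bit(void)
{
    for (uint32_t x = 0; x <= 63; x++) {
        if (63u - x != (x ^ 63u))
            return 0;
    }
    return 1;
}
```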
Isn't the transformed version more poisonous than the original version?
define dso_local i64 @_Z13FIO_highbit64y(i64 %0) local_unnamed_addr #0 {
%2 = lshr i64 %0, 1
%3 = call i64 @llvm.ctlz.i64(i64 %2, i1 false), !range !2
%4 = sub nuw nsw i64 64, %3
ret i64 %4
}
=>
define dso_local i64 @_Z14FIO_highbit64ay(i64 %0) local_unnamed_addr #1 {
%2 = tail call i64 @llvm.ctlz.i64(i64 %0, i1 true), !range !3
%3 = xor i64 %2, 63
ret i64 %3
}
If %0 is 0, the original code has a defined answer. The transformed code produces poison.
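The zero-input difference is visible at the C level too. The following is a minimal sketch, assuming GCC or Clang (for `__builtin_clzll`). `highbit_loop` mirrors the loop from the report, while `highbit_clz_guarded` is a hypothetical variant that adds the zero check needed to match it. Calling `__builtin_clzll(0)` directly is undefined, which is what the `i1 true` flag expresses in the IR.

```c
#include <stdint.h>

/* Loop form from the report: defined for every input, returns 0 for v == 0. */
unsigned highbit_loop(uint64_t v)
{
    unsigned count = 0;
    v >>= 1;
    while (v) { v >>= 1; count++; }
    return count;
}

/* clz form with an explicit guard so that v == 0 is also defined. */
unsigned highbit_clz_guarded(uint64_t v)
{
    if (v == 0)
        return 0;
    return 63u - (unsigned)__builtin_clzll(v);
}
```

With the guard in place the two functions agree on every input, including 0 and 1, which both map to 0.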
Right. But still, even with a zero check, this code:
FIO_highbit64(unsigned long long):
test rdi, rdi
je .LBB0_1
bsr rax, rdi
ret
.LBB0_1:
xor eax, eax
ret
is better than the current code. But with -march=haswell, the situation is more interesting:
FIO_highbit64_clz(unsigned long long):
lzcnt rcx, rdi
xor ecx, 63
xor eax, eax
test rdi, rdi
cmovne eax, ecx
ret
FIO_highbit64_loop(unsigned long long): // better
lzcnt rcx, rdi
mov eax, 64
sub eax, ecx
ret
This might be yet another case where we should consider doing this:
bool X86TargetLowering::isCheapToSpeculateCttz() const {
// Speculate cttz only if we can directly use TZCNT.
- return Subtarget.hasBMI();
+ return Subtarget.hasBMI() || Subtarget.hasCMov();
}
bool X86TargetLowering::isCheapToSpeculateCtlz() const {
// Speculate ctlz only if we can directly use LZCNT.
- return Subtarget.hasLZCNT();
+ return Subtarget.hasLZCNT() || Subtarget.hasCMov();
}
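For context, "speculating" `ctlz` here means running the bit scan unconditionally and then selecting the zero-input result, instead of branching around the scan. Below is a rough C model of the two shapes, under the assumption that the select becomes a `cmov` on x86. `bsr_model`, `clz64_branchy`, and `clz64_select` are illustrative names, not LLVM APIs, and `bsr_model` returns 0 for a zero input where the real `bsr` leaves its destination undefined.

```c
#include <stdint.h>

/* Software stand-in for x86 BSR: index of the highest set bit. */
unsigned bsr_model(uint64_t v)
{
    unsigned idx = 0;
    while (v >>= 1)
        idx++;
    return idx;
}

/* Branchy shape: the scan runs only on the nonzero path. */
unsigned clz64_branchy(uint64_t v)
{
    if (v == 0)
        return 64;
    return 63u - bsr_model(v);
}

/* Speculated shape: scan first, then pick the result with a select. */
unsigned clz64_select(uint64_t v)
{
    unsigned scanned = 63u - bsr_model(v);  /* value is ignored when v == 0 */
    return (v != 0) ? scanned : 64u;
}
```

Both functions return the same value for every input; the proposal only changes which shape the backend treats as cheap.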
@llvm/issue-subscribers-backend-x86
Author: Dávid Bolvanský (davidbolvansky)
| | |
| --- | --- |
| Bugzilla Link | [44126](https://llvm.org/bz44126) |
| Version | trunk |
| OS | Linux |
| CC | @topperc,@hfinkel,@LebedevRI,@rotateright |
Extended Description
unsigned long long FIO_highbit64(unsigned long long v)
{
unsigned count = 0;
v >>= 1;
while (v) { v >>= 1; count++; }
return count;
}
should be the same as:
unsigned long long FIO_highbit64a(unsigned long long v)
{
return 63 - __builtin_clzll(v);
}
But the first version has worse IR and codegen:
define dso_local i64 @_Z13FIO_highbit64y(i64 %0) local_unnamed_addr #0 {
%2 = lshr i64 %0, 1
%3 = call i64 @llvm.ctlz.i64(i64 %2, i1 false), !range !2
%4 = sub nuw nsw i64 64, %3
ret i64 %4
}
=>
define dso_local i64 @_Z14FIO_highbit64ay(i64 %0) local_unnamed_addr #1 {
%2 = tail call i64 @llvm.ctlz.i64(i64 %0, i1 true), !range !3
%3 = xor i64 %2, 63
ret i64 %3
}
It would be good not to forget the trunc variant:
unsigned FIO_highbit64(unsigned long long v)
{
unsigned count = 0;
v >>= 1;
while (v) { v >>= 1; count++; }
return count;
}
define dso_local i32 @_Z13FIO_highbit64y(i64 %0) local_unnamed_addr #0 {
%2 = lshr i64 %0, 1
%3 = call i64 @llvm.ctlz.i64(i64 %2, i1 false), !range !2
%4 = trunc i64 %3 to i32
%5 = sub nsw i32 64, %4
ret i32 %5
}
=>
define dso_local i32 @_Z13FIO_highbit64y(i64 %0) local_unnamed_addr #0 {
%2 = tail call i64 @llvm.ctlz.i64(i64 %0, i1 true), !range !2
%3 = trunc i64 %2 to i32
%4 = xor i32 %3, 63
ret i32 %4
}
FIO_highbit64(unsigned long long):
shr rdi
je .LBB0_1
bsr rcx, rdi
xor rcx, 63
mov eax, 64
sub eax, ecx
ret
.LBB0_1:
mov ecx, 64
mov eax, 64
sub eax, ecx
ret
vs:
FIO_highbit64(unsigned long long):
bsf rax, rdi
xor eax, 63
ret
Source of the loop version: https://github.com/facebook/zstd/blob/47034cd6c31125fdba3155abe9a618f580b4f3eb/programs/fileio.c#L1789