[x86] suboptimal codegen for isfinite IR #27538
Comments
On second thought, no loads should be needed at all. The mask should be an immediate constant put into an int register via mov.
Reminded by: Because this is x86, there's a 3rd and possibly 4th alternative that might be optimal for a given uarch:
Or try this in the vector integer domain:
I.e., there are many ways to get an FP comparison result over to an integer reg.
Hi! This issue may be a good introductory issue for people new to working on LLVM. If you would like to work on this issue, your first steps are:
For more instructions on how to submit a patch to LLVM, see our documentation. If you have any further questions about this issue, don't hesitate to ask via a comment on this GitHub issue. @llvm/issue-subscribers-good-first-issue
@7flying is starting to work on this and I'm mentoring.
define i1 @is_finite(float %x) {
  %1 = tail call float @llvm.fabs.f32(float %x)
  %2 = fcmp one float %1, 0x7FF0000000000000 ; ordered and not equal
  ret i1 %2
}
declare float @llvm.fabs.f32(float)

CodeGenPrepare now folds this to:

define i1 @is_finite(float %x) {
  %1 = call i1 @llvm.is.fpclass.f32(float %x, i32 504)
  ret i1 %1
}

resulting in:

is_finite:
  vmovd %xmm0, %eax
  andl $2147483647, %eax
  cmpl $2139095040, %eax
  setl %al
  retq
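For anyone who finds C easier to read than assembly, the integer sequence above corresponds roughly to the sketch below. It is an illustration only, not code from LLVM. The helper names `float_bits`, `float_from_bits`, and `is_finite_cmp` are made up for this example, and it assumes `float` is IEEE-754 binary32.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy the bits of a float into a 32-bit integer
   (roughly the role of the vmovd above). */
static uint32_t float_bits(float x) {
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    return bits;
}

/* Hypothetical helper: build a float from a raw bit pattern,
   handy for constructing infinities and NaNs by hand. */
static float float_from_bits(uint32_t bits) {
    float x;
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Assumes IEEE-754 binary32. Clear the sign bit, then compare against
   the bit pattern of +infinity (0x7f800000 = 2139095040). */
static bool is_finite_cmp(float x) {
    uint32_t magnitude = float_bits(x) & 0x7fffffffu; /* andl $2147483647 */
    return magnitude < 0x7f800000u;                   /* cmpl $2139095040; setl */
}
```

Once the sign bit is masked off the value is non-negative, so the signed `setl` in the assembly and the unsigned `<` here agree.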
Fixed by #81572
Extended Description
Note: 2139095040 = 0x7f800000 (check if the exponent is maxed)
I think this can be reduced to "andnps" with that constant and then ucomiss against zero (save a load).
Alternatively, we could bring the FP value into an int register and do the bitwise comparison there. If we have BMI, it could be something like:
This needs only a scalar load and no explicit compare instruction.
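As a rough C sketch of the and-not idea (not the exact instruction sequence proposed here): a value is finite exactly when at least one exponent bit is clear, so `~bits & 0x7f800000` is nonzero. With BMI, a compiler could plausibly lower that expression to a single `andn`, which sets ZF from its result. That lowering is an assumption, not something this sketch guarantees. The names `bits_of`, `from_bits`, and `is_finite_andn` are hypothetical, and `float` is assumed to be IEEE-754 binary32.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reinterpret a float as its raw 32-bit pattern. */
static uint32_t bits_of(float x) {
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    return bits;
}

/* Hypothetical helper: reinterpret a raw 32-bit pattern as a float. */
static float from_bits(uint32_t bits) {
    float x;
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Assumes IEEE-754 binary32. The exponent field is all ones only for
   infinities and NaNs; in that case ~bits & mask is zero. */
static bool is_finite_andn(float x) {
    const uint32_t exponent_mask = 0x7f800000u; /* 2139095040 */
    return (~bits_of(x) & exponent_mask) != 0;
}
```

The sign bit never needs to be cleared, because the mask covers only the exponent field.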