[RISCV] Transform fcmp to is.fpclass #120242
Conversation
The `instcombine` pass transforms some `is.fpclass` intrinsics into `fcmp` calls. If a given floating point extension is not available (F/D/Zfinx/Zdinx/Zfh/Zfhmin), these `fcmp` calls are lowered to libcalls. In these cases, custom lowering of the `is.fpclass` intrinsics in the back-end generates more efficient code, so in the `riscv-codegenprepare` pass these `fcmp` calls are converted back to `is.fpclass` intrinsics.
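As a sketch of the intent (illustrative IR, not taken from the patch): on a target without hardware FP, the first function below would lower its compare to a soft-float comparison libcall, while the second form lowers to a few integer instructions. The function names and the exact before/after pairing here are hypothetical.

```llvm
; Before riscv-codegenprepare: without F/Zfinx, this unordered compare
; is lowered through a soft-float comparison libcall.
define i1 @is_nan(float %x) {
  %r = fcmp uno float %x, 0.000000e+00
  ret i1 %r
}

; Equivalent form after the pass: the intrinsic is lowered with integer
; bit tests, no libcall. Mask 3 = snan | qnan, i.e. "is NaN".
define i1 @is_nan_class(float %x) {
  %r = call i1 @llvm.is.fpclass.f32(float %x, i32 3)
  ret i1 %r
}
```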
@llvm/pr-subscribers-backend-risc-v
Author: Gergely Futo (futog)
Patch is 51.09 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/120242.diff
4 Files Affected:
diff --git a/llvm/lib/Target/RISCV/RISCVCodeGenPrepare.cpp b/llvm/lib/Target/RISCV/RISCVCodeGenPrepare.cpp
index 5be5345cca73a9..9bee2ff2590774 100644
--- a/llvm/lib/Target/RISCV/RISCVCodeGenPrepare.cpp
+++ b/llvm/lib/Target/RISCV/RISCVCodeGenPrepare.cpp
@@ -20,11 +20,13 @@
#include "llvm/CodeGen/TargetPassConfig.h"
#include "llvm/IR/Dominators.h"
#include "llvm/IR/IRBuilder.h"
+#include "llvm/IR/InstrTypes.h"
#include "llvm/IR/InstVisitor.h"
#include "llvm/IR/Intrinsics.h"
#include "llvm/IR/PatternMatch.h"
#include "llvm/InitializePasses.h"
#include "llvm/Pass.h"
+#include "llvm/Transforms/Utils/Local.h"
using namespace llvm;
@@ -58,6 +60,7 @@ class RISCVCodeGenPrepare : public FunctionPass,
bool visitAnd(BinaryOperator &BO);
bool visitIntrinsicInst(IntrinsicInst &I);
bool expandVPStrideLoad(IntrinsicInst &I);
+ bool visitFCmpInst(FCmpInst &I);
};
} // end anonymous namespace
@@ -196,6 +199,42 @@ bool RISCVCodeGenPrepare::expandVPStrideLoad(IntrinsicInst &II) {
return true;
}
+// The 'fcmp uno/ord/oeq/une/ueq/one/ogt/oge/olt/ole x, 0.0' instructions are
+// equivalent to an FP class test. If the fcmp instruction would be custom
+// lowered or lowered to a libcall, use the is.fpclass intrinsic instead, which
+// is lowered by the back-end without a libcall.
+//
+// This basically reverts the transformations of
+// InstCombinerImpl::foldIntrinsicIsFPClass.
+bool RISCVCodeGenPrepare::visitFCmpInst(FCmpInst &Fcmp) {
+ const auto *TLI = ST->getTargetLowering();
+ const EVT VT = TLI->getValueType(*DL, Fcmp.getOperand(0)->getType());
+ const int ISDOpcode = TLI->InstructionOpcodeToISD(Fcmp.getOpcode());
+
+ auto LegalizeTypeAction = TLI->getTypeAction(Fcmp.getContext(), VT);
+ auto OperationAction = TLI->getOperationAction(ISDOpcode, VT);
+ if ((LegalizeTypeAction != TargetLoweringBase::TypeSoftenFloat &&
+ LegalizeTypeAction != TargetLoweringBase::TypeSoftPromoteHalf) ||
+ OperationAction == TargetLowering::Custom)
+ return false;
+
+ auto [ClassVal, ClassTest] =
+ fcmpToClassTest(Fcmp.getPredicate(), *Fcmp.getParent()->getParent(),
+ Fcmp.getOperand(0), Fcmp.getOperand(1));
+
+ // FIXME: For some conditions (e.g. ole, olt, oge, ogt) the output is quite
+ // verbose compared to the libcall. Should we do the transformation
+ // only if we are optimizing for speed?
+ if (!ClassVal)
+ return false;
+
+ IRBuilder<> Builder(&Fcmp);
+ Value *IsFPClass = Builder.createIsFPClass(ClassVal, ClassTest);
+ Fcmp.replaceAllUsesWith(IsFPClass);
+ RecursivelyDeleteTriviallyDeadInstructions(&Fcmp);
+ return true;
+}
+
bool RISCVCodeGenPrepare::runOnFunction(Function &F) {
if (skipFunction(F))
return false;
diff --git a/llvm/test/CodeGen/RISCV/is-fpclass-f32.ll b/llvm/test/CodeGen/RISCV/is-fpclass-f32.ll
new file mode 100644
index 00000000000000..dd918ba7a1a8d7
--- /dev/null
+++ b/llvm/test/CodeGen/RISCV/is-fpclass-f32.ll
@@ -0,0 +1,1097 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc -mtriple=riscv32 -mattr=+f -verify-machineinstrs -target-abi=ilp32f < %s \
+; RUN: | FileCheck -check-prefix=RV32IF %s
+; RUN: llc -mtriple=riscv32 -mattr=+zfinx -verify-machineinstrs -target-abi=ilp32 < %s \
+; RUN: | FileCheck -check-prefix=RV32IZFINX %s
+; RUN: llc -mtriple=riscv32 -mattr=+d -verify-machineinstrs -target-abi=ilp32f < %s \
+; RUN: | FileCheck -check-prefix=RV32IF %s
+; RUN: llc -mtriple=riscv64 -mattr=+f -verify-machineinstrs -target-abi=lp64f < %s \
+; RUN: | FileCheck -check-prefix=RV64IF %s
+; RUN: llc -mtriple=riscv64 -mattr=+zfinx -verify-machineinstrs -target-abi=lp64 < %s \
+; RUN: | FileCheck -check-prefix=RV64IZFINX %s
+; RUN: llc -mtriple=riscv64 -mattr=+d -verify-machineinstrs -target-abi=lp64d < %s \
+; RUN: | FileCheck -check-prefix=RV64IF %s
+; RUN: llc -mtriple=riscv32 -verify-machineinstrs < %s \
+; RUN: | FileCheck -check-prefix=RV32I %s
+; RUN: llc -mtriple=riscv64 -verify-machineinstrs < %s \
+; RUN: | FileCheck -check-prefix=RV64I %s
+
+declare i1 @llvm.is.fpclass.f32(float, i32)
+
+define i1 @fpclass(float %x) {
+; RV32IF-LABEL: fpclass:
+; RV32IF: # %bb.0:
+; RV32IF-NEXT: fclass.s a0, fa0
+; RV32IF-NEXT: andi a0, a0, 927
+; RV32IF-NEXT: snez a0, a0
+; RV32IF-NEXT: ret
+;
+; RV32IZFINX-LABEL: fpclass:
+; RV32IZFINX: # %bb.0:
+; RV32IZFINX-NEXT: fclass.s a0, a0
+; RV32IZFINX-NEXT: andi a0, a0, 927
+; RV32IZFINX-NEXT: snez a0, a0
+; RV32IZFINX-NEXT: ret
+;
+; RV64IF-LABEL: fpclass:
+; RV64IF: # %bb.0:
+; RV64IF-NEXT: fclass.s a0, fa0
+; RV64IF-NEXT: andi a0, a0, 927
+; RV64IF-NEXT: snez a0, a0
+; RV64IF-NEXT: ret
+;
+; RV64IZFINX-LABEL: fpclass:
+; RV64IZFINX: # %bb.0:
+; RV64IZFINX-NEXT: fclass.s a0, a0
+; RV64IZFINX-NEXT: andi a0, a0, 927
+; RV64IZFINX-NEXT: snez a0, a0
+; RV64IZFINX-NEXT: ret
+;
+; RV32I-LABEL: fpclass:
+; RV32I: # %bb.0:
+; RV32I-NEXT: slli a1, a0, 1
+; RV32I-NEXT: lui a2, 2048
+; RV32I-NEXT: slti a0, a0, 0
+; RV32I-NEXT: lui a3, 522240
+; RV32I-NEXT: lui a4, 1046528
+; RV32I-NEXT: srli a1, a1, 1
+; RV32I-NEXT: addi a2, a2, -1
+; RV32I-NEXT: addi a5, a1, -1
+; RV32I-NEXT: sltu a2, a5, a2
+; RV32I-NEXT: xor a5, a1, a3
+; RV32I-NEXT: slt a3, a3, a1
+; RV32I-NEXT: add a4, a1, a4
+; RV32I-NEXT: seqz a1, a1
+; RV32I-NEXT: seqz a5, a5
+; RV32I-NEXT: srli a4, a4, 24
+; RV32I-NEXT: and a2, a2, a0
+; RV32I-NEXT: or a1, a1, a5
+; RV32I-NEXT: sltiu a4, a4, 127
+; RV32I-NEXT: or a1, a1, a2
+; RV32I-NEXT: or a1, a1, a3
+; RV32I-NEXT: and a0, a4, a0
+; RV32I-NEXT: or a0, a1, a0
+; RV32I-NEXT: ret
+;
+; RV64I-LABEL: fpclass:
+; RV64I: # %bb.0:
+; RV64I-NEXT: sext.w a1, a0
+; RV64I-NEXT: slli a0, a0, 33
+; RV64I-NEXT: lui a2, 2048
+; RV64I-NEXT: lui a3, 522240
+; RV64I-NEXT: lui a4, 1046528
+; RV64I-NEXT: srli a0, a0, 33
+; RV64I-NEXT: addiw a2, a2, -1
+; RV64I-NEXT: slti a1, a1, 0
+; RV64I-NEXT: addi a5, a0, -1
+; RV64I-NEXT: sltu a2, a5, a2
+; RV64I-NEXT: xor a5, a0, a3
+; RV64I-NEXT: slt a3, a3, a0
+; RV64I-NEXT: add a4, a0, a4
+; RV64I-NEXT: seqz a0, a0
+; RV64I-NEXT: seqz a5, a5
+; RV64I-NEXT: srliw a4, a4, 24
+; RV64I-NEXT: and a2, a2, a1
+; RV64I-NEXT: or a0, a0, a5
+; RV64I-NEXT: sltiu a4, a4, 127
+; RV64I-NEXT: or a0, a0, a2
+; RV64I-NEXT: or a0, a0, a3
+; RV64I-NEXT: and a1, a4, a1
+; RV64I-NEXT: or a0, a0, a1
+; RV64I-NEXT: ret
+ %cmp = call i1 @llvm.is.fpclass.f32(float %x, i32 639)
+ ret i1 %cmp
+}
+
+define i1 @is_nan_fpclass(float %x) {
+; RV32IF-LABEL: is_nan_fpclass:
+; RV32IF: # %bb.0:
+; RV32IF-NEXT: fclass.s a0, fa0
+; RV32IF-NEXT: andi a0, a0, 768
+; RV32IF-NEXT: snez a0, a0
+; RV32IF-NEXT: ret
+;
+; RV32IZFINX-LABEL: is_nan_fpclass:
+; RV32IZFINX: # %bb.0:
+; RV32IZFINX-NEXT: fclass.s a0, a0
+; RV32IZFINX-NEXT: andi a0, a0, 768
+; RV32IZFINX-NEXT: snez a0, a0
+; RV32IZFINX-NEXT: ret
+;
+; RV64IF-LABEL: is_nan_fpclass:
+; RV64IF: # %bb.0:
+; RV64IF-NEXT: fclass.s a0, fa0
+; RV64IF-NEXT: andi a0, a0, 768
+; RV64IF-NEXT: snez a0, a0
+; RV64IF-NEXT: ret
+;
+; RV64IZFINX-LABEL: is_nan_fpclass:
+; RV64IZFINX: # %bb.0:
+; RV64IZFINX-NEXT: fclass.s a0, a0
+; RV64IZFINX-NEXT: andi a0, a0, 768
+; RV64IZFINX-NEXT: snez a0, a0
+; RV64IZFINX-NEXT: ret
+;
+; RV32I-LABEL: is_nan_fpclass:
+; RV32I: # %bb.0:
+; RV32I-NEXT: slli a0, a0, 1
+; RV32I-NEXT: srli a0, a0, 1
+; RV32I-NEXT: lui a1, 522240
+; RV32I-NEXT: slt a0, a1, a0
+; RV32I-NEXT: ret
+;
+; RV64I-LABEL: is_nan_fpclass:
+; RV64I: # %bb.0:
+; RV64I-NEXT: slli a0, a0, 33
+; RV64I-NEXT: srli a0, a0, 33
+; RV64I-NEXT: lui a1, 522240
+; RV64I-NEXT: slt a0, a1, a0
+; RV64I-NEXT: ret
+ %1 = call i1 @llvm.is.fpclass.f32(float %x, i32 3) ; nan
+ ret i1 %1
+}
+
+define i1 @is_qnan_fpclass(float %x) {
+; RV32IF-LABEL: is_qnan_fpclass:
+; RV32IF: # %bb.0:
+; RV32IF-NEXT: fclass.s a0, fa0
+; RV32IF-NEXT: srli a0, a0, 9
+; RV32IF-NEXT: ret
+;
+; RV32IZFINX-LABEL: is_qnan_fpclass:
+; RV32IZFINX: # %bb.0:
+; RV32IZFINX-NEXT: fclass.s a0, a0
+; RV32IZFINX-NEXT: srli a0, a0, 9
+; RV32IZFINX-NEXT: ret
+;
+; RV64IF-LABEL: is_qnan_fpclass:
+; RV64IF: # %bb.0:
+; RV64IF-NEXT: fclass.s a0, fa0
+; RV64IF-NEXT: srli a0, a0, 9
+; RV64IF-NEXT: ret
+;
+; RV64IZFINX-LABEL: is_qnan_fpclass:
+; RV64IZFINX: # %bb.0:
+; RV64IZFINX-NEXT: fclass.s a0, a0
+; RV64IZFINX-NEXT: srli a0, a0, 9
+; RV64IZFINX-NEXT: ret
+;
+; RV32I-LABEL: is_qnan_fpclass:
+; RV32I: # %bb.0:
+; RV32I-NEXT: slli a0, a0, 1
+; RV32I-NEXT: lui a1, 523264
+; RV32I-NEXT: srli a0, a0, 1
+; RV32I-NEXT: addi a1, a1, -1
+; RV32I-NEXT: slt a0, a1, a0
+; RV32I-NEXT: ret
+;
+; RV64I-LABEL: is_qnan_fpclass:
+; RV64I: # %bb.0:
+; RV64I-NEXT: slli a0, a0, 33
+; RV64I-NEXT: lui a1, 523264
+; RV64I-NEXT: srli a0, a0, 33
+; RV64I-NEXT: addiw a1, a1, -1
+; RV64I-NEXT: slt a0, a1, a0
+; RV64I-NEXT: ret
+ %1 = call i1 @llvm.is.fpclass.f32(float %x, i32 2) ; qnan
+ ret i1 %1
+}
+
+define i1 @is_snan_fpclass(float %x) {
+; RV32IF-LABEL: is_snan_fpclass:
+; RV32IF: # %bb.0:
+; RV32IF-NEXT: fclass.s a0, fa0
+; RV32IF-NEXT: slli a0, a0, 23
+; RV32IF-NEXT: srli a0, a0, 31
+; RV32IF-NEXT: ret
+;
+; RV32IZFINX-LABEL: is_snan_fpclass:
+; RV32IZFINX: # %bb.0:
+; RV32IZFINX-NEXT: fclass.s a0, a0
+; RV32IZFINX-NEXT: slli a0, a0, 23
+; RV32IZFINX-NEXT: srli a0, a0, 31
+; RV32IZFINX-NEXT: ret
+;
+; RV64IF-LABEL: is_snan_fpclass:
+; RV64IF: # %bb.0:
+; RV64IF-NEXT: fclass.s a0, fa0
+; RV64IF-NEXT: slli a0, a0, 55
+; RV64IF-NEXT: srli a0, a0, 63
+; RV64IF-NEXT: ret
+;
+; RV64IZFINX-LABEL: is_snan_fpclass:
+; RV64IZFINX: # %bb.0:
+; RV64IZFINX-NEXT: fclass.s a0, a0
+; RV64IZFINX-NEXT: slli a0, a0, 55
+; RV64IZFINX-NEXT: srli a0, a0, 63
+; RV64IZFINX-NEXT: ret
+;
+; RV32I-LABEL: is_snan_fpclass:
+; RV32I: # %bb.0:
+; RV32I-NEXT: slli a0, a0, 1
+; RV32I-NEXT: lui a1, 523264
+; RV32I-NEXT: lui a2, 522240
+; RV32I-NEXT: srli a0, a0, 1
+; RV32I-NEXT: slt a1, a0, a1
+; RV32I-NEXT: slt a0, a2, a0
+; RV32I-NEXT: and a0, a0, a1
+; RV32I-NEXT: ret
+;
+; RV64I-LABEL: is_snan_fpclass:
+; RV64I: # %bb.0:
+; RV64I-NEXT: slli a0, a0, 33
+; RV64I-NEXT: lui a1, 523264
+; RV64I-NEXT: lui a2, 522240
+; RV64I-NEXT: srli a0, a0, 33
+; RV64I-NEXT: slt a1, a0, a1
+; RV64I-NEXT: slt a0, a2, a0
+; RV64I-NEXT: and a0, a0, a1
+; RV64I-NEXT: ret
+ %1 = call i1 @llvm.is.fpclass.f32(float %x, i32 1) ; snan
+ ret i1 %1
+}
+
+define i1 @is_inf_fpclass(float %x) {
+; RV32IF-LABEL: is_inf_fpclass:
+; RV32IF: # %bb.0:
+; RV32IF-NEXT: fclass.s a0, fa0
+; RV32IF-NEXT: andi a0, a0, 129
+; RV32IF-NEXT: snez a0, a0
+; RV32IF-NEXT: ret
+;
+; RV32IZFINX-LABEL: is_inf_fpclass:
+; RV32IZFINX: # %bb.0:
+; RV32IZFINX-NEXT: fclass.s a0, a0
+; RV32IZFINX-NEXT: andi a0, a0, 129
+; RV32IZFINX-NEXT: snez a0, a0
+; RV32IZFINX-NEXT: ret
+;
+; RV64IF-LABEL: is_inf_fpclass:
+; RV64IF: # %bb.0:
+; RV64IF-NEXT: fclass.s a0, fa0
+; RV64IF-NEXT: andi a0, a0, 129
+; RV64IF-NEXT: snez a0, a0
+; RV64IF-NEXT: ret
+;
+; RV64IZFINX-LABEL: is_inf_fpclass:
+; RV64IZFINX: # %bb.0:
+; RV64IZFINX-NEXT: fclass.s a0, a0
+; RV64IZFINX-NEXT: andi a0, a0, 129
+; RV64IZFINX-NEXT: snez a0, a0
+; RV64IZFINX-NEXT: ret
+;
+; RV32I-LABEL: is_inf_fpclass:
+; RV32I: # %bb.0:
+; RV32I-NEXT: slli a0, a0, 1
+; RV32I-NEXT: srli a0, a0, 1
+; RV32I-NEXT: lui a1, 522240
+; RV32I-NEXT: xor a0, a0, a1
+; RV32I-NEXT: seqz a0, a0
+; RV32I-NEXT: ret
+;
+; RV64I-LABEL: is_inf_fpclass:
+; RV64I: # %bb.0:
+; RV64I-NEXT: slli a0, a0, 33
+; RV64I-NEXT: srli a0, a0, 33
+; RV64I-NEXT: lui a1, 522240
+; RV64I-NEXT: xor a0, a0, a1
+; RV64I-NEXT: seqz a0, a0
+; RV64I-NEXT: ret
+ %1 = call i1 @llvm.is.fpclass.f32(float %x, i32 516) ; 0x204 = "inf"
+ ret i1 %1
+}
+
+define i1 @is_posinf_fpclass(float %x) {
+; RV32IF-LABEL: is_posinf_fpclass:
+; RV32IF: # %bb.0:
+; RV32IF-NEXT: fclass.s a0, fa0
+; RV32IF-NEXT: slli a0, a0, 24
+; RV32IF-NEXT: srli a0, a0, 31
+; RV32IF-NEXT: ret
+;
+; RV32IZFINX-LABEL: is_posinf_fpclass:
+; RV32IZFINX: # %bb.0:
+; RV32IZFINX-NEXT: fclass.s a0, a0
+; RV32IZFINX-NEXT: slli a0, a0, 24
+; RV32IZFINX-NEXT: srli a0, a0, 31
+; RV32IZFINX-NEXT: ret
+;
+; RV64IF-LABEL: is_posinf_fpclass:
+; RV64IF: # %bb.0:
+; RV64IF-NEXT: fclass.s a0, fa0
+; RV64IF-NEXT: slli a0, a0, 56
+; RV64IF-NEXT: srli a0, a0, 63
+; RV64IF-NEXT: ret
+;
+; RV64IZFINX-LABEL: is_posinf_fpclass:
+; RV64IZFINX: # %bb.0:
+; RV64IZFINX-NEXT: fclass.s a0, a0
+; RV64IZFINX-NEXT: slli a0, a0, 56
+; RV64IZFINX-NEXT: srli a0, a0, 63
+; RV64IZFINX-NEXT: ret
+;
+; RV32I-LABEL: is_posinf_fpclass:
+; RV32I: # %bb.0:
+; RV32I-NEXT: lui a1, 522240
+; RV32I-NEXT: xor a0, a0, a1
+; RV32I-NEXT: seqz a0, a0
+; RV32I-NEXT: ret
+;
+; RV64I-LABEL: is_posinf_fpclass:
+; RV64I: # %bb.0:
+; RV64I-NEXT: sext.w a0, a0
+; RV64I-NEXT: lui a1, 522240
+; RV64I-NEXT: xor a0, a0, a1
+; RV64I-NEXT: seqz a0, a0
+; RV64I-NEXT: ret
+ %1 = call i1 @llvm.is.fpclass.f32(float %x, i32 512) ; 0x200 = "+inf"
+ ret i1 %1
+}
+
+define i1 @is_neginf_fpclass(float %x) {
+; RV32IF-LABEL: is_neginf_fpclass:
+; RV32IF: # %bb.0:
+; RV32IF-NEXT: fclass.s a0, fa0
+; RV32IF-NEXT: andi a0, a0, 1
+; RV32IF-NEXT: ret
+;
+; RV32IZFINX-LABEL: is_neginf_fpclass:
+; RV32IZFINX: # %bb.0:
+; RV32IZFINX-NEXT: fclass.s a0, a0
+; RV32IZFINX-NEXT: andi a0, a0, 1
+; RV32IZFINX-NEXT: ret
+;
+; RV64IF-LABEL: is_neginf_fpclass:
+; RV64IF: # %bb.0:
+; RV64IF-NEXT: fclass.s a0, fa0
+; RV64IF-NEXT: andi a0, a0, 1
+; RV64IF-NEXT: ret
+;
+; RV64IZFINX-LABEL: is_neginf_fpclass:
+; RV64IZFINX: # %bb.0:
+; RV64IZFINX-NEXT: fclass.s a0, a0
+; RV64IZFINX-NEXT: andi a0, a0, 1
+; RV64IZFINX-NEXT: ret
+;
+; RV32I-LABEL: is_neginf_fpclass:
+; RV32I: # %bb.0:
+; RV32I-NEXT: lui a1, 1046528
+; RV32I-NEXT: xor a0, a0, a1
+; RV32I-NEXT: seqz a0, a0
+; RV32I-NEXT: ret
+;
+; RV64I-LABEL: is_neginf_fpclass:
+; RV64I: # %bb.0:
+; RV64I-NEXT: sext.w a0, a0
+; RV64I-NEXT: lui a1, 1046528
+; RV64I-NEXT: xor a0, a0, a1
+; RV64I-NEXT: seqz a0, a0
+; RV64I-NEXT: ret
+ %1 = call i1 @llvm.is.fpclass.f32(float %x, i32 4) ; "-inf"
+ ret i1 %1
+}
+
+define i1 @is_finite_fpclass(float %x) {
+; RV32IF-LABEL: is_finite_fpclass:
+; RV32IF: # %bb.0:
+; RV32IF-NEXT: fclass.s a0, fa0
+; RV32IF-NEXT: andi a0, a0, 126
+; RV32IF-NEXT: snez a0, a0
+; RV32IF-NEXT: ret
+;
+; RV32IZFINX-LABEL: is_finite_fpclass:
+; RV32IZFINX: # %bb.0:
+; RV32IZFINX-NEXT: fclass.s a0, a0
+; RV32IZFINX-NEXT: andi a0, a0, 126
+; RV32IZFINX-NEXT: snez a0, a0
+; RV32IZFINX-NEXT: ret
+;
+; RV64IF-LABEL: is_finite_fpclass:
+; RV64IF: # %bb.0:
+; RV64IF-NEXT: fclass.s a0, fa0
+; RV64IF-NEXT: andi a0, a0, 126
+; RV64IF-NEXT: snez a0, a0
+; RV64IF-NEXT: ret
+;
+; RV64IZFINX-LABEL: is_finite_fpclass:
+; RV64IZFINX: # %bb.0:
+; RV64IZFINX-NEXT: fclass.s a0, a0
+; RV64IZFINX-NEXT: andi a0, a0, 126
+; RV64IZFINX-NEXT: snez a0, a0
+; RV64IZFINX-NEXT: ret
+;
+; RV32I-LABEL: is_finite_fpclass:
+; RV32I: # %bb.0:
+; RV32I-NEXT: slli a0, a0, 1
+; RV32I-NEXT: srli a0, a0, 1
+; RV32I-NEXT: lui a1, 522240
+; RV32I-NEXT: slt a0, a0, a1
+; RV32I-NEXT: ret
+;
+; RV64I-LABEL: is_finite_fpclass:
+; RV64I: # %bb.0:
+; RV64I-NEXT: slli a0, a0, 33
+; RV64I-NEXT: srli a0, a0, 33
+; RV64I-NEXT: lui a1, 522240
+; RV64I-NEXT: slt a0, a0, a1
+; RV64I-NEXT: ret
+ %1 = call i1 @llvm.is.fpclass.f32(float %x, i32 504) ; 0x1f8 = "finite"
+ ret i1 %1
+}
+
+define i1 @is_posfinite_fpclass(float %x) {
+; RV32IF-LABEL: is_posfinite_fpclass:
+; RV32IF: # %bb.0:
+; RV32IF-NEXT: fclass.s a0, fa0
+; RV32IF-NEXT: andi a0, a0, 112
+; RV32IF-NEXT: snez a0, a0
+; RV32IF-NEXT: ret
+;
+; RV32IZFINX-LABEL: is_posfinite_fpclass:
+; RV32IZFINX: # %bb.0:
+; RV32IZFINX-NEXT: fclass.s a0, a0
+; RV32IZFINX-NEXT: andi a0, a0, 112
+; RV32IZFINX-NEXT: snez a0, a0
+; RV32IZFINX-NEXT: ret
+;
+; RV64IF-LABEL: is_posfinite_fpclass:
+; RV64IF: # %bb.0:
+; RV64IF-NEXT: fclass.s a0, fa0
+; RV64IF-NEXT: andi a0, a0, 112
+; RV64IF-NEXT: snez a0, a0
+; RV64IF-NEXT: ret
+;
+; RV64IZFINX-LABEL: is_posfinite_fpclass:
+; RV64IZFINX: # %bb.0:
+; RV64IZFINX-NEXT: fclass.s a0, a0
+; RV64IZFINX-NEXT: andi a0, a0, 112
+; RV64IZFINX-NEXT: snez a0, a0
+; RV64IZFINX-NEXT: ret
+;
+; RV32I-LABEL: is_posfinite_fpclass:
+; RV32I: # %bb.0:
+; RV32I-NEXT: srli a0, a0, 23
+; RV32I-NEXT: sltiu a0, a0, 255
+; RV32I-NEXT: ret
+;
+; RV64I-LABEL: is_posfinite_fpclass:
+; RV64I: # %bb.0:
+; RV64I-NEXT: srliw a0, a0, 23
+; RV64I-NEXT: sltiu a0, a0, 255
+; RV64I-NEXT: ret
+ %1 = call i1 @llvm.is.fpclass.f32(float %x, i32 448) ; 0x1c0 = "+finite"
+ ret i1 %1
+}
+
+define i1 @is_negfinite_fpclass(float %x) {
+; RV32IF-LABEL: is_negfinite_fpclass:
+; RV32IF: # %bb.0:
+; RV32IF-NEXT: fclass.s a0, fa0
+; RV32IF-NEXT: andi a0, a0, 14
+; RV32IF-NEXT: snez a0, a0
+; RV32IF-NEXT: ret
+;
+; RV32IZFINX-LABEL: is_negfinite_fpclass:
+; RV32IZFINX: # %bb.0:
+; RV32IZFINX-NEXT: fclass.s a0, a0
+; RV32IZFINX-NEXT: andi a0, a0, 14
+; RV32IZFINX-NEXT: snez a0, a0
+; RV32IZFINX-NEXT: ret
+;
+; RV64IF-LABEL: is_negfinite_fpclass:
+; RV64IF: # %bb.0:
+; RV64IF-NEXT: fclass.s a0, fa0
+; RV64IF-NEXT: andi a0, a0, 14
+; RV64IF-NEXT: snez a0, a0
+; RV64IF-NEXT: ret
+;
+; RV64IZFINX-LABEL: is_negfinite_fpclass:
+; RV64IZFINX: # %bb.0:
+; RV64IZFINX-NEXT: fclass.s a0, a0
+; RV64IZFINX-NEXT: andi a0, a0, 14
+; RV64IZFINX-NEXT: snez a0, a0
+; RV64IZFINX-NEXT: ret
+;
+; RV32I-LABEL: is_negfinite_fpclass:
+; RV32I: # %bb.0:
+; RV32I-NEXT: slli a1, a0, 1
+; RV32I-NEXT: lui a2, 522240
+; RV32I-NEXT: srli a1, a1, 1
+; RV32I-NEXT: slt a1, a1, a2
+; RV32I-NEXT: slti a0, a0, 0
+; RV32I-NEXT: and a0, a1, a0
+; RV32I-NEXT: ret
+;
+; RV64I-LABEL: is_negfinite_fpclass:
+; RV64I: # %bb.0:
+; RV64I-NEXT: sext.w a1, a0
+; RV64I-NEXT: slli a0, a0, 33
+; RV64I-NEXT: lui a2, 522240
+; RV64I-NEXT: srli a0, a0, 33
+; RV64I-NEXT: slt a0, a0, a2
+; RV64I-NEXT: slti a1, a1, 0
+; RV64I-NEXT: and a0, a0, a1
+; RV64I-NEXT: ret
+ %1 = call i1 @llvm.is.fpclass.f32(float %x, i32 56) ; 0x38 = "-finite"
+ ret i1 %1
+}
+
+define i1 @is_notfinite_fpclass(float %x) {
+; RV32IF-LABEL: is_notfinite_fpclass:
+; RV32IF: # %bb.0:
+; RV32IF-NEXT: fclass.s a0, fa0
+; RV32IF-NEXT: andi a0, a0, 897
+; RV32IF-NEXT: snez a0, a0
+; RV32IF-NEXT: ret
+;
+; RV32IZFINX-LABEL: is_notfinite_fpclass:
+; RV32IZFINX: # %bb.0:
+; RV32IZFINX-NEXT: fclass.s a0, a0
+; RV32IZFINX-NEXT: andi a0, a0, 897
+; RV32IZFINX-NEXT: snez a0, a0
+; RV32IZFINX-NEXT: ret
+;
+; RV64IF-LABEL: is_notfinite_fpclass:
+; RV64IF: # %bb.0:
+; RV64IF-NEXT: fclass.s a0, fa0
+; RV64IF-NEXT: andi a0, a0, 897
+; RV64IF-NEXT: snez a0, a0
+; RV64IF-NEXT: ret
+;
+; RV64IZFINX-LABEL: is_notfinite_fpclass:
+; RV64IZFINX: # %bb.0:
+; RV64IZFINX-NEXT: fclass.s a0, a0
+; RV64IZFINX-NEXT: andi a0, a0, 897
+; RV64IZFINX-NEXT: snez a0, a0
+; RV64IZFINX-NEXT: ret
+;
+; RV32I-LABEL: is_notfinite_fpclass:
+; RV32I: # %bb.0:
+; RV32I-NEXT: slli a0, a0, 1
+; RV32I-NEXT: lui a1, 522240
+; RV32I-NEXT: srli a0, a0, 1
+; RV32I-NEXT: addi a1, a1, -1
+; RV32I-NEXT: slt a0, a1, a0
+; RV32I-NEXT: ret
+;
+; RV64I-LABEL: is_notfinite_fpclass:
+; RV64I: # %bb.0:
+; RV64I-NEXT: slli a0, a0, 33
+; RV64I-NEXT: lui a1, 522240
+; RV64I-NEXT: srli a0, a0, 33
+; RV64I-NEXT: addiw a1, a1, -1
+; RV64I-NEXT: slt a0, a1, a0
+; RV64I-NEXT: ret
+ %1 = call i1 @llvm.is.fpclass.f32(float %x, i32 519) ; 0x207 = "inf|nan"
+ ret i1 %1
+}
+
+define ...
[truncated]
You can test this locally with the following command:

git-clang-format --diff 17b3dd03a05dfa938aacd57027189271a62e2fda 0dff7fbbad04e4d711c9beb84e227c81137f2c5d --extensions cpp -- llvm/lib/Target/RISCV/RISCVCodeGenPrepare.cpp

View the diff from clang-format here:

diff --git a/llvm/lib/Target/RISCV/RISCVCodeGenPrepare.cpp b/llvm/lib/Target/RISCV/RISCVCodeGenPrepare.cpp
index 9bee2ff259..0416f315fe 100644
--- a/llvm/lib/Target/RISCV/RISCVCodeGenPrepare.cpp
+++ b/llvm/lib/Target/RISCV/RISCVCodeGenPrepare.cpp
@@ -20,8 +20,8 @@
#include "llvm/CodeGen/TargetPassConfig.h"
#include "llvm/IR/Dominators.h"
#include "llvm/IR/IRBuilder.h"
-#include "llvm/IR/InstrTypes.h"
#include "llvm/IR/InstVisitor.h"
+#include "llvm/IR/InstrTypes.h"
#include "llvm/IR/Intrinsics.h"
#include "llvm/IR/PatternMatch.h"
#include "llvm/InitializePasses.h"
// The 'fcmp uno/ord/oeq/une/ueq/one/ogt/oge/olt/ole x, 0.0' instructions are
// equivalent to an FP class test. If the fcmp instruction would be custom
This isn't true depending on denormal handling and fp exceptions
//
// This basically reverts the transformations of
// InstCombinerImpl::foldIntrinsicIsFPClass.
bool RISCVCodeGenPrepare::visitFCmpInst(FCmpInst &Fcmp) {
CodeGenPrepare already has this transform
  auto LegalizeTypeAction = TLI->getTypeAction(Fcmp.getContext(), VT);
  auto OperationAction = TLI->getOperationAction(ISDOpcode, VT);
  if ((LegalizeTypeAction != TargetLoweringBase::TypeSoftenFloat &&
       LegalizeTypeAction != TargetLoweringBase::TypeSoftPromoteHalf) ||
      OperationAction == TargetLowering::Custom)
return false; |
This level of logic really belongs directly in the legalizer
Ok, I understand. Moving it to the legalizer. @topperc is it ok to implement it there?
Actually, the logic in CodeGenPrepare uses `TargetLoweringBase::isFAbsFree`. When I started to implement this, I was wondering whether there should be a similar function for `FCmp`, and whether the whole thing should go into CodeGenPrepare instead. Is this a valid approach, or is the legalizer the right place to do it?
I don't really like having it in codegenprepare in the first place. It really belongs in some combination of DAGCombiner or legalizer, depending on the purpose. The only nice thing is codegenprepare has access to better utilities, like an existing fcmpToClassTest helper and computeKnownFPClass. In principle those should be reimplemented in codegen
Ok. Regarding `InstCombinerImpl::foldIntrinsicIsFPClass`: for a back-end where lowering the `fcmp` is not cheap, why is it profitable to do this transformation? It is done unconditionally, as far as I can see.
fcmp is a better canonical form. More code will always understand fcmp than is.fpclass.
fcmp is not cheap
This is certainly not universally true, and I would say is not the common case. If the target wants something else, that's for the backend to undo for its preferred form.
FYI I did a similar transformation in CodeGenPrepare: #81572.
In CodeGenPrepare it is only for
I mean you can handle your motivating case in CGP.