[ARM] Stop gluing FP comparisons to FMSTAT #116676
It looks as if the Not a complaint – I don't want to be one of those code reviewers who won't accept a small improvement until it's turned into a much bigger one :-) but it might be a thing to call out in the commit message to avoid confusion.
That's true, and I was going to fix it in the next patch. Unfortunately the patch will not be small, because changing this particular glue requires changing operands of BRCOND/CSINC/CSEL and all the others simultaneously. Otherwise, I'd have to insert copies and that would break DAG combiner logic.
Sure, there are a couple more things I would also like to point out.
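@llvm/pr-subscribers-backend-arm
Author: Sergei Barannikov (s-barannikov)
Changes
Following #116547, this changes the result of `ARMISD::CMPFP*` and the operand of `ARMISD::FMSTAT` from a special `Glue` type to a normal type.
This change allows comparisons to be CSEd and scheduled around as can be seen in the test changes.
Note that `ARMISD::FMSTAT` is still glued to its consumer nodes; this is going to be changed in a separate patch.
This patch also sets `CopyCost` of `cl_FPSCR_NZCV` register class to a negative value. The reason is the same as for CCR register class: it makes DAG scheduler and InstrEmitter try to avoid copies of `FPSCR_NZCV` register to / from virtual registers. Previously, this was not necessary, since no attempt was made to create copies in the first place.
There might be a case when a copy can't be avoided (although not found in existing tests). If a copy is necessary, the virtual register will be created with `cl_FPSCR_NZCV` register class. If this register class is inappropriate, `TRI::getCrossCopyRegClass` should be modified to return the correct class.
Patch is 363.29 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/116676.diff
17 Files Affected: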
diff --git a/llvm/lib/Target/ARM/ARMISelLowering.cpp b/llvm/lib/Target/ARM/ARMISelLowering.cpp
index 84b37ae6833aed..6b290135c5bcba 100644
--- a/llvm/lib/Target/ARM/ARMISelLowering.cpp
+++ b/llvm/lib/Target/ARM/ARMISelLowering.cpp
@@ -4971,14 +4971,14 @@ SDValue ARMTargetLowering::getVFPCmp(SDValue LHS, SDValue RHS,
SelectionDAG &DAG, const SDLoc &dl,
bool Signaling) const {
assert(Subtarget->hasFP64() || RHS.getValueType() != MVT::f64);
- SDValue Cmp;
+ SDValue Flags;
if (!isFloatingPointZero(RHS))
- Cmp = DAG.getNode(Signaling ? ARMISD::CMPFPE : ARMISD::CMPFP,
- dl, MVT::Glue, LHS, RHS);
+ Flags = DAG.getNode(Signaling ? ARMISD::CMPFPE : ARMISD::CMPFP, dl, FlagsVT,
+ LHS, RHS);
else
- Cmp = DAG.getNode(Signaling ? ARMISD::CMPFPEw0 : ARMISD::CMPFPw0,
- dl, MVT::Glue, LHS);
- return DAG.getNode(ARMISD::FMSTAT, dl, MVT::Glue, Cmp);
+ Flags = DAG.getNode(Signaling ? ARMISD::CMPFPEw0 : ARMISD::CMPFPw0, dl,
+ FlagsVT, LHS);
+ return DAG.getNode(ARMISD::FMSTAT, dl, MVT::Glue, Flags);
}
/// duplicateCmp - Glue values can have only one use, so this function
@@ -4991,15 +4991,11 @@ ARMTargetLowering::duplicateCmp(SDValue Cmp, SelectionDAG &DAG) const {
return DAG.getNode(Opc, DL, MVT::Glue, Cmp.getOperand(0),Cmp.getOperand(1));
assert(Opc == ARMISD::FMSTAT && "unexpected comparison operation");
- Cmp = Cmp.getOperand(0);
- Opc = Cmp.getOpcode();
- if (Opc == ARMISD::CMPFP)
- Cmp = DAG.getNode(Opc, DL, MVT::Glue, Cmp.getOperand(0),Cmp.getOperand(1));
- else {
- assert(Opc == ARMISD::CMPFPw0 && "unexpected operand of FMSTAT");
- Cmp = DAG.getNode(Opc, DL, MVT::Glue, Cmp.getOperand(0));
- }
- return DAG.getNode(ARMISD::FMSTAT, DL, MVT::Glue, Cmp);
+ SDValue Flags = Cmp.getOperand(0);
+ assert((Flags.getOpcode() == ARMISD::CMPFP ||
+ Flags.getOpcode() == ARMISD::CMPFPw0) &&
+ "unexpected operand of FMSTAT");
+ return DAG.getNode(ARMISD::FMSTAT, DL, MVT::Glue, Flags);
}
// This function returns three things: the arithmetic computation itself
diff --git a/llvm/lib/Target/ARM/ARMInstrVFP.td b/llvm/lib/Target/ARM/ARMInstrVFP.td
index 5b49f728ebb8d8..a29753909ea992 100644
--- a/llvm/lib/Target/ARM/ARMInstrVFP.td
+++ b/llvm/lib/Target/ARM/ARMInstrVFP.td
@@ -10,7 +10,17 @@
//
//===----------------------------------------------------------------------===//
-def SDT_CMPFP0 : SDTypeProfile<0, 1, [SDTCisFP<0>]>;
+def SDT_CMPFP : SDTypeProfile<1, 2, [
+ SDTCisVT<0, FlagsVT>, // out flags
+ SDTCisFP<1>, // lhs
+ SDTCisSameAs<2, 1> // rhs
+]>;
+
+def SDT_CMPFP0 : SDTypeProfile<1, 1, [
+ SDTCisVT<0, FlagsVT>, // out flags
+ SDTCisFP<1> // operand
+]>;
+
def SDT_VMOVDRR : SDTypeProfile<1, 2, [SDTCisVT<0, f64>, SDTCisVT<1, i32>,
SDTCisSameAs<1, 2>]>;
def SDT_VMOVRRD : SDTypeProfile<2, 1, [SDTCisVT<0, i32>, SDTCisSameAs<0, 1>,
@@ -18,11 +28,18 @@ def SDT_VMOVRRD : SDTypeProfile<2, 1, [SDTCisVT<0, i32>, SDTCisSameAs<0, 1>,
def SDT_VMOVSR : SDTypeProfile<1, 1, [SDTCisVT<0, f32>, SDTCisVT<1, i32>]>;
-def arm_fmstat : SDNode<"ARMISD::FMSTAT", SDTNone, [SDNPInGlue, SDNPOutGlue]>;
-def arm_cmpfp : SDNode<"ARMISD::CMPFP", SDT_ARMCmp, [SDNPOutGlue]>;
-def arm_cmpfp0 : SDNode<"ARMISD::CMPFPw0", SDT_CMPFP0, [SDNPOutGlue]>;
-def arm_cmpfpe : SDNode<"ARMISD::CMPFPE", SDT_ARMCmp, [SDNPOutGlue]>;
-def arm_cmpfpe0: SDNode<"ARMISD::CMPFPEw0",SDT_CMPFP0, [SDNPOutGlue]>;
+def arm_cmpfp : SDNode<"ARMISD::CMPFP", SDT_CMPFP>;
+def arm_cmpfp0 : SDNode<"ARMISD::CMPFPw0", SDT_CMPFP0>;
+def arm_cmpfpe : SDNode<"ARMISD::CMPFPE", SDT_CMPFP>;
+def arm_cmpfpe0 : SDNode<"ARMISD::CMPFPEw0", SDT_CMPFP0>;
+
+def arm_fmstat : SDNode<"ARMISD::FMSTAT",
+ SDTypeProfile<0, 1, [
+ SDTCisVT<0, FlagsVT> // in flags
+ ]>,
+ [SDNPOutGlue] // TODO: Change Glue to a normal result.
+>;
+
def arm_fmdrr : SDNode<"ARMISD::VMOVDRR", SDT_VMOVDRR>;
def arm_fmrrd : SDNode<"ARMISD::VMOVRRD", SDT_VMOVRRD>;
def arm_vmovsr : SDNode<"ARMISD::VMOVSR", SDT_VMOVSR>;
@@ -606,12 +623,12 @@ let Defs = [FPSCR_NZCV] in {
def VCMPED : ADuI<0b11101, 0b11, 0b0100, 0b11, 0,
(outs), (ins DPR:$Dd, DPR:$Dm),
IIC_fpCMP64, "vcmpe", ".f64\t$Dd, $Dm", "",
- [(arm_cmpfpe DPR:$Dd, (f64 DPR:$Dm))]>;
+ [(set FPSCR_NZCV, (arm_cmpfpe DPR:$Dd, (f64 DPR:$Dm)))]>;
def VCMPES : ASuI<0b11101, 0b11, 0b0100, 0b11, 0,
(outs), (ins SPR:$Sd, SPR:$Sm),
IIC_fpCMP32, "vcmpe", ".f32\t$Sd, $Sm", "",
- [(arm_cmpfpe SPR:$Sd, SPR:$Sm)]> {
+ [(set FPSCR_NZCV, (arm_cmpfpe SPR:$Sd, SPR:$Sm))]> {
// Some single precision VFP instructions may be executed on both NEON and
// VFP pipelines on A8.
let D = VFPNeonA8Domain;
@@ -620,17 +637,17 @@ def VCMPES : ASuI<0b11101, 0b11, 0b0100, 0b11, 0,
def VCMPEH : AHuI<0b11101, 0b11, 0b0100, 0b11, 0,
(outs), (ins HPR:$Sd, HPR:$Sm),
IIC_fpCMP16, "vcmpe", ".f16\t$Sd, $Sm",
- [(arm_cmpfpe (f16 HPR:$Sd), (f16 HPR:$Sm))]>;
+ [(set FPSCR_NZCV, (arm_cmpfpe (f16 HPR:$Sd), (f16 HPR:$Sm)))]>;
def VCMPD : ADuI<0b11101, 0b11, 0b0100, 0b01, 0,
(outs), (ins DPR:$Dd, DPR:$Dm),
IIC_fpCMP64, "vcmp", ".f64\t$Dd, $Dm", "",
- [(arm_cmpfp DPR:$Dd, (f64 DPR:$Dm))]>;
+ [(set FPSCR_NZCV, (arm_cmpfp DPR:$Dd, (f64 DPR:$Dm)))]>;
def VCMPS : ASuI<0b11101, 0b11, 0b0100, 0b01, 0,
(outs), (ins SPR:$Sd, SPR:$Sm),
IIC_fpCMP32, "vcmp", ".f32\t$Sd, $Sm", "",
- [(arm_cmpfp SPR:$Sd, SPR:$Sm)]> {
+ [(set FPSCR_NZCV, (arm_cmpfp SPR:$Sd, SPR:$Sm))]> {
// Some single precision VFP instructions may be executed on both NEON and
// VFP pipelines on A8.
let D = VFPNeonA8Domain;
@@ -639,7 +656,7 @@ def VCMPS : ASuI<0b11101, 0b11, 0b0100, 0b01, 0,
def VCMPH : AHuI<0b11101, 0b11, 0b0100, 0b01, 0,
(outs), (ins HPR:$Sd, HPR:$Sm),
IIC_fpCMP16, "vcmp", ".f16\t$Sd, $Sm",
- [(arm_cmpfp (f16 HPR:$Sd), (f16 HPR:$Sm))]>;
+ [(set FPSCR_NZCV, (arm_cmpfp (f16 HPR:$Sd), (f16 HPR:$Sm)))]>;
} // Defs = [FPSCR_NZCV]
//===----------------------------------------------------------------------===//
@@ -669,7 +686,7 @@ let Defs = [FPSCR_NZCV] in {
def VCMPEZD : ADuI<0b11101, 0b11, 0b0101, 0b11, 0,
(outs), (ins DPR:$Dd),
IIC_fpCMP64, "vcmpe", ".f64\t$Dd, #0", "",
- [(arm_cmpfpe0 (f64 DPR:$Dd))]> {
+ [(set FPSCR_NZCV, (arm_cmpfpe0 (f64 DPR:$Dd)))]> {
let Inst{3-0} = 0b0000;
let Inst{5} = 0;
}
@@ -677,7 +694,7 @@ def VCMPEZD : ADuI<0b11101, 0b11, 0b0101, 0b11, 0,
def VCMPEZS : ASuI<0b11101, 0b11, 0b0101, 0b11, 0,
(outs), (ins SPR:$Sd),
IIC_fpCMP32, "vcmpe", ".f32\t$Sd, #0", "",
- [(arm_cmpfpe0 SPR:$Sd)]> {
+ [(set FPSCR_NZCV, (arm_cmpfpe0 SPR:$Sd))]> {
let Inst{3-0} = 0b0000;
let Inst{5} = 0;
@@ -689,7 +706,7 @@ def VCMPEZS : ASuI<0b11101, 0b11, 0b0101, 0b11, 0,
def VCMPEZH : AHuI<0b11101, 0b11, 0b0101, 0b11, 0,
(outs), (ins HPR:$Sd),
IIC_fpCMP16, "vcmpe", ".f16\t$Sd, #0",
- [(arm_cmpfpe0 (f16 HPR:$Sd))]> {
+ [(set FPSCR_NZCV, (arm_cmpfpe0 (f16 HPR:$Sd)))]> {
let Inst{3-0} = 0b0000;
let Inst{5} = 0;
}
@@ -697,7 +714,7 @@ def VCMPEZH : AHuI<0b11101, 0b11, 0b0101, 0b11, 0,
def VCMPZD : ADuI<0b11101, 0b11, 0b0101, 0b01, 0,
(outs), (ins DPR:$Dd),
IIC_fpCMP64, "vcmp", ".f64\t$Dd, #0", "",
- [(arm_cmpfp0 (f64 DPR:$Dd))]> {
+ [(set FPSCR_NZCV, (arm_cmpfp0 (f64 DPR:$Dd)))]> {
let Inst{3-0} = 0b0000;
let Inst{5} = 0;
}
@@ -705,7 +722,7 @@ def VCMPZD : ADuI<0b11101, 0b11, 0b0101, 0b01, 0,
def VCMPZS : ASuI<0b11101, 0b11, 0b0101, 0b01, 0,
(outs), (ins SPR:$Sd),
IIC_fpCMP32, "vcmp", ".f32\t$Sd, #0", "",
- [(arm_cmpfp0 SPR:$Sd)]> {
+ [(set FPSCR_NZCV, (arm_cmpfp0 SPR:$Sd))]> {
let Inst{3-0} = 0b0000;
let Inst{5} = 0;
@@ -717,7 +734,7 @@ def VCMPZS : ASuI<0b11101, 0b11, 0b0101, 0b01, 0,
def VCMPZH : AHuI<0b11101, 0b11, 0b0101, 0b01, 0,
(outs), (ins HPR:$Sd),
IIC_fpCMP16, "vcmp", ".f16\t$Sd, #0",
- [(arm_cmpfp0 (f16 HPR:$Sd))]> {
+ [(set FPSCR_NZCV, (arm_cmpfp0 (f16 HPR:$Sd)))]> {
let Inst{3-0} = 0b0000;
let Inst{5} = 0;
}
@@ -2492,7 +2509,8 @@ let DecoderMethod = "DecodeForVMRSandVMSR" in {
let Defs = [CPSR], Uses = [FPSCR_NZCV], Predicates = [HasFPRegs],
Rt = 0b1111 /* apsr_nzcv */ in
def FMSTAT : MovFromVFP<0b0001 /* fpscr */, (outs), (ins),
- "vmrs", "\tAPSR_nzcv, fpscr", [(arm_fmstat)]>;
+ "vmrs", "\tAPSR_nzcv, fpscr",
+ [(arm_fmstat FPSCR_NZCV)]>;
// Application level FPSCR -> GPR
let hasSideEffects = 1, Uses = [FPSCR], Predicates = [HasFPRegs] in
diff --git a/llvm/lib/Target/ARM/ARMRegisterInfo.td b/llvm/lib/Target/ARM/ARMRegisterInfo.td
index f37d0fe542b4f7..f5a675e2976bb7 100644
--- a/llvm/lib/Target/ARM/ARMRegisterInfo.td
+++ b/llvm/lib/Target/ARM/ARMRegisterInfo.td
@@ -413,7 +413,9 @@ def VCCR : RegisterClass<"ARM", [i32, v16i1, v8i1, v4i1, v2i1], 32, (add VPR)> {
// FPSCR, when the flags at the top of it are used as the input or
// output to an instruction such as MVE VADC.
-def cl_FPSCR_NZCV : RegisterClass<"ARM", [i32], 32, (add FPSCR_NZCV)>;
+def cl_FPSCR_NZCV : RegisterClass<"ARM", [i32], 32, (add FPSCR_NZCV)> {
+ let CopyCost = -1;
+}
// Scalar single precision floating point register class..
// FIXME: Allocation order changed to s0, s2, ... or s0, s4, ... as a quick hack
diff --git a/llvm/test/CodeGen/ARM/fcmp-xo.ll b/llvm/test/CodeGen/ARM/fcmp-xo.ll
index 3d5972f065859f..908dbd7a11a6b6 100644
--- a/llvm/test/CodeGen/ARM/fcmp-xo.ll
+++ b/llvm/test/CodeGen/ARM/fcmp-xo.ll
@@ -54,12 +54,12 @@ define arm_aapcs_vfpcc float @float128(float %a0) local_unnamed_addr {
; NEON-LABEL: float128:
; NEON: @ %bb.0:
; NEON-NEXT: mov.w r0, #1124073472
-; NEON-NEXT: vmov.f32 s2, #5.000000e-01
-; NEON-NEXT: vmov d3, r0, r0
-; NEON-NEXT: vmov.f32 s4, #-5.000000e-01
-; NEON-NEXT: vcmp.f32 s6, s0
+; NEON-NEXT: vmov.f32 s4, #5.000000e-01
+; NEON-NEXT: vmov d1, r0, r0
+; NEON-NEXT: vmov.f32 s6, #-5.000000e-01
+; NEON-NEXT: vcmp.f32 s2, s0
; NEON-NEXT: vmrs APSR_nzcv, fpscr
-; NEON-NEXT: vselgt.f32 s0, s4, s2
+; NEON-NEXT: vselgt.f32 s0, s6, s4
; NEON-NEXT: bx lr
%1 = fcmp nsz olt float %a0, 128.000000e+00
%2 = select i1 %1, float -5.000000e-01, float 5.000000e-01
diff --git a/llvm/test/CodeGen/ARM/fp16-instructions.ll b/llvm/test/CodeGen/ARM/fp16-instructions.ll
index 1988cb1d2f9039..7a1d5ddfa301b6 100644
--- a/llvm/test/CodeGen/ARM/fp16-instructions.ll
+++ b/llvm/test/CodeGen/ARM/fp16-instructions.ll
@@ -700,9 +700,9 @@ define half @select_cc1(ptr %a0) {
; CHECK-LABEL: select_cc1:
-; CHECK-HARDFP-FULLFP16: vcmp.f16
-; CHECK-HARDFP-FULLFP16-NEXT: vmrs APSR_nzcv, fpscr
-; CHECK-HARDFP-FULLFP16-NEXT: vseleq.f16 s0,
+; CHECK-HARDFP-FULLFP16: vcmp.f16
+; CHECK-HARDFP-FULLFP16: vmrs APSR_nzcv, fpscr
+; CHECK-HARDFP-FULLFP16: vseleq.f16 s0,
; CHECK-SOFTFP-FP16-A32: vcmp.f32
; CHECK-SOFTFP-FP16-A32-NEXT: vmrs APSR_nzcv, fpscr
@@ -728,9 +728,9 @@ define half @select_cc_ge1(ptr %a0) {
; CHECK-LABEL: select_cc_ge1:
-; CHECK-HARDFP-FULLFP16: vcmp.f16
-; CHECK-HARDFP-FULLFP16-NEXT: vmrs APSR_nzcv, fpscr
-; CHECK-HARDFP-FULLFP16-NEXT: vselge.f16 s0,
+; CHECK-HARDFP-FULLFP16: vcmp.f16
+; CHECK-HARDFP-FULLFP16: vmrs APSR_nzcv, fpscr
+; CHECK-HARDFP-FULLFP16: vselge.f16 s0,
; CHECK-SOFTFP-FP16-A32: vcmp.f32
; CHECK-SOFTFP-FP16-A32-NEXT: vmrs APSR_nzcv, fpscr
@@ -751,9 +751,9 @@ define half @select_cc_ge2(ptr %a0) {
; CHECK-LABEL: select_cc_ge2:
-; CHECK-HARDFP-FULLFP16: vcmp.f16
-; CHECK-HARDFP-FULLFP16-NEXT: vmrs APSR_nzcv, fpscr
-; CHECK-HARDFP-FULLFP16-NEXT: vselge.f16 s0,
+; CHECK-HARDFP-FULLFP16: vcmp.f16
+; CHECK-HARDFP-FULLFP16: vmrs APSR_nzcv, fpscr
+; CHECK-HARDFP-FULLFP16: vselge.f16 s0,
; CHECK-SOFTFP-FP16-A32: vcmp.f32
; CHECK-SOFTFP-FP16-A32-NEXT: vmrs APSR_nzcv, fpscr
@@ -774,9 +774,9 @@ define half @select_cc_ge3(ptr %a0) {
; CHECK-LABEL: select_cc_ge3:
-; CHECK-HARDFP-FULLFP16: vcmp.f16
-; CHECK-HARDFP-FULLFP16-NEXT: vmrs APSR_nzcv, fpscr
-; CHECK-HARDFP-FULLFP16-NEXT: vselge.f16 s0,
+; CHECK-HARDFP-FULLFP16: vcmp.f16
+; CHECK-HARDFP-FULLFP16: vmrs APSR_nzcv, fpscr
+; CHECK-HARDFP-FULLFP16: vselge.f16 s0,
; CHECK-SOFTFP-FP16-A32: vcmp.f32
; CHECK-SOFTFP-FP16-A32-NEXT: vmrs APSR_nzcv, fpscr
@@ -797,9 +797,9 @@ define half @select_cc_ge4(ptr %a0) {
; CHECK-LABEL: select_cc_ge4:
-; CHECK-HARDFP-FULLFP16: vcmp.f16
-; CHECK-HARDFP-FULLFP16-NEXT: vmrs APSR_nzcv, fpscr
-; CHECK-HARDFP-FULLFP16-NEXT: vselge.f16 s0, s{{.}}, s{{.}}
+; CHECK-HARDFP-FULLFP16: vcmp.f16
+; CHECK-HARDFP-FULLFP16: vmrs APSR_nzcv, fpscr
+; CHECK-HARDFP-FULLFP16: vselge.f16 s0, s{{.}}, s{{.}}
; CHECK-SOFTFP-FP16-A32: vcmp.f32
; CHECK-SOFTFP-FP16-A32-NEXT: vmrs APSR_nzcv, fpscr
@@ -821,9 +821,9 @@ define half @select_cc_gt1(ptr %a0) {
; CHECK-LABEL: select_cc_gt1:
-; CHECK-HARDFP-FULLFP16: vcmp.f16
-; CHECK-HARDFP-FULLFP16-NEXT: vmrs APSR_nzcv, fpscr
-; CHECK-HARDFP-FULLFP16-NEXT: vselgt.f16 s0, s{{.}}, s{{.}}
+; CHECK-HARDFP-FULLFP16: vcmp.f16
+; CHECK-HARDFP-FULLFP16: vmrs APSR_nzcv, fpscr
+; CHECK-HARDFP-FULLFP16: vselgt.f16 s0, s{{.}}, s{{.}}
; CHECK-SOFTFP-FP16-A32: vcmp.f32
; CHECK-SOFTFP-FP16-A32-NEXT: vmrs APSR_nzcv, fpscr
@@ -844,9 +844,9 @@ define half @select_cc_gt2(ptr %a0) {
; CHECK-LABEL: select_cc_gt2:
-; CHECK-HARDFP-FULLFP16: vcmp.f16
-; CHECK-HARDFP-FULLFP16-NEXT: vmrs APSR_nzcv, fpscr
-; CHECK-HARDFP-FULLFP16-NEXT: vselgt.f16 s0, s{{.}}, s{{.}}
+; CHECK-HARDFP-FULLFP16: vcmp.f16
+; CHECK-HARDFP-FULLFP16: vmrs APSR_nzcv, fpscr
+; CHECK-HARDFP-FULLFP16: vselgt.f16 s0, s{{.}}, s{{.}}
; CHECK-SOFTFP-FP16-A32: vcmp.f32
; CHECK-SOFTFP-FP16-A32-NEXT: vmrs APSR_nzcv, fpscr
@@ -867,9 +867,9 @@ define half @select_cc_gt3(ptr %a0) {
; CHECK-LABEL: select_cc_gt3:
-; CHECK-HARDFP-FULLFP16: vcmp.f16
-; CHECK-HARDFP-FULLFP16-NEXT: vmrs APSR_nzcv, fpscr
-; CHECK-HARDFP-FULLFP16-NEXT: vselgt.f16 s0, s{{.}}, s{{.}}
+; CHECK-HARDFP-FULLFP16: vcmp.f16
+; CHECK-HARDFP-FULLFP16: vmrs APSR_nzcv, fpscr
+; CHECK-HARDFP-FULLFP16: vselgt.f16 s0, s{{.}}, s{{.}}
; CHECK-SOFTFP-FP16-A32: vcmp.f32
; CHECK-SOFTFP-FP16-A32-NEXT: vmrs APSR_nzcv, fpscr
@@ -890,9 +890,9 @@ define half @select_cc_gt4(ptr %a0) {
; CHECK-LABEL: select_cc_gt4:
-; CHECK-HARDFP-FULLFP16: vcmp.f16
-; CHECK-HARDFP-FULLFP16-NEXT: vmrs APSR_nzcv, fpscr
-; CHECK-HARDFP-FULLFP16-NEXT: vselgt.f16 s0, s{{.}}, s{{.}}
+; CHECK-HARDFP-FULLFP16: vcmp.f16
+; CHECK-HARDFP-FULLFP16: vmrs APSR_nzcv, fpscr
+; CHECK-HARDFP-FULLFP16: vselgt.f16 s0, s{{.}}, s{{.}}
; CHECK-SOFTFP-FP16-A32: vcmp.f32
; CHECK-SOFTFP-FP16-A32-NEXT: vmrs APSR_nzcv, fpscr
@@ -923,10 +923,10 @@ entry:
; CHECK-LABEL: select_cc4:
; CHECK-HARDFP-FULLFP16: vldr.16 [[S2:s[0-9]]], .LCPI{{.*}}
+; CHECK-HARDFP-FULLFP16: vcmp.f16 s0, [[S2]]
; CHECK-HARDFP-FULLFP16: vldr.16 [[S4:s[0-9]]], .LCPI{{.*}}
+; CHECK-HARDFP-FULLFP16: vmrs APSR_nzcv, fpscr
; CHECK-HARDFP-FULLFP16: vmov.f16 [[S6:s[0-9]]], #-2.000000e+00
-; CHECK-HARDFP-FULLFP16: vcmp.f16 s0, [[S2]]
-; CHECK-HARDFP-FULLFP16-NEXT: vmrs APSR_nzcv, fpscr
; CHECK-HARDFP-FULLFP16-NEXT: vseleq.f16 [[S0:s[0-9]]], [[S6]], [[S4]]
; CHECK-HARDFP-FULLFP16-NEXT: vselvs.f16 s0, [[S6]], [[S0]]
diff --git a/llvm/test/CodeGen/ARM/fp16-vminmaxnm-safe.ll b/llvm/test/CodeGen/ARM/fp16-vminmaxnm-safe.ll
index 56e734c4404336..996b46c51ab361 100644
--- a/llvm/test/CodeGen/ARM/fp16-vminmaxnm-safe.ll
+++ b/llvm/test/CodeGen/ARM/fp16-vminmaxnm-safe.ll
@@ -5,11 +5,11 @@
define half @fp16_vminnm_o(half %a, half %b) {
; CHECK-LABEL: fp16_vminnm_o:
; CHECK: @ %bb.0: @ %entry
-; CHECK-NEXT: vmov.f16 s0, r1
-; CHECK-NEXT: vmov.f16 s2, r0
-; CHECK-NEXT: vcmp.f16 s0, s2
+; CHECK-NEXT: vmov.f16 s0, r0
+; CHECK-NEXT: vmov.f16 s2, r1
+; CHECK-NEXT: vcmp.f16 s2, s0
; CHECK-NEXT: vmrs APSR_nzcv, fpscr
-; CHECK-NEXT: vselgt.f16 s0, s2, s0
+; CHECK-NEXT: vselgt.f16 s0, s0, s2
; CHECK-NEXT: vmov r0, s0
; CHECK-NEXT: bx lr
entry:
@@ -37,11 +37,11 @@ entry:
define half @fp16_vminnm_u(half %a, half %b) {
; CHECK-LABEL: fp16_vminnm_u:
; CHECK: @ %bb.0: @ %entry
-; CHECK-NEXT: vmov.f16 s0, r0
-; CHECK-NEXT: vmov.f16 s2, r1
-; CHECK-NEXT: vcmp.f16 s0, s2
+; CHECK-NEXT: vmov.f16 s0, r1
+; CHECK-NEXT: vmov.f16 s2, r0
+; CHECK-NEXT: vcmp.f16 s2, s0
; CHECK-NEXT: vmrs APSR_nzcv, fpscr
-; CHECK-NEXT: vselge.f16 s0, s2, s0
+; CHECK-NEXT: vselge.f16 s0, s0, s2
; CHECK-NEXT: vmov r0, s0
; CHECK-NEXT: bx lr
entry:
@@ -53,11 +53,11 @@ entry:
define half @fp16_vminnm_ule(half %a, half %b) {
; CHECK-LABEL: fp16_vminnm_ule:
; CHECK: @ %bb.0: @ %entry
-; CHECK-NEXT: vmov.f16 s0, r0
-; CHECK-NEXT: vmov.f16 s2, r1
-; CHECK-NEXT: vcmp.f16 s0, s2
+; CHECK-NEXT: vmov.f16 s0, r1
+; CHECK-NEXT: vmov.f16 s2, r0
+; CHECK-NEXT: vcmp.f16 s2, s0
; CHECK-NEXT: vmrs APSR_nzcv, fpscr
-; CHECK-NEXT: vselgt.f16 s0, s2, s0
+; CHECK-NEXT: vselgt.f16 s0, s0, s2
; CHECK-NEXT: vmov r0, s0
; CHECK-NEXT: bx lr
entry:
@@ -69,11 +69,11 @@ entry:
define half @fp16_vminnm_u_rev(half %a, half %b) {
; CHECK-LABEL: fp16_vminnm_u_rev:
; CHECK: @ %bb.0: @ %entry
-; CHECK-NEXT: vmov.f16 s0, r1
-; CHECK-NEXT: vmov.f16 s2, r0
-; CHECK-NEXT: vcmp.f16 s0, s2
+; CHECK-NEXT: vmov.f16 s0, r0
+; CHECK-NEXT: vmov.f16 s2, r1
+; CHECK-NEXT: vcmp.f16 s2, s0
; CHECK-NEXT: vmrs APSR_nzcv, fpscr
-; CHECK-NEXT: vselge.f16 s0, s2, s0
+; CHECK-NEXT: vselge.f16 s0, s0, s2
; CHECK-NEXT: vmov r0, s0
; CHECK-NEXT: bx lr
entry:
diff --git a/llvm/test/CodeGen/ARM/fptosi-sat-scalar.ll b/llvm/test/CodeGen/ARM/fptosi-sat-scalar.ll
index 4b27e804e6df9a..84f6ee276ba5f1 100644
--- a/llvm/test/CodeGen/ARM/fptosi-sat-scalar.ll
+++ b/llvm/test/CodeGen/ARM/fptosi-sat-scalar.ll
@@ -258,11 +258,11 @@ define i13 @test_signed_i13_f32(float %f) nounwind {
; VFP2: @ %bb.0:
; VFP2-NEXT: vmov s0, r0
; VFP2-NEXT: vldr s2, .LCPI2_0
+; VFP2-NEXT: vldr s6, .LCPI2_1
; VFP2-NEXT: vcvt.s32.f32 s4, s0
; VFP2-NEXT: vcmp.f32 s0, s2
-; VFP2-NEXT: vldr s2, .LCPI2_1
; VFP2-NEXT: vmrs APSR_nzcv, fpscr
-; VFP2-NEXT: vcmp.f32 s0, s2
+; VFP2-NEXT: vcmp.f32 s0, s6
; VFP2-NEXT: vmov r0, s4
; VFP2-NEXT: itt lt
; VFP2-NEXT: movwlt r0, #61440
@@ -358,11 +358,11 @@ define i16 @test_signed_i16_f32(float %f) nounwind {
; VFP2: @ %bb.0:
; VFP2-NEXT: vmov s0, r0
; VFP2-NEXT: vldr s2, .LCPI3_0
+; VFP2-NEXT: vldr s6, .LCPI3_1
; VFP2-NEXT: vcvt.s32.f32 s4, s0
; VFP2-NEXT: vcmp.f32 s0, s2
-; VFP2-NEXT: vldr s2, .LCPI3_1
; VFP2-NEXT: vmrs APSR_nzcv, fpscr
-; VFP2-NEXT: vcmp.f32 s0, s2
+; VFP2-NEXT: vcmp.f32 s0, s6
; VFP2-NEXT: vmov r0, s4
; VFP2-NEXT: itt lt
; VFP2-NEXT: movwlt r0, #32768
@@ -458,11 +458,11 @@ define i19 @test_signed_i19_f32(float %f) nounwind {
; VFP2: @ %bb.0:
; VFP2-NEXT: vmov s0, r0
; VFP2-NEXT: vldr s2, .LCPI4_0
+; VFP2-NEXT: vldr s6, .LCPI4_1
; VFP2-NEXT: vcvt.s32.f32 s4, s0
; VFP2-NEXT: vcmp.f32 s0, s2
-; VFP2-NEXT: vldr s2, .LCPI4_1
; VFP2-NEXT: vmrs APSR_nzcv, fpscr
-; VFP2-NEXT: vcmp.f32 s0, s2
+; VFP2-NEXT: vcmp.f32 s0, s6
; VFP2-N...
[truncated]
|
The test failure on the Windows bot appears to be caused by #113697.
Sure, there are a couple more things I would also like to point out.
Ah, I see you've also explained the negative `CopyCost`. Great, I was wondering about that too :-)
; VFP-NEXT: itttt vs
; VFP-NEXT: movvs r0, #0
; VFP-NEXT: movvs r1, #0
; VFP-NEXT: movvs r2, #0
Wow, this kind of thing is a particularly impressive improvement!
Following llvm#116547 and llvm#116676, this PR changes the type of results and operands of some nodes to accept / return a normal type instead of Glue. Unfortunately, changing the result type of one node requires changing the operand types of all potential consumer nodes, which in turn requires changing the result types of all other possible producer nodes. So this is a bulk change.
This change caused miscompilations (crashes due to illegal instruction) at runtime. It can also be observed as failed asserts if compiling to assembly output, like this:
    double a, b;
    double c() {
      double d = a = b;
      if (d)
        a = -b;
      a = c();
      if (d)
        a = -a;
      return a;
    }
    $ clang -target armv7-linux-gnueabihf -S repro.c -O2
    clang: ../lib/Target/ARM/MCTargetDesc/ARMInstPrinter.cpp:1296: void llvm::ARMInstPrinter::printT2AddrModeImm8s4Operand(const llvm::MCInst*, unsigned int, const llvm::MCSubtargetInfo&, llvm::raw_ostream&) [with bool AlwaysPrintImm0 = false]: Assertion `((OffImm & 0x3) == 0) && "Not a valid immediate!"' failed.
(Note that compiling with
The original, non-reduced case is also available at https://martin.st/temp/lfg-preproc.c, in case this reduction misses the original use case (the original case didn't involve a function calling itself I think).
Please have a look, or revert if fixing takes a while.
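For readers unfamiliar with the assertion quoted above: the printer expects an offset that is a multiple of 4, since this addressing mode stores the offset as an 8-bit magnitude scaled by 4. The following is a minimal sketch of that check. `isValidImm8s4` is a hypothetical helper, not an LLVM function, and the ±1020 range is an assumption derived from "8 bits scaled by 4". It only illustrates what the assertion tests, not the root cause discussed later in the thread.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of LLVM): checks whether a byte offset can be
// represented as an 8-bit magnitude scaled by 4, plus a sign.
// Assumption: the representable range is -1020..1020 (255 * 4).
bool isValidImm8s4(int32_t OffImm) {
  // Same low-bits test as the assertion in the report: must be 4-byte aligned.
  if ((OffImm & 0x3) != 0)
    return false;
  // Assumed range limit from the 8-bit scaled field.
  return OffImm >= -1020 && OffImm <= 1020;
}
```

Under these assumptions, an offset such as 6 would trip the low-bits test, while 8 or -1020 would pass.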
Hi |
This reverts commit 8c56dd3.
The issue is that FPSCR gets spilled and restored, and the load and the store have incorrect addressing mode:
This can be fixed by handling the corresponding register class in |
Reverts #116676 Reverting per post-commit feedback (causes miscompilation errors and/or assertion failures).
The machine verifier error above is wrong; the immediate is valid. These instructions have
Hmmm - but when running the generated code, execution fails with Illegal instruction, both for ARM and Thumb mode. So even if the immediate may be valid, there's something wrong with the generated code here still...
It looks like these instructions do not exist in ARM mode. At least when I fix the encoding issue, the resulting assembly file is not accepted by the assembler:
The encoding of this addressing mode is broken, but I believe only in ARM mode. NB |
[ARM] Stop gluing FP comparisons to FMSTAT (llvm#116676)
Following llvm#116547, this changes the result of `ARMISD::CMPFP*` and the operand of `ARMISD::FMSTAT` from a special `Glue` type to a normal type.
This change allows comparisons to be CSEd and scheduled around as can be seen in the test changes.
Note that `ARMISD::FMSTAT` is still glued to its consumer nodes; this is going to be changed in a separate patch.
This patch also sets `CopyCost` of `cl_FPSCR_NZCV` register class to a negative value. The reason is the same as for CCR register class: it makes DAG scheduler and InstrEmitter try to avoid copies of `FPSCR_NZCV` register to / from virtual registers. Previously, this was not necessary, since no attempt was made to create copies in the first place.
`TRI::getCrossCopyRegClass` is modified in a way that prevents DAG scheduler from copying FPSCR into a virtual register. The register allocator might need to spill the virtual register, but that only seems to work in Thumb mode.
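The commit message's claim that a normal result type lets comparisons be CSEd can be pictured with a toy model. The sketch below is not LLVM code: `ToyDAG`, `ToyVT` and `getNode` are hypothetical names. It assumes only that a DAG builder uniques nodes by (opcode, result type, operands) and skips uniquing when the result is glue, which is in keeping with the diff's own comment that glue values can have only one use.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <tuple>
#include <vector>

// Toy value types; Flags stands in for a normal (non-glue) flags result.
enum class ToyVT { F32, Flags, Glue };

// Toy DAG: nodes are identified by index; all names here are hypothetical.
struct ToyDAG {
  struct Node {
    std::string Opcode;
    ToyVT VT;
    std::vector<unsigned> Ops;
  };
  std::vector<Node> Nodes;
  // Uniquing table used for CSE of non-glue nodes.
  std::map<std::tuple<std::string, ToyVT, std::vector<unsigned>>, unsigned>
      Unique;

  unsigned getNode(const std::string &Opcode, ToyVT VT,
                   const std::vector<unsigned> &Ops = {}) {
    auto Key = std::make_tuple(Opcode, VT, Ops);
    // Assumption of this model: glue-producing nodes are never uniqued.
    if (VT != ToyVT::Glue) {
      auto It = Unique.find(Key);
      if (It != Unique.end())
        return It->second; // reuse the existing identical node
    }
    Nodes.push_back({Opcode, VT, Ops});
    unsigned Id = static_cast<unsigned>(Nodes.size() - 1);
    if (VT != ToyVT::Glue)
      Unique.emplace(Key, Id);
    return Id;
  }
};
```

In this model, requesting the same comparison twice with a `Flags` result returns one shared node, whereas requesting it twice with a `Glue` result creates two nodes. That is the duplication `duplicateCmp` previously had to perform by hand for the comparison itself.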
PR with candidate fix: #117248
It successfully compiles the test attached above, and produces the same assembly as before this patch modulo register numbers.
Following #116547 and #116676, this PR changes the type of results and operands of some nodes to accept / return a normal type instead of Glue. Unfortunately, changing the result type of one node requires changing the operand types of all potential consumer nodes, which in turn requires changing the result types of all other possible producer nodes. So this is a bulk change. Pull Request: #116970
Re-landing after fixing miscompilation error. The original change made it possible for CMPZ to have multiple uses; `ARMDAGToDAGISel::SelectCMPZ` was not prepared for this. Original commit message: Following llvm#116547 and llvm#116676, this PR changes the type of results and operands of some nodes to accept / return a normal type instead of Glue. Unfortunately, changing the result type of one node requires changing the operand types of all potential consumer nodes, which in turn requires changing the result types of all other possible producer nodes. So this is a bulk change.
Re-landing #116970 after fixing miscompilation error. The original change made it possible for CMPZ to have multiple uses; `ARMDAGToDAGISel::SelectCMPZ` was not prepared for this. Pull Request: #118887 Original commit message: Following #116547 and #116676, this PR changes the type of results and operands of some nodes to accept / return a normal type instead of Glue. Unfortunately, changing the result type of one node requires changing the operand types of all potential consumer nodes, which in turn requires changing the result types of all other possible producer nodes. So this is a bulk change.
Following #116547, this changes the result of `ARMISD::CMPFP*` and the operand of `ARMISD::FMSTAT` from a special `Glue` type to a normal type.
This change allows comparisons to be CSEd and scheduled around as can be seen in the test changes.
Note that `ARMISD::FMSTAT` is still glued to its consumer nodes; this is going to be changed in a separate patch.
This patch also sets `CopyCost` of `cl_FPSCR_NZCV` register class to a negative value. The reason is the same as for CCR register class: it makes DAG scheduler and InstrEmitter try to avoid copies of `FPSCR_NZCV` register to / from virtual registers. Previously, this was not necessary, since no attempt was made to create copies in the first place.
There might be a case when a copy can't be avoided (although not found in existing tests). If a copy is necessary, the virtual register will be created with `cl_FPSCR_NZCV` register class. If this register class is inappropriate, `TRI::getCrossCopyRegClass` should be modified to return the correct class.
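As a rough illustration of the negative `CopyCost` idea described above, here is a toy policy sketch. It is not the actual scheduler or InstrEmitter logic: `ToyRegClass`, `Resolution` and `resolveConflict` are hypothetical. The only assumption taken from the description is that a negative copy cost signals "avoid copying this class to or from virtual registers", so a consumer of the flags would rather have the defining instruction re-emitted than receive a copy.

```cpp
#include <cassert>
#include <string>

// Toy stand-in for a register class; only the fields this sketch needs.
struct ToyRegClass {
  std::string Name;
  int CopyCost; // negative: copies should be avoided if at all possible
};

// Two ways a (hypothetical) scheduler could make a value available again.
enum class Resolution { InsertCopy, DuplicateDef };

// Hypothetical policy: copy through a virtual register only when the class
// is cheap to copy; otherwise prefer re-emitting the defining instruction.
Resolution resolveConflict(const ToyRegClass &RC) {
  return RC.CopyCost < 0 ? Resolution::DuplicateDef : Resolution::InsertCopy;
}
```

With `CopyCost = -1`, as set for `cl_FPSCR_NZCV` in the diff, this toy policy would choose `DuplicateDef`; a class with the default positive cost would get a plain copy.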