[ARM] Stop gluing 1-bit shifts #116547

s-barannikov · 2024-11-17T16:21:59Z

When two (or more) nodes are glued, DAG scheduler will always schedule them as one piece, i.e. it will not allow any instructions to be scheduled between them. It does so because if nodes are glued this usually means that there is an implicit register dependency between them, and an intervening node could clobber this physical register. When emitting such nodes into machine IR, they will also be stuck together, e.g.:

    %9:gpr = MOVsrl_glue killed %8, implicit-def $cpsr
    %10:gpr = RRX %3, implicit $cpsr

If a node has Glue result, SelectionDAG will not try to CSE this node. If it did, it would break the implicit physical register dependency. In practice this means that if a node with Glue result has multiple uses, it has to be duplicated before each use. This the reason for ARMTargetLowering::duplicateCmp to exist.

When using normal data dependency, dependent nodes can freely be scheduled around. If there is a physical register dependency between nodes, the physical register will be copied to/from a virtual register, allowing other nodes to intervene between them. The resulting machine IR might look like this:

    %9:gpr = LSRs1 killed %8, implicit-def $cpsr
    %10:gpr = COPY $cpsr
    %11:gpr = ORRrsi killed %9, %3, 242, 14 /* CC::al */, $noreg, $noreg
    %12:gpr = BICri killed %11, -2147483648, 14 /* CC::al */, $noreg, $noreg
    $cpsr = COPY %10
    %13:gpr = RRX %3, implicit $cpsr

The two copies are likely to be eliminated by register coalescer, given that there are no instructions between them that clobber this physical register. If the copies are unwanted in the first place (they could be expensive or impossible), DAG scheduler will try to avoid inserting them wherever possible, and the resulting machine IR will look like this:

    %9:gpr = LSRs1 killed %8, implicit-def $cpsr
    %10:gpr = ORRrsi killed %9, %3, 242, 14 /* CC::al */, $noreg, $noreg
    %11:gpr = BICri killed %10, -2147483648, 14 /* CC::al */, $noreg, $noreg
    %12:gpr = RRX %3, implicit $cpsr

On ARM, arithmetic operations and LSLS already use the new data flow approach. This patch extends it to include 1-bit shifts.

Use normal data flow instead. There are several more nodes that are still glued, I'll try to change that in subsequent patches.

llvmbot · 2024-11-17T16:22:33Z

@llvm/pr-subscribers-backend-arm

Author: Sergei Barannikov (s-barannikov)

Changes

Use normal data flow instead.

There are several more nodes that are still glued, I'll try to change that in subsequent patches.

Full diff: https://github.com/llvm/llvm-project/pull/116547.diff

4 Files Affected:

(modified) llvm/lib/Target/ARM/ARMISelLowering.cpp (+4-1)
(modified) llvm/lib/Target/ARM/ARMInstrInfo.td (+26-11)
(modified) llvm/lib/Target/ARM/ARMInstrThumb2.td (+13-12)
(modified) llvm/test/CodeGen/ARM/urem-seteq-illegal-types.ll (+24-24)

diff --git a/llvm/lib/Target/ARM/ARMISelLowering.cpp b/llvm/lib/Target/ARM/ARMISelLowering.cpp
index 7fce91f97f3618..aed91c7bb8a2a9 100644
--- a/llvm/lib/Target/ARM/ARMISelLowering.cpp
+++ b/llvm/lib/Target/ARM/ARMISelLowering.cpp
@@ -149,6 +149,9 @@ MVEMaxSupportedInterleaveFactor("mve-max-interleave-factor", cl::Hidden,
   cl::desc("Maximum interleave factor for MVE VLDn to generate."),
   cl::init(2));
 
+/// Value type used for "flags" operands / results (either CPSR or FPSCR_NZCV).
+constexpr MVT FlagsVT = MVT::i32;
+
 // The APCS parameter registers.
 static const MCPhysReg GPRArgRegs[] = {
   ARM::R0, ARM::R1, ARM::R2, ARM::R3
@@ -6850,7 +6853,7 @@ static SDValue Expand64BitShift(SDNode *N, SelectionDAG &DAG,
   // First, build a SRA_GLUE/SRL_GLUE op, which shifts the top part by one and
   // captures the result into a carry flag.
   unsigned Opc = N->getOpcode() == ISD::SRL ? ARMISD::SRL_GLUE:ARMISD::SRA_GLUE;
-  Hi = DAG.getNode(Opc, dl, DAG.getVTList(MVT::i32, MVT::Glue), Hi);
+  Hi = DAG.getNode(Opc, dl, DAG.getVTList(MVT::i32, FlagsVT), Hi);
 
   // The low part is an ARMISD::RRX operand, which shifts the carry in.
   Lo = DAG.getNode(ARMISD::RRX, dl, MVT::i32, Lo, Hi.getValue(1));
diff --git a/llvm/lib/Target/ARM/ARMInstrInfo.td b/llvm/lib/Target/ARM/ARMInstrInfo.td
index d24d4af36f0d86..9fa3c9a6c25fd5 100644
--- a/llvm/lib/Target/ARM/ARMInstrInfo.td
+++ b/llvm/lib/Target/ARM/ARMInstrInfo.td
@@ -14,6 +14,9 @@
 // ARM specific DAG Nodes.
 //
 
+/// Value type used for "flags" operands / results (either CPSR or FPSCR_NZCV).
+defvar FlagsVT = i32;
+
 // Type profiles.
 def SDT_ARMCallSeqStart : SDCallSeqStart<[ SDTCisVT<0, i32>,
                                            SDTCisVT<1, i32> ]>;
@@ -77,6 +80,18 @@ def SDT_ARMMEMCPY  : SDTypeProfile<2, 3, [SDTCisVT<0, i32>, SDTCisVT<1, i32>,
                                           SDTCisVT<2, i32>, SDTCisVT<3, i32>,
                                           SDTCisVT<4, i32>]>;
 
+def SDTIntUnaryOpWithFlagsOut : SDTypeProfile<2, 1, [
+  SDTCisInt<0>,         // result
+  SDTCisVT<1, FlagsVT>, // out flags
+  SDTCisSameAs<2, 0>    // operand
+]>;
+
+def SDTIntUnaryOpWithFlagsIn : SDTypeProfile<1, 2, [
+  SDTCisInt<0>,        // result
+  SDTCisSameAs<1, 0>,  // operand
+  SDTCisVT<1, FlagsVT> // in flags
+]>;
+
 def SDTBinaryArithWithFlags : SDTypeProfile<2, 2,
                                             [SDTCisSameAs<0, 2>,
                                              SDTCisSameAs<0, 3>,
@@ -191,9 +206,9 @@ def ARMasrl          : SDNode<"ARMISD::ASRL", SDT_ARMIntShiftParts, []>;
 def ARMlsrl          : SDNode<"ARMISD::LSRL", SDT_ARMIntShiftParts, []>;
 def ARMlsll          : SDNode<"ARMISD::LSLL", SDT_ARMIntShiftParts, []>;
 
-def ARMsrl_glue      : SDNode<"ARMISD::SRL_GLUE", SDTIntUnaryOp, [SDNPOutGlue]>;
-def ARMsra_glue      : SDNode<"ARMISD::SRA_GLUE", SDTIntUnaryOp, [SDNPOutGlue]>;
-def ARMrrx           : SDNode<"ARMISD::RRX"     , SDTIntUnaryOp, [SDNPInGlue ]>;
+def ARMsrl_glue : SDNode<"ARMISD::SRL_GLUE", SDTIntUnaryOpWithFlagsOut>;
+def ARMsra_glue : SDNode<"ARMISD::SRA_GLUE", SDTIntUnaryOpWithFlagsOut>;
+def ARMrrx      : SDNode<"ARMISD::RRX"     , SDTIntUnaryOpWithFlagsIn>;
 
 def ARMaddc          : SDNode<"ARMISD::ADDC",  SDTBinaryArithWithFlags,
                               [SDNPCommutative]>;
@@ -3731,19 +3746,19 @@ def : ARMPat<(or GPR:$src, 0xffff0000), (MOVTi16 GPR:$src, 0xffff)>,
 
 let Uses = [CPSR] in
 def RRX: PseudoInst<(outs GPR:$Rd), (ins GPR:$Rm), IIC_iMOVsi,
-                    [(set GPR:$Rd, (ARMrrx GPR:$Rm))]>, UnaryDP,
-                    Requires<[IsARM]>, Sched<[WriteALU]>;
+                    [(set GPR:$Rd, (ARMrrx GPR:$Rm, CPSR))]>,
+         UnaryDP, Requires<[IsARM]>, Sched<[WriteALU]>;
 
 // These aren't really mov instructions, but we have to define them this way
 // due to glue operands.
 
 let Defs = [CPSR] in {
-def MOVsrl_glue : PseudoInst<(outs GPR:$dst), (ins GPR:$src), IIC_iMOVsi,
-                      [(set GPR:$dst, (ARMsrl_glue GPR:$src))]>, UnaryDP,
-                      Sched<[WriteALU]>, Requires<[IsARM]>;
-def MOVsra_glue : PseudoInst<(outs GPR:$dst), (ins GPR:$src), IIC_iMOVsi,
-                      [(set GPR:$dst, (ARMsra_glue GPR:$src))]>, UnaryDP,
-                      Sched<[WriteALU]>, Requires<[IsARM]>;
+  def MOVsrl_glue : PseudoInst<(outs GPR:$dst), (ins GPR:$src), IIC_iMOVsi,
+                               [(set GPR:$dst, CPSR, (ARMsrl_glue GPR:$src))]>,
+                    UnaryDP, Sched<[WriteALU]>, Requires<[IsARM]>;
+  def MOVsra_glue : PseudoInst<(outs GPR:$dst), (ins GPR:$src), IIC_iMOVsi,
+                               [(set GPR:$dst, CPSR, (ARMsra_glue GPR:$src))]>,
+                    UnaryDP, Sched<[WriteALU]>, Requires<[IsARM]>;
 }
 
 //===----------------------------------------------------------------------===//
diff --git a/llvm/lib/Target/ARM/ARMInstrThumb2.td b/llvm/lib/Target/ARM/ARMInstrThumb2.td
index cb20aacb539ad9..260d48491bd24a 100644
--- a/llvm/lib/Target/ARM/ARMInstrThumb2.td
+++ b/llvm/lib/Target/ARM/ARMInstrThumb2.td
@@ -2787,8 +2787,9 @@ def : T2Pat<(rotr rGPR:$lhs, (and rGPR:$rhs, lo5AllOne)),
 
 let Uses = [CPSR] in {
 def t2RRX : T2sTwoReg<(outs rGPR:$Rd), (ins rGPR:$Rm), IIC_iMOVsi,
-                   "rrx", "\t$Rd, $Rm",
-                   [(set rGPR:$Rd, (ARMrrx rGPR:$Rm))]>, Sched<[WriteALU]> {
+                      "rrx", "\t$Rd, $Rm",
+                      [(set rGPR:$Rd, (ARMrrx rGPR:$Rm, CPSR))]>,
+            Sched<[WriteALU]> {
   let Inst{31-27} = 0b11101;
   let Inst{26-25} = 0b01;
   let Inst{24-21} = 0b0010;
@@ -2801,11 +2802,11 @@ def t2RRX : T2sTwoReg<(outs rGPR:$Rd), (ins rGPR:$Rm), IIC_iMOVsi,
 }
 
 let isCodeGenOnly = 1, Defs = [CPSR] in {
-def t2MOVsrl_glue : T2TwoRegShiftImm<
-                        (outs rGPR:$Rd), (ins rGPR:$Rm), IIC_iMOVsi,
-                        "lsrs", ".w\t$Rd, $Rm, #1",
-                        [(set rGPR:$Rd, (ARMsrl_glue rGPR:$Rm))]>,
-                        Sched<[WriteALU]> {
+def t2MOVsrl_glue
+    : T2TwoRegShiftImm<(outs rGPR:$Rd), (ins rGPR:$Rm), IIC_iMOVsi,
+                       "lsrs", ".w\t$Rd, $Rm, #1",
+                       [(set rGPR:$Rd, CPSR, (ARMsrl_glue rGPR:$Rm))]>,
+      Sched<[WriteALU]> {
   let Inst{31-27} = 0b11101;
   let Inst{26-25} = 0b01;
   let Inst{24-21} = 0b0010;
@@ -2816,11 +2817,11 @@ def t2MOVsrl_glue : T2TwoRegShiftImm<
   let Inst{14-12} = 0b000;
   let Inst{7-6} = 0b01;
 }
-def t2MOVsra_glue : T2TwoRegShiftImm<
-                        (outs rGPR:$Rd), (ins rGPR:$Rm), IIC_iMOVsi,
-                        "asrs", ".w\t$Rd, $Rm, #1",
-                        [(set rGPR:$Rd, (ARMsra_glue rGPR:$Rm))]>,
-                        Sched<[WriteALU]> {
+def t2MOVsra_glue
+    : T2TwoRegShiftImm<(outs rGPR:$Rd), (ins rGPR:$Rm), IIC_iMOVsi,
+                       "asrs", ".w\t$Rd, $Rm, #1",
+                       [(set rGPR:$Rd, CPSR, (ARMsra_glue rGPR:$Rm))]>,
+      Sched<[WriteALU]> {
   let Inst{31-27} = 0b11101;
   let Inst{26-25} = 0b01;
   let Inst{24-21} = 0b0010;
diff --git a/llvm/test/CodeGen/ARM/urem-seteq-illegal-types.ll b/llvm/test/CodeGen/ARM/urem-seteq-illegal-types.ll
index 8900d5f541e8aa..b85cb3a4f191c8 100644
--- a/llvm/test/CodeGen/ARM/urem-seteq-illegal-types.ll
+++ b/llvm/test/CodeGen/ARM/urem-seteq-illegal-types.ll
@@ -628,13 +628,13 @@ define i1 @test_urem_larger(i63 %X) nounwind {
 ; ARM5-NEXT:    mla r0, r1, r12, r4
 ; ARM5-NEXT:    bic r0, r0, #-2147483648
 ; ARM5-NEXT:    lsrs r0, r0, #1
-; ARM5-NEXT:    rrx r1, r3
+; ARM5-NEXT:    rrx r2, r3
 ; ARM5-NEXT:    orr r0, r0, r3, lsl #30
 ; ARM5-NEXT:    ldr r3, .LCPI5_2
-; ARM5-NEXT:    bic r2, r0, #-2147483648
+; ARM5-NEXT:    bic r1, r0, #-2147483648
 ; ARM5-NEXT:    mov r0, #0
-; ARM5-NEXT:    subs r1, r1, r3
-; ARM5-NEXT:    sbcs r1, r2, #1
+; ARM5-NEXT:    subs r2, r2, r3
+; ARM5-NEXT:    sbcs r1, r1, #1
 ; ARM5-NEXT:    movlo r0, #1
 ; ARM5-NEXT:    pop {r4, pc}
 ; ARM5-NEXT:    .p2align 2
@@ -656,13 +656,13 @@ define i1 @test_urem_larger(i63 %X) nounwind {
 ; ARM6-NEXT:    mla r0, r1, r12, r0
 ; ARM6-NEXT:    bic r0, r0, #-2147483648
 ; ARM6-NEXT:    lsrs r0, r0, #1
-; ARM6-NEXT:    rrx r1, r3
+; ARM6-NEXT:    rrx r2, r3
 ; ARM6-NEXT:    orr r0, r0, r3, lsl #30
 ; ARM6-NEXT:    ldr r3, .LCPI5_2
-; ARM6-NEXT:    bic r2, r0, #-2147483648
+; ARM6-NEXT:    bic r1, r0, #-2147483648
 ; ARM6-NEXT:    mov r0, #0
-; ARM6-NEXT:    subs r1, r1, r3
-; ARM6-NEXT:    sbcs r1, r2, #1
+; ARM6-NEXT:    subs r2, r2, r3
+; ARM6-NEXT:    sbcs r1, r1, #1
 ; ARM6-NEXT:    movlo r0, #1
 ; ARM6-NEXT:    pop {r11, pc}
 ; ARM6-NEXT:    .p2align 2
@@ -686,14 +686,14 @@ define i1 @test_urem_larger(i63 %X) nounwind {
 ; ARM7-NEXT:    mla r0, r1, r12, r0
 ; ARM7-NEXT:    bic r0, r0, #-2147483648
 ; ARM7-NEXT:    lsrs r0, r0, #1
-; ARM7-NEXT:    rrx r1, r3
+; ARM7-NEXT:    rrx r2, r3
 ; ARM7-NEXT:    orr r0, r0, r3, lsl #30
 ; ARM7-NEXT:    movw r3, #24026
-; ARM7-NEXT:    bic r2, r0, #-2147483648
+; ARM7-NEXT:    bic r1, r0, #-2147483648
 ; ARM7-NEXT:    movt r3, #48461
-; ARM7-NEXT:    subs r1, r1, r3
+; ARM7-NEXT:    subs r2, r2, r3
 ; ARM7-NEXT:    mov r0, #0
-; ARM7-NEXT:    sbcs r1, r2, #1
+; ARM7-NEXT:    sbcs r1, r1, #1
 ; ARM7-NEXT:    movwlo r0, #1
 ; ARM7-NEXT:    pop {r11, pc}
 ;
@@ -709,14 +709,14 @@ define i1 @test_urem_larger(i63 %X) nounwind {
 ; ARM8-NEXT:    mla r0, r1, r12, r0
 ; ARM8-NEXT:    bic r0, r0, #-2147483648
 ; ARM8-NEXT:    lsrs r0, r0, #1
-; ARM8-NEXT:    rrx r1, r3
+; ARM8-NEXT:    rrx r2, r3
 ; ARM8-NEXT:    orr r0, r0, r3, lsl #30
 ; ARM8-NEXT:    movw r3, #24026
-; ARM8-NEXT:    bic r2, r0, #-2147483648
+; ARM8-NEXT:    bic r1, r0, #-2147483648
 ; ARM8-NEXT:    movt r3, #48461
-; ARM8-NEXT:    subs r1, r1, r3
+; ARM8-NEXT:    subs r2, r2, r3
 ; ARM8-NEXT:    mov r0, #0
-; ARM8-NEXT:    sbcs r1, r2, #1
+; ARM8-NEXT:    sbcs r1, r1, #1
 ; ARM8-NEXT:    movwlo r0, #1
 ; ARM8-NEXT:    pop {r11, pc}
 ;
@@ -732,14 +732,14 @@ define i1 @test_urem_larger(i63 %X) nounwind {
 ; NEON7-NEXT:    mla r0, r1, r12, r0
 ; NEON7-NEXT:    bic r0, r0, #-2147483648
 ; NEON7-NEXT:    lsrs r0, r0, #1
-; NEON7-NEXT:    rrx r1, r3
+; NEON7-NEXT:    rrx r2, r3
 ; NEON7-NEXT:    orr r0, r0, r3, lsl #30
 ; NEON7-NEXT:    movw r3, #24026
-; NEON7-NEXT:    bic r2, r0, #-2147483648
+; NEON7-NEXT:    bic r1, r0, #-2147483648
 ; NEON7-NEXT:    movt r3, #48461
-; NEON7-NEXT:    subs r1, r1, r3
+; NEON7-NEXT:    subs r2, r2, r3
 ; NEON7-NEXT:    mov r0, #0
-; NEON7-NEXT:    sbcs r1, r2, #1
+; NEON7-NEXT:    sbcs r1, r1, #1
 ; NEON7-NEXT:    movwlo r0, #1
 ; NEON7-NEXT:    pop {r11, pc}
 ;
@@ -755,14 +755,14 @@ define i1 @test_urem_larger(i63 %X) nounwind {
 ; NEON8-NEXT:    mla r0, r1, r12, r0
 ; NEON8-NEXT:    bic r0, r0, #-2147483648
 ; NEON8-NEXT:    lsrs r0, r0, #1
-; NEON8-NEXT:    rrx r1, r3
+; NEON8-NEXT:    rrx r2, r3
 ; NEON8-NEXT:    orr r0, r0, r3, lsl #30
 ; NEON8-NEXT:    movw r3, #24026
-; NEON8-NEXT:    bic r2, r0, #-2147483648
+; NEON8-NEXT:    bic r1, r0, #-2147483648
 ; NEON8-NEXT:    movt r3, #48461
-; NEON8-NEXT:    subs r1, r1, r3
+; NEON8-NEXT:    subs r2, r2, r3
 ; NEON8-NEXT:    mov r0, #0
-; NEON8-NEXT:    sbcs r1, r2, #1
+; NEON8-NEXT:    sbcs r1, r1, #1
 ; NEON8-NEXT:    movwlo r0, #1
 ; NEON8-NEXT:    pop {r11, pc}
   %urem = urem i63 %X, 1234567890

topperc · 2024-11-17T18:35:58Z

llvm/lib/Target/ARM/ARMInstrInfo.td

-def ARMsrl_glue      : SDNode<"ARMISD::SRL_GLUE", SDTIntUnaryOp, [SDNPOutGlue]>;
-def ARMsra_glue      : SDNode<"ARMISD::SRA_GLUE", SDTIntUnaryOp, [SDNPOutGlue]>;
-def ARMrrx           : SDNode<"ARMISD::RRX"     , SDTIntUnaryOp, [SDNPInGlue ]>;
+def ARMsrl_glue : SDNode<"ARMISD::SRL_GLUE", SDTIntUnaryOpWithFlagsOut>;


Drop glue from the name?

I will try to. It can easily be dropped from SDNode's name. Changing the names of the corresponding instructions requires corresponding changes in other parts of the backend and may require changes in scheduling model regexes etc.

…efore

statham-arm · 2024-11-18T17:10:58Z

Can you expand the commit message to explain the benefit of this change? It says what you're doing, but expects the reader to already understand why.

I haven't needed to get involved with "glue" in SelectionDAG before. But it looks as if the idea is that now when we generate an LSRS + RRX pair, they're tied together by a precise data flow dependency saying the second instruction uses the CPSR written by the first, whereas previously the "glue" system said something more approximate like "just don't separate this pair of instructions"? So now you could imagine a backend pass separating the two instructions, if something else could safely be put in between.

But I don't see any examples in the changed tests where that's actually happened. All the test changes look as if they've just perturbed the register allocation.

Is this change intended to improve performance? Or is there some other purpose?

s-barannikov · 2024-11-18T18:17:25Z

@statham-arm I've updated the description, PTAL.

But I don't see any examples in the changed tests where that's actually happened. All the test changes look as if they've just perturbed the register allocation.

This is probably due to 1-bit shift optimization not triggering often on existing tests. In the next patch I change FP comparisons to use data flow too, and there is a significant difference. Here is just a small example:

I didn't submit it yet because it depends on this PR.

s-barannikov · 2024-11-18T18:43:21Z

I didn't submit it yet because it depends on this PR.

#116676

statham-arm

Thanks – that's a great explanation and I'm glad I asked! I'd guessed your point 1 (scheduling flexibility), but hadn't spotted point 2 at all (eliminating duplication). So this actually gives more benefit than I'd realised.

LGTM, although I've spotted one last tiny nit.

statham-arm · 2024-11-19T11:31:08Z

llvm/lib/Target/ARM/ARMScheduleM7.td

@@ -325,7 +325,7 @@ def M7Ex1ReadNoFastBypass : SchedReadAdvance<-1, [WriteLd, M7LoadLatency1]>;
 def : InstRW<[WriteALUsi, M7Ex1ReadNoFastBypass, M7Read_ISS],
             (instregex "t2(ADC|ADDS|ADD|BIC|EOR|ORN|ORR|RSBS|RSB|SBC|SUBS)rs$",
                        "t2(SUB|CMP|CMNz|TEQ|TST)rs$",
-                        "t2MOVsr(a|l)")>;
+                        "t2(A|L)SRs1")>;


Tiny nit: should this regex end in $, to match the two above it? I don't think you're intentionally matching other opcodes that start with t2ASRs1 or t2LSRs1. (Whereas the previous code was matching things that had extra stuff after t2MOVsra/t2MOVsrl.

Following #116547, this changes the result of `ARMISD::CMPFP*` and the operand of `ARMISD::FMSTAT` from a special `Glue` type to a normal type. This change allows comparisons to be CSEd and scheduled around as can be seen in the test changes. Note that `ARMISD::FMSTAT` is still glued to its consumer nodes; this is going to be changed in a separate patch. This patch also sets `CopyCost` of `cl_FPSCR_NZCV` register class to a negative value. The reason is the same as for CCR register class: it makes DAG scheduler and InstrEmitter try to avoid copies of `FPCSR_NZCV` register to / from virtual registers. Previously, this was not necessary, since no attempt was made to create copies in the first place. There might be a case when a copy can't be avoided (although not found in existing tests). If a copy is necessary, the virtual register will be created with `cl_FPSCR_NZCV` register class. If this register class is inappropriate, `TRI::getCrossCopyRegClass` should be modified to return the correct class. Pull Request: #116676

Following llvm#116547 and llvm#116676, this PR changes the type of results and operands of some nodes to accept / return a normal type instead of Glue. Unfortunately, changing the result type of one node requires changing the operand types of all potential consumer nodes, which in turn requires changing the result types of all other possible producer nodes. So this is a bulk change.

[ARM] Stop gluing FP comparisons to FMSTAT (llvm#116676) Following llvm#116547, this changes the result of `ARMISD::CMPFP*` and the operand of `ARMISD::FMSTAT` from a special `Glue` type to a normal type. This change allows comparisons to be CSEd and scheduled around as can be seen in the test changes. Note that `ARMISD::FMSTAT` is still glued to its consumer nodes; this is going to be changed in a separate patch. This patch also sets `CopyCost` of `cl_FPSCR_NZCV` register class to a negative value. The reason is the same as for CCR register class: it makes DAG scheduler and InstrEmitter try to avoid copies of `FPCSR_NZCV` register to / from virtual registers. Previously, this was not necessary, since no attempt was made to create copies in the first place. `TRI::getCrossCopyRegClass` is modified in a way that prevents DAG scheduler from copying FPSCR into a virtual register. The register allocator might need to spill the virtual register, but that only seem to work in Thumb mode.

Following #116547, this changes the result of `ARMISD::CMPFP*` and the operand of `ARMISD::FMSTAT` from a special `Glue` type to a normal type. This change allows comparisons to be CSEd and scheduled around as can be seen in the test changes. Note that `ARMISD::FMSTAT` is still glued to its consumer nodes; this is going to be changed in a separate patch. This patch also sets `CopyCost` of `cl_FPSCR_NZCV` register class to a negative value. The reason is the same as for CCR register class: it makes DAG scheduler and InstrEmitter try to avoid copies of `FPCSR_NZCV` register to / from virtual registers. Previously, this was not necessary, since no attempt was made to create copies in the first place. `TRI::getCrossCopyRegClass` is modified in a way that prevents DAG scheduler from copying FPSCR into a virtual register. The register allocator might need to spill the virtual register, but that only seem to work in Thumb mode.

Following #116547 and #116676, this PR changes the type of results and operands of some nodes to accept / return a normal type instead of Glue. Unfortunately, changing the result type of one node requires changing the operand types of all potential consumer nodes, which in turn requires changing the result types of all other possible producer nodes. So this is a bulk change. Pull Request: #116970

Re-landing after fixing miscompilation error. The original change made it possible for CMPZ to have multiple uses; `ARMDAGToDAGISel::SelectCMPZ` was not prepared for this. Original commit message: Following llvm#116547 and llvm#116676, this PR changes the type of results and operands of some nodes to accept / return a normal type instead of Glue. Unfortunately, changing the result type of one node requires changing the operand types of all potential consumer nodes, which in turn requires changing the result types of all other possible producer nodes. So this is a bulk change.

Re-landing #116970 after fixing miscompilation error. The original change made it possible for CMPZ to have multiple uses; `ARMDAGToDAGISel::SelectCMPZ` was not prepared for this. Pull Request: #118887 Original commit message: Following #116547 and #116676, this PR changes the type of results and operands of some nodes to accept / return a normal type instead of Glue. Unfortunately, changing the result type of one node requires changing the operand types of all potential consumer nodes, which in turn requires changing the result types of all other possible producer nodes. So this is a bulk change.

[ARM] Stop gluing 1-bit shifts

637ab08

Use normal data flow instead. There are several more nodes that are still glued, I'll try to change that in subsequent patches.

s-barannikov requested review from davemgreen and topperc November 17, 2024 16:21

llvmbot added the backend:ARM label Nov 17, 2024

Merge branch 'main' into sdag/arm/glue/shift

e3725dd

topperc reviewed Nov 17, 2024

View reviewed changes

Try to get the naming right

776a241

s-barannikov requested a review from topperc November 17, 2024 20:25

Fix comment & indentation

c363153

s-barannikov force-pushed the sdag/arm/glue/shift branch from 21332fb to c363153 Compare November 17, 2024 20:32

Merge branch 'main' into sdag/arm/glue/shift

dc8d1d9

s-barannikov requested a review from statham-arm November 18, 2024 14:37

Adjust schedmodel regex so that tROR (without suffix) is matched as b…

6a55533

…efore

s-barannikov requested a review from deadalnix November 18, 2024 18:28

s-barannikov mentioned this pull request Nov 18, 2024

[ARM] Stop gluing FP comparisons to FMSTAT #116676

Merged

statham-arm approved these changes Nov 19, 2024

View reviewed changes

Add missing $ to regex

c9dab24

s-barannikov merged commit aff98e4 into llvm:main Nov 19, 2024
6 of 7 checks passed

s-barannikov deleted the sdag/arm/glue/shift branch November 19, 2024 14:46

s-barannikov mentioned this pull request Nov 20, 2024

[ARM] Stop gluing ALU nodes to branches / selects #116970

Merged

s-barannikov mentioned this pull request Nov 21, 2024

Reland "[ARM] Stop gluing FP comparisons to FMSTAT" #117248

Merged

s-barannikov mentioned this pull request Dec 5, 2024

Reland "[ARM] Stop gluing ALU nodes to branches / selects" #118887

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[ARM] Stop gluing 1-bit shifts #116547

[ARM] Stop gluing 1-bit shifts #116547

Uh oh!

s-barannikov commented Nov 17, 2024 •

edited

Loading

Uh oh!

llvmbot commented Nov 17, 2024

Uh oh!

topperc Nov 17, 2024

Uh oh!

s-barannikov Nov 17, 2024

Uh oh!

statham-arm commented Nov 18, 2024

Uh oh!

s-barannikov commented Nov 18, 2024

Uh oh!

s-barannikov commented Nov 18, 2024

Uh oh!

statham-arm left a comment

Uh oh!

statham-arm Nov 19, 2024

Uh oh!

Uh oh!

Uh oh!

[ARM] Stop gluing 1-bit shifts #116547

[ARM] Stop gluing 1-bit shifts #116547

Uh oh!

Conversation

s-barannikov commented Nov 17, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Nov 17, 2024

Uh oh!

topperc Nov 17, 2024

Choose a reason for hiding this comment

Uh oh!

s-barannikov Nov 17, 2024

Choose a reason for hiding this comment

Uh oh!

statham-arm commented Nov 18, 2024

Uh oh!

s-barannikov commented Nov 18, 2024

Uh oh!

s-barannikov commented Nov 18, 2024

Uh oh!

statham-arm left a comment

Choose a reason for hiding this comment

Uh oh!

statham-arm Nov 19, 2024

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

s-barannikov commented Nov 17, 2024 •

edited

Loading