[AMDGPU][GlobalISel] illegal VGPR to SGPR copy #61468

chenzheng1030 · 2023-03-17T09:25:22Z

Still from https://reviews.llvm.org/D141247, some unexpected changes:

td changes:

diff --git a/llvm/lib/Target/AMDGPU/SIInstructions.td b/llvm/lib/Target/AMDGPU/SIInstructions.td
index c10bbe7367a1..9117710a1aa9 100644
--- a/llvm/lib/Target/AMDGPU/SIInstructions.td
+++ b/llvm/lib/Target/AMDGPU/SIInstructions.td
@@ -2011,13 +2011,13 @@ def : GCNPat <
 def : GCNPat <
   (i32 (sext i1:$src0)),
   (V_CNDMASK_B32_e64 /*src0mod*/(i32 0), /*src0*/(i32 0),
-                     /*src1mod*/(i32 0), /*src1*/(i32 -1), $src0)
+                     /*src1mod*/(i32 0), /*src1*/(i32 -1), i1:$src0)
 >;

 class Ext32Pat <SDNode ext> : GCNPat <
   (i32 (ext i1:$src0)),
   (V_CNDMASK_B32_e64 /*src0mod*/(i32 0), /*src0*/(i32 0),
-                     /*src1mod*/(i32 0), /*src1*/(i32 1), $src0)
+                     /*src1mod*/(i32 0), /*src1*/(i32 1), i1:$src0)
 >;

 def : Ext32Pat <zext>;

Get some lit failures and some of them are not just code gen difference, for example for below case s_ssubsat_i128 in file CodeGen/AMDGPU/GlobalISel/ssubsat.ll,

define amdgpu_ps i128 @s_ssubsat_i128(i128 inreg %lhs, i128 inreg %rhs) {
  %result = call i128 @llvm.ssub.sat.i128(i128 %lhs, i128 %rhs)
  ret i128 %result
}
declare i128 @llvm.ssub.sat.i128(i128, i128) #0

$ ./bin/llc ssubsat.ll  -global-isel -mtriple=amdgcn-amd-amdpal -mcpu=tahiti
error: <unknown>:0:0: in function s_ssubsat_i128 i128 (i128, i128): illegal VGPR to SGPR copy

error: <unknown>:0:0: in function s_ssubsat_i128 i128 (i128, i128): illegal VGPR to SGPR copy

The text was updated successfully, but these errors were encountered:

llvmbot · 2023-03-17T09:25:54Z

@llvm/issue-subscribers-backend-amdgpu

chenzheng1030 · 2023-03-17T09:26:40Z

FYI @arsenm @Pierre-vh

arsenm · 2023-03-19T15:08:48Z

This is a situation where the register class cannot be inferred from the type and is ambiguous. This is a bad pattern change, which ideally tablegen would error on. This really needs to use an explicit register class. Bool values in particular are tricky

chenzheng1030 · 2023-03-21T06:35:46Z

Thanks. What register class should be set here for the $src0? I tried below change(Copied from other patterns for V_CNDMASK_B32_e64), but seems not work.

diff --git a/llvm/lib/Target/AMDGPU/SIInstructions.td b/llvm/lib/Target/AMDGPU/SIInstructions.td
index c10bbe7367a1..af6316a95629 100644
--- a/llvm/lib/Target/AMDGPU/SIInstructions.td
+++ b/llvm/lib/Target/AMDGPU/SIInstructions.td
@@ -2011,13 +2011,13 @@ def : GCNPat <
 def : GCNPat <
   (i32 (sext i1:$src0)),
   (V_CNDMASK_B32_e64 /*src0mod*/(i32 0), /*src0*/(i32 0),
-                     /*src1mod*/(i32 0), /*src1*/(i32 -1), $src0)
+                     /*src1mod*/(i32 0), /*src1*/(i32 -1), SSrc_i1:$src0)
 >;

 class Ext32Pat <SDNode ext> : GCNPat <
   (i32 (ext i1:$src0)),
   (V_CNDMASK_B32_e64 /*src0mod*/(i32 0), /*src0*/(i32 0),
-                     /*src1mod*/(i32 0), /*src1*/(i32 1), $src0)
+                     /*src1mod*/(i32 0), /*src1*/(i32 1), SSrc_i1:$src0)
 >;

 def : Ext32Pat <zext>;

Illegal copy is still generated from below pattern:

%68:vgpr(s16) = G_SELECT %64:vcc(s1), %66:vgpr, %67:vgpr
%58:vgpr(s1) = G_TRUNC %68:vgpr(s16)
%130:vgpr(s32) = G_ANYEXT %58:vgpr(s1)
%132:vgpr(s32) = G_XOR %130:vgpr, %131:vgpr

------>

%68:vgpr_32 = V_CNDMASK_B32_e64 0, %129:vgpr_32, 0, %126:vgpr_32, %64:sreg_64_xexec, implicit $exec
%159:sreg_64_xexec = COPY %68:vgpr_32  //This is the illegal copy
%130:vgpr_32 = V_CNDMASK_B32_e64 0, 0, 0, 1, %159:sreg_64_xexec, implicit $exec
%132:vgpr_32 = V_XOR_B32_e64 %130:vgpr_32, %131:vgpr_32, implicit $exec

%159:sreg_64_xexec = COPY %68:vgpr_32 seems for %58:vgpr(s1) = G_TRUNC %68:vgpr(s16), but I tried a simple truncate case from i16 to i1 for amdgcn-amd-amdpal, global isel works good.

arsenm · 2023-03-24T15:59:49Z

Thanks. What register class should be set here for the $src0? I tried below change(Copied from other patterns for V_CNDMASK_B32_e64), but seems not work.

It depends on the wave size. It's either SReg_32_XEXEC or SReg_64_XEXEC based on the wavesize, so we may need to duplicate the patterns. SSrc_i1 is SReg_1_XEXEC, which is the union of the two and unallocatable

chenzheng1030 · 2023-03-27T03:45:36Z

Thank you @arsenm , I did the following change, but seems the above simple case still gets illegal copy result.

diff --git a/llvm/lib/Target/AMDGPU/SIInstructions.td b/llvm/lib/Target/AMDGPU/SIInstructions.td
index c10bbe7367a1..39e6846dbd05 100644
--- a/llvm/lib/Target/AMDGPU/SIInstructions.td
+++ b/llvm/lib/Target/AMDGPU/SIInstructions.td
@@ -2008,20 +2008,39 @@ def : GCNPat <
   (V_EXP_F32_e64 SRCMODS.NONE, (V_MUL_LEGACY_F32_e64 $src1_mods, $src1, SRCMODS.NONE, (V_LOG_F32_e64 $src0_mods, $src0), 0, 0))
 >;

+let WaveSizePredicate = isWave32 in {
+def : GCNPat <
+  (i32 (sext i1:$src0)),
+  (V_CNDMASK_B32_e64 /*src0mod*/(i32 0), /*src0*/(i32 0),
+                     /*src1mod*/(i32 0), /*src1*/(i32 -1), SReg_32_XEXEC:$src0)
+>;
+
+class Ext32Pat_B32 <SDNode ext> : GCNPat <
+  (i32 (ext i1:$src0)),
+  (V_CNDMASK_B32_e64 /*src0mod*/(i32 0), /*src0*/(i32 0),
+                     /*src1mod*/(i32 0), /*src1*/(i32 1), SReg_32_XEXEC:$src0)
+>;
+
+def : Ext32Pat_B32 <zext>;
+def : Ext32Pat_B32 <anyext>;
+}
+
+let WaveSizePredicate = isWave64 in {
 def : GCNPat <
   (i32 (sext i1:$src0)),
   (V_CNDMASK_B32_e64 /*src0mod*/(i32 0), /*src0*/(i32 0),
-                     /*src1mod*/(i32 0), /*src1*/(i32 -1), $src0)
+                     /*src1mod*/(i32 0), /*src1*/(i32 -1), SReg_64_XEXEC:$src0)
 >;

-class Ext32Pat <SDNode ext> : GCNPat <
+class Ext32Pat_B64 <SDNode ext> : GCNPat <
   (i32 (ext i1:$src0)),
   (V_CNDMASK_B32_e64 /*src0mod*/(i32 0), /*src0*/(i32 0),
-                     /*src1mod*/(i32 0), /*src1*/(i32 1), $src0)
+                     /*src1mod*/(i32 0), /*src1*/(i32 1), SReg_64_XEXEC:$src0)
 >;

-def : Ext32Pat <zext>;
-def : Ext32Pat <anyext>;
+def : Ext32Pat_B64 <zext>;
+def : Ext32Pat_B64 <anyext>;
+}

 // The multiplication scales from [0,1) to the unsigned integer range,
 // rounding down a bit to avoid unwanted overflow.

Even with the RC, it is still a illegal copy.

%68:vgpr_32 = V_CNDMASK_B32_e64 0, %129:vgpr_32, 0, %126:vgpr_32, %64:sreg_64_xexec, implicit $exec
%159:sreg_64_xexec = COPY %68:vgpr_32
%130:vgpr_32 = V_CNDMASK_B32_e64 0, 0, 0, 1, %159:sreg_64_xexec, implicit $exec

Seems for the case, RC SSrc_i1 and SReg_64_XEXEC have same result. Am I wrong here for the change?

arsenm · 2023-04-03T13:26:10Z

My current guess is there is a copy from VGPR to VCC that is just being incorrectly selected. I think this pattern should be right. Was the copy inserted during selection or already present?

chenzheng1030 · 2023-04-03T13:50:00Z

The COPY is inserted in global-isel instruction selection phase.
Before regbank selection:

%78:_(s16) = G_SELECT %74:_(s1), %76:_, %77:_
%57:_(s1) = G_TRUNC %78:_(s16)
%68:_(s16) = G_SELECT %64:_(s1), %66:_, %67:_
%58:_(s1) = G_TRUNC %68:_(s16)

%21:_(s1) = G_XOR %58:_, %57:_

After regbank selection:

%78:vgpr(s16) = G_SELECT %74:vcc(s1), %76:vgpr, %77:vgpr
%57:vgpr(s1) = G_TRUNC %78:vgpr(s16)
%68:vgpr(s16) = G_SELECT %64:vcc(s1), %66:vgpr, %67:vgpr
%58:vgpr(s1) = G_TRUNC %68:vgpr(s16)
%130:vgpr(s32) = G_ANYEXT %58:vgpr(s1)
%131:vgpr(s32) = G_ANYEXT %57:vgpr(s1)

%132:vgpr(s32) = G_XOR %130:vgpr, %131:vgpr

After instruction selection:

%78:vgpr_32 = V_CNDMASK_B32_e64 0, %117:vgpr_32, 0, %114:vgpr_32, %74:sreg_64_xexec, implicit $exec
%68:vgpr_32 = V_CNDMASK_B32_e64 0, %129:vgpr_32, 0, %126:vgpr_32, %64:sreg_64_xexec, implicit $exec
%159:sreg_64_xexec = COPY %68:vgpr_32
%130:vgpr_32 = V_CNDMASK_B32_e64 0, 0, 0, 1, %159:sreg_64_xexec, implicit $exec
%158:sreg_64_xexec = COPY %78:vgpr_32
%131:vgpr_32 = V_CNDMASK_B32_e64 0, 0, 0, 1, %158:sreg_64_xexec, implicit $exec

%132:vgpr_32 = V_XOR_B32_e64 %130:vgpr_32, %131:vgpr_32, implicit $exec

arsenm · 2023-04-04T14:20:51Z

RegBankSelect should not have produced %58:vgpr(s1) = G_TRUNC %68:vgpr(s16). We do not want to see s1 VGPR assignments

chenzheng1030 · 2023-04-07T09:32:27Z

A reduced case:

cat t1.ll
define i1 @foo(i1 %cond, i1 %i1, i1 %i2, i1 %xor2)
{
entry:
  %select = select i1 %cond, i1 %i1, i1 %i2
  %r = xor i1 %select, %xor2
  ret i1 %r
}

Before Instruction Selection:

  %9:vgpr(s1) = G_TRUNC %17:vgpr(s32)
  %10:vgpr(s32) = G_ANYEXT %9:vgpr(s1)

During Instruction Selection:

Selecting:
  %10:vgpr_32(s32) = G_ANYEXT %9:vgpr(s1)
Into:
  %18:sreg_64_xexec = COPY %9:vgpr(s1) <<< COPY1, from VGPR to SGPR
  %10:vgpr_32(s32) = V_CNDMASK_B32_e64 0, 0, 0, 1, %18:sreg_64_xexec, implicit $exec

Selecting:
  %9:vgpr(s1) = G_TRUNC %17:vgpr(s32)
Into:
  %9:vgpr_32(s1) = COPY %17:vgpr_32(s32) <<<< COPY2 and COPY1 are folded into 1

So after Instruction Selection:

  %18:sreg_64_xexec = COPY %17:vgpr_32 <<< This COPY is the combination of COPY1 and COPY2 and the problematic VGPR to SGPR COPY
  %10:vgpr_32 = V_CNDMASK_B32_e64 0, 0, 0, 1, %18:sreg_64_xexec, implicit $exec

So this issue only happens when we change the td pattern. And the VGPR to SGPR COPY is generated in the instruction selection phase by the Table-gen selection framework.

chenzheng1030 · 2023-04-07T09:44:27Z

DAG-ISEL selects %9:vgpr(s1) = G_TRUNC %17:vgpr(s32) like:

ISEL: Starting selection on root node: t3: i1 = truncate # D:1 t2
Creating new machine node: t25: i16 = V_AND_B32_e64 # D:1 TargetConstant:i32<1>, t2
  Created node: t25: i16 = V_AND_B32_e64 # D:1 TargetConstant:i32<1>, t2
  Morphed node: t3: i1 = V_CMP_EQ_U32_e64 # D:1 t25, TargetConstant:i32<1>

DAG pattern:

def : GCNPat <
  (i1 (DivergentUnaryFrag<trunc> i32:$a)),
  (V_CMP_EQ_U32_e64 (V_AND_B32_e64 (i32 1), $a), (i32 1))
>;

So I tried to select the G_TRUNC like above in AMDGPUInstructionSelector.cpp and force the RC of the output to SGPR (TRI.getBoolRC()). It can make the reduced case in above comment pass. However, we get some other cases wrong in LIT test, like:

*** Bad machine code: Illegal virtual register for instruction ***
- function:    zext_vgpr_s1_to_vgpr_s16
- basic block: %bb.0  (0x1003732cfd0)
- instruction: %2:vgpr_32 = V_AND_B32_e32 1, %1:sreg_64, implicit $exec
- operand 2:   %1:sreg_64
Expected a VGPR_32 register, but got a SReg_64 register

I think this is expected, because in my change, I force the VGPR to SGPR for our motivated case, but for other cases which requires VGPR(s1), after the change, the RC will not be right. So this method must be wrong.

Code change I made in AMDGPUInstructionSelector.cpp

@@ -2217,6 +2226,21 @@ bool AMDGPUInstructionSelector::selectG_TRUNC(MachineInstr &I) const {
+  } else if (DstTy == S1 && IsVALU) {
+    // The Dst S1 may be used as a COPY to SReg_X_XEXEC, to avoid illegal copy
+    // from VGPR to SGPR, need to change the DstRC here. This aligns with
+    // DAG-ISel.
+    unsigned AndOpc = AMDGPU::V_AND_B32_e64;
+    unsigned CmpOpc = AMDGPU::V_CMP_EQ_U32_e64;
+    Register AndReg = MRI->createVirtualRegister(SrcRC);
+
+    BuildMI(*MBB, I, DL, TII.get(AndOpc), AndReg).addImm(1).addReg(SrcReg);
+    BuildMI(*MBB, I, DL, TII.get(CmpOpc), DstReg).addReg(AndReg).addImm(1);
+
+    MRI->setRegClass(DstReg, TRI.getBoolRC());
+
+    I.eraseFromParent();
+    return true;

chenzheng1030 · 2023-04-07T09:51:29Z

I also tried to use VReg_1 register class for the i1 input in the changed patterns (Added back the change in https://reviews.llvm.org/D119552), but get an error, saying Not a known SIRegisterClass. I am not digging into this way, I am thinking maybe VReg_1 is not right register class for the operand of V_CNDMASK_B32_e64 instruction?

I don't find any other valid register class for i1 type in the SI register info file.

chenzheng1030 · 2023-04-07T10:19:16Z

The td patterns should be right, I see they are correctly used in DAG-ISEL in many LIT cases.

To fix this, my current plan is to avoid these several extension patterns to be selected in table-gen selection in AMDGPUInstructionSelector::select(). I am not sure if there is way to stop inside selectImpl() for cases like trying to add a VGPR to SGPR COPY

chenzheng1030 · 2023-04-07T10:35:02Z

I sent https://reviews.llvm.org/D147780 for this issue.

However the imported rules can not be used for now because Global ISel selectImpl() seems has some bug/limitation to create a illegl COPY from VGPR to SGPR. So currently workaround this by not auto selecting these patterns. Fixes llvm#61468 Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D147780

chenzheng1030 added backend:AMDGPU llvm:globalisel labels Mar 17, 2023

chenzheng1030 self-assigned this Apr 7, 2023

chenzheng1030 mentioned this issue Apr 10, 2023

[AMDGPU][GlobalISel]selectImpl() may create illegal copy from VGPR to SGPR #62035

Open

chenzheng1030 closed this as completed in a3d5ec5 Apr 10, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[AMDGPU][GlobalISel] illegal VGPR to SGPR copy #61468

[AMDGPU][GlobalISel] illegal VGPR to SGPR copy #61468

chenzheng1030 commented Mar 17, 2023 •

edited by VoltrexKeyva

Loading

llvmbot commented Mar 17, 2023

chenzheng1030 commented Mar 17, 2023

arsenm commented Mar 19, 2023

chenzheng1030 commented Mar 21, 2023

arsenm commented Mar 24, 2023

chenzheng1030 commented Mar 27, 2023

arsenm commented Apr 3, 2023

chenzheng1030 commented Apr 3, 2023

arsenm commented Apr 4, 2023

chenzheng1030 commented Apr 7, 2023 •

edited

Loading

chenzheng1030 commented Apr 7, 2023 •

edited

Loading

chenzheng1030 commented Apr 7, 2023 •

edited

Loading

chenzheng1030 commented Apr 7, 2023 •

edited

Loading

chenzheng1030 commented Apr 7, 2023

[AMDGPU][GlobalISel] illegal VGPR to SGPR copy #61468

[AMDGPU][GlobalISel] illegal VGPR to SGPR copy #61468

Comments

chenzheng1030 commented Mar 17, 2023 • edited by VoltrexKeyva Loading

llvmbot commented Mar 17, 2023

chenzheng1030 commented Mar 17, 2023

arsenm commented Mar 19, 2023

chenzheng1030 commented Mar 21, 2023

arsenm commented Mar 24, 2023

chenzheng1030 commented Mar 27, 2023

arsenm commented Apr 3, 2023

chenzheng1030 commented Apr 3, 2023

arsenm commented Apr 4, 2023

chenzheng1030 commented Apr 7, 2023 • edited Loading

chenzheng1030 commented Apr 7, 2023 • edited Loading

chenzheng1030 commented Apr 7, 2023 • edited Loading

chenzheng1030 commented Apr 7, 2023 • edited Loading

chenzheng1030 commented Apr 7, 2023

chenzheng1030 commented Mar 17, 2023 •

edited by VoltrexKeyva

Loading

chenzheng1030 commented Apr 7, 2023 •

edited

Loading

chenzheng1030 commented Apr 7, 2023 •

edited

Loading

chenzheng1030 commented Apr 7, 2023 •

edited

Loading

chenzheng1030 commented Apr 7, 2023 •

edited

Loading