Skip to content

[AMDGPU][GlobalISel] illegal VGPR to SGPR copy #61468

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
chenzheng1030 opened this issue Mar 17, 2023 · 14 comments
Closed

[AMDGPU][GlobalISel] illegal VGPR to SGPR copy #61468

chenzheng1030 opened this issue Mar 17, 2023 · 14 comments

Comments

@chenzheng1030
Copy link
Collaborator

chenzheng1030 commented Mar 17, 2023

Still from https://reviews.llvm.org/D141247, some unexpected changes:

td changes:

diff --git a/llvm/lib/Target/AMDGPU/SIInstructions.td b/llvm/lib/Target/AMDGPU/SIInstructions.td
index c10bbe7367a1..9117710a1aa9 100644
--- a/llvm/lib/Target/AMDGPU/SIInstructions.td
+++ b/llvm/lib/Target/AMDGPU/SIInstructions.td
@@ -2011,13 +2011,13 @@ def : GCNPat <
 def : GCNPat <
   (i32 (sext i1:$src0)),
   (V_CNDMASK_B32_e64 /*src0mod*/(i32 0), /*src0*/(i32 0),
-                     /*src1mod*/(i32 0), /*src1*/(i32 -1), $src0)
+                     /*src1mod*/(i32 0), /*src1*/(i32 -1), i1:$src0)
 >;

 class Ext32Pat <SDNode ext> : GCNPat <
   (i32 (ext i1:$src0)),
   (V_CNDMASK_B32_e64 /*src0mod*/(i32 0), /*src0*/(i32 0),
-                     /*src1mod*/(i32 0), /*src1*/(i32 1), $src0)
+                     /*src1mod*/(i32 0), /*src1*/(i32 1), i1:$src0)
 >;

 def : Ext32Pat <zext>;

Get some lit failures and some of them are not just code gen difference, for example for below case s_ssubsat_i128 in file CodeGen/AMDGPU/GlobalISel/ssubsat.ll,

define amdgpu_ps i128 @s_ssubsat_i128(i128 inreg %lhs, i128 inreg %rhs) {
  %result = call i128 @llvm.ssub.sat.i128(i128 %lhs, i128 %rhs)
  ret i128 %result
}
declare i128 @llvm.ssub.sat.i128(i128, i128) #0
$ ./bin/llc ssubsat.ll  -global-isel -mtriple=amdgcn-amd-amdpal -mcpu=tahiti
error: <unknown>:0:0: in function s_ssubsat_i128 i128 (i128, i128): illegal VGPR to SGPR copy

error: <unknown>:0:0: in function s_ssubsat_i128 i128 (i128, i128): illegal VGPR to SGPR copy
@llvmbot
Copy link
Member

llvmbot commented Mar 17, 2023

@llvm/issue-subscribers-backend-amdgpu

@chenzheng1030
Copy link
Collaborator Author

FYI @arsenm @Pierre-vh

@arsenm
Copy link
Contributor

arsenm commented Mar 19, 2023

This is a situation where the register class cannot be inferred from the type and is ambiguous. This is a bad pattern change, which ideally tablegen would error on. This really needs to use an explicit register class. Bool values in particular are tricky

@chenzheng1030
Copy link
Collaborator Author

Thanks. What register class should be set here for the $src0? I tried below change(Copied from other patterns for V_CNDMASK_B32_e64), but seems not work.

diff --git a/llvm/lib/Target/AMDGPU/SIInstructions.td b/llvm/lib/Target/AMDGPU/SIInstructions.td
index c10bbe7367a1..af6316a95629 100644
--- a/llvm/lib/Target/AMDGPU/SIInstructions.td
+++ b/llvm/lib/Target/AMDGPU/SIInstructions.td
@@ -2011,13 +2011,13 @@ def : GCNPat <
 def : GCNPat <
   (i32 (sext i1:$src0)),
   (V_CNDMASK_B32_e64 /*src0mod*/(i32 0), /*src0*/(i32 0),
-                     /*src1mod*/(i32 0), /*src1*/(i32 -1), $src0)
+                     /*src1mod*/(i32 0), /*src1*/(i32 -1), SSrc_i1:$src0)
 >;

 class Ext32Pat <SDNode ext> : GCNPat <
   (i32 (ext i1:$src0)),
   (V_CNDMASK_B32_e64 /*src0mod*/(i32 0), /*src0*/(i32 0),
-                     /*src1mod*/(i32 0), /*src1*/(i32 1), $src0)
+                     /*src1mod*/(i32 0), /*src1*/(i32 1), SSrc_i1:$src0)
 >;

 def : Ext32Pat <zext>;

Illegal copy is still generated from below pattern:

%68:vgpr(s16) = G_SELECT %64:vcc(s1), %66:vgpr, %67:vgpr
%58:vgpr(s1) = G_TRUNC %68:vgpr(s16)
%130:vgpr(s32) = G_ANYEXT %58:vgpr(s1)
%132:vgpr(s32) = G_XOR %130:vgpr, %131:vgpr

------>

%68:vgpr_32 = V_CNDMASK_B32_e64 0, %129:vgpr_32, 0, %126:vgpr_32, %64:sreg_64_xexec, implicit $exec
%159:sreg_64_xexec = COPY %68:vgpr_32  //This is the illegal copy
%130:vgpr_32 = V_CNDMASK_B32_e64 0, 0, 0, 1, %159:sreg_64_xexec, implicit $exec
%132:vgpr_32 = V_XOR_B32_e64 %130:vgpr_32, %131:vgpr_32, implicit $exec

%159:sreg_64_xexec = COPY %68:vgpr_32 seems for %58:vgpr(s1) = G_TRUNC %68:vgpr(s16), but I tried a simple truncate case from i16 to i1 for amdgcn-amd-amdpal, global isel works good.

@arsenm
Copy link
Contributor

arsenm commented Mar 24, 2023

Thanks. What register class should be set here for the $src0? I tried below change(Copied from other patterns for V_CNDMASK_B32_e64), but seems not work.

It depends on the wave size. It's either SReg_32_XEXEC or SReg_64_XEXEC based on the wavesize, so we may need to duplicate the patterns. SSrc_i1 is SReg_1_XEXEC, which is the union of the two and unallocatable

@chenzheng1030
Copy link
Collaborator Author

Thank you @arsenm , I did the following change, but seems the above simple case still gets illegal copy result.

diff --git a/llvm/lib/Target/AMDGPU/SIInstructions.td b/llvm/lib/Target/AMDGPU/SIInstructions.td
index c10bbe7367a1..39e6846dbd05 100644
--- a/llvm/lib/Target/AMDGPU/SIInstructions.td
+++ b/llvm/lib/Target/AMDGPU/SIInstructions.td
@@ -2008,20 +2008,39 @@ def : GCNPat <
   (V_EXP_F32_e64 SRCMODS.NONE, (V_MUL_LEGACY_F32_e64 $src1_mods, $src1, SRCMODS.NONE, (V_LOG_F32_e64 $src0_mods, $src0), 0, 0))
 >;

+let WaveSizePredicate = isWave32 in {
+def : GCNPat <
+  (i32 (sext i1:$src0)),
+  (V_CNDMASK_B32_e64 /*src0mod*/(i32 0), /*src0*/(i32 0),
+                     /*src1mod*/(i32 0), /*src1*/(i32 -1), SReg_32_XEXEC:$src0)
+>;
+
+class Ext32Pat_B32 <SDNode ext> : GCNPat <
+  (i32 (ext i1:$src0)),
+  (V_CNDMASK_B32_e64 /*src0mod*/(i32 0), /*src0*/(i32 0),
+                     /*src1mod*/(i32 0), /*src1*/(i32 1), SReg_32_XEXEC:$src0)
+>;
+
+def : Ext32Pat_B32 <zext>;
+def : Ext32Pat_B32 <anyext>;
+}
+
+let WaveSizePredicate = isWave64 in {
 def : GCNPat <
   (i32 (sext i1:$src0)),
   (V_CNDMASK_B32_e64 /*src0mod*/(i32 0), /*src0*/(i32 0),
-                     /*src1mod*/(i32 0), /*src1*/(i32 -1), $src0)
+                     /*src1mod*/(i32 0), /*src1*/(i32 -1), SReg_64_XEXEC:$src0)
 >;

-class Ext32Pat <SDNode ext> : GCNPat <
+class Ext32Pat_B64 <SDNode ext> : GCNPat <
   (i32 (ext i1:$src0)),
   (V_CNDMASK_B32_e64 /*src0mod*/(i32 0), /*src0*/(i32 0),
-                     /*src1mod*/(i32 0), /*src1*/(i32 1), $src0)
+                     /*src1mod*/(i32 0), /*src1*/(i32 1), SReg_64_XEXEC:$src0)
 >;

-def : Ext32Pat <zext>;
-def : Ext32Pat <anyext>;
+def : Ext32Pat_B64 <zext>;
+def : Ext32Pat_B64 <anyext>;
+}

 // The multiplication scales from [0,1) to the unsigned integer range,
 // rounding down a bit to avoid unwanted overflow.

Even with the RC, it is still a illegal copy.

%68:vgpr_32 = V_CNDMASK_B32_e64 0, %129:vgpr_32, 0, %126:vgpr_32, %64:sreg_64_xexec, implicit $exec
%159:sreg_64_xexec = COPY %68:vgpr_32
%130:vgpr_32 = V_CNDMASK_B32_e64 0, 0, 0, 1, %159:sreg_64_xexec, implicit $exec

Seems for the case, RC SSrc_i1 and SReg_64_XEXEC have same result. Am I wrong here for the change?

@arsenm
Copy link
Contributor

arsenm commented Apr 3, 2023

My current guess is there is a copy from VGPR to VCC that is just being incorrectly selected. I think this pattern should be right. Was the copy inserted during selection or already present?

@chenzheng1030
Copy link
Collaborator Author

The COPY is inserted in global-isel instruction selection phase.
Before regbank selection:

%78:_(s16) = G_SELECT %74:_(s1), %76:_, %77:_
%57:_(s1) = G_TRUNC %78:_(s16)
%68:_(s16) = G_SELECT %64:_(s1), %66:_, %67:_
%58:_(s1) = G_TRUNC %68:_(s16)

%21:_(s1) = G_XOR %58:_, %57:_

After regbank selection:

%78:vgpr(s16) = G_SELECT %74:vcc(s1), %76:vgpr, %77:vgpr
%57:vgpr(s1) = G_TRUNC %78:vgpr(s16)
%68:vgpr(s16) = G_SELECT %64:vcc(s1), %66:vgpr, %67:vgpr
%58:vgpr(s1) = G_TRUNC %68:vgpr(s16)
%130:vgpr(s32) = G_ANYEXT %58:vgpr(s1)
%131:vgpr(s32) = G_ANYEXT %57:vgpr(s1)

%132:vgpr(s32) = G_XOR %130:vgpr, %131:vgpr

After instruction selection:

%78:vgpr_32 = V_CNDMASK_B32_e64 0, %117:vgpr_32, 0, %114:vgpr_32, %74:sreg_64_xexec, implicit $exec
%68:vgpr_32 = V_CNDMASK_B32_e64 0, %129:vgpr_32, 0, %126:vgpr_32, %64:sreg_64_xexec, implicit $exec
%159:sreg_64_xexec = COPY %68:vgpr_32
%130:vgpr_32 = V_CNDMASK_B32_e64 0, 0, 0, 1, %159:sreg_64_xexec, implicit $exec
%158:sreg_64_xexec = COPY %78:vgpr_32
%131:vgpr_32 = V_CNDMASK_B32_e64 0, 0, 0, 1, %158:sreg_64_xexec, implicit $exec

%132:vgpr_32 = V_XOR_B32_e64 %130:vgpr_32, %131:vgpr_32, implicit $exec

@arsenm
Copy link
Contributor

arsenm commented Apr 4, 2023

RegBankSelect should not have produced %58:vgpr(s1) = G_TRUNC %68:vgpr(s16). We do not want to see s1 VGPR assignments

@chenzheng1030
Copy link
Collaborator Author

chenzheng1030 commented Apr 7, 2023

A reduced case:

cat t1.ll
define i1 @foo(i1 %cond, i1 %i1, i1 %i2, i1 %xor2)
{
entry:
  %select = select i1 %cond, i1 %i1, i1 %i2
  %r = xor i1 %select, %xor2
  ret i1 %r
}

Before Instruction Selection:

  %9:vgpr(s1) = G_TRUNC %17:vgpr(s32)
  %10:vgpr(s32) = G_ANYEXT %9:vgpr(s1)

During Instruction Selection:

Selecting:
  %10:vgpr_32(s32) = G_ANYEXT %9:vgpr(s1)
Into:
  %18:sreg_64_xexec = COPY %9:vgpr(s1) <<< COPY1, from VGPR to SGPR
  %10:vgpr_32(s32) = V_CNDMASK_B32_e64 0, 0, 0, 1, %18:sreg_64_xexec, implicit $exec

Selecting:
  %9:vgpr(s1) = G_TRUNC %17:vgpr(s32)
Into:
  %9:vgpr_32(s1) = COPY %17:vgpr_32(s32) <<<< COPY2 and COPY1 are folded into 1

So after Instruction Selection:

  %18:sreg_64_xexec = COPY %17:vgpr_32 <<< This COPY is the combination of COPY1 and COPY2 and the problematic VGPR to SGPR COPY
  %10:vgpr_32 = V_CNDMASK_B32_e64 0, 0, 0, 1, %18:sreg_64_xexec, implicit $exec

So this issue only happens when we change the td pattern. And the VGPR to SGPR COPY is generated in the instruction selection phase by the Table-gen selection framework.

@chenzheng1030
Copy link
Collaborator Author

chenzheng1030 commented Apr 7, 2023

DAG-ISEL selects %9:vgpr(s1) = G_TRUNC %17:vgpr(s32) like:

ISEL: Starting selection on root node: t3: i1 = truncate # D:1 t2
Creating new machine node: t25: i16 = V_AND_B32_e64 # D:1 TargetConstant:i32<1>, t2
  Created node: t25: i16 = V_AND_B32_e64 # D:1 TargetConstant:i32<1>, t2
  Morphed node: t3: i1 = V_CMP_EQ_U32_e64 # D:1 t25, TargetConstant:i32<1>

DAG pattern:

def : GCNPat <
  (i1 (DivergentUnaryFrag<trunc> i32:$a)),
  (V_CMP_EQ_U32_e64 (V_AND_B32_e64 (i32 1), $a), (i32 1))
>;

So I tried to select the G_TRUNC like above in AMDGPUInstructionSelector.cpp and force the RC of the output to SGPR (TRI.getBoolRC()). It can make the reduced case in above comment pass. However, we get some other cases wrong in LIT test, like:

*** Bad machine code: Illegal virtual register for instruction ***
- function:    zext_vgpr_s1_to_vgpr_s16
- basic block: %bb.0  (0x1003732cfd0)
- instruction: %2:vgpr_32 = V_AND_B32_e32 1, %1:sreg_64, implicit $exec
- operand 2:   %1:sreg_64
Expected a VGPR_32 register, but got a SReg_64 register

I think this is expected, because in my change, I force the VGPR to SGPR for our motivated case, but for other cases which requires VGPR(s1), after the change, the RC will not be right. So this method must be wrong.

Code change I made in AMDGPUInstructionSelector.cpp

@@ -2217,6 +2226,21 @@ bool AMDGPUInstructionSelector::selectG_TRUNC(MachineInstr &I) const {
+  } else if (DstTy == S1 && IsVALU) {
+    // The Dst S1 may be used as a COPY to SReg_X_XEXEC, to avoid illegal copy
+    // from VGPR to SGPR, need to change the DstRC here. This aligns with
+    // DAG-ISel.
+    unsigned AndOpc = AMDGPU::V_AND_B32_e64;
+    unsigned CmpOpc = AMDGPU::V_CMP_EQ_U32_e64;
+    Register AndReg = MRI->createVirtualRegister(SrcRC);
+
+    BuildMI(*MBB, I, DL, TII.get(AndOpc), AndReg).addImm(1).addReg(SrcReg);
+    BuildMI(*MBB, I, DL, TII.get(CmpOpc), DstReg).addReg(AndReg).addImm(1);
+
+    MRI->setRegClass(DstReg, TRI.getBoolRC());
+
+    I.eraseFromParent();
+    return true;

@chenzheng1030
Copy link
Collaborator Author

chenzheng1030 commented Apr 7, 2023

I also tried to use VReg_1 register class for the i1 input in the changed patterns (Added back the change in https://reviews.llvm.org/D119552), but get an error, saying Not a known SIRegisterClass. I am not digging into this way, I am thinking maybe VReg_1 is not right register class for the operand of V_CNDMASK_B32_e64 instruction?

I don't find any other valid register class for i1 type in the SI register info file.

@chenzheng1030
Copy link
Collaborator Author

chenzheng1030 commented Apr 7, 2023

The td patterns should be right, I see they are correctly used in DAG-ISEL in many LIT cases.

To fix this, my current plan is to avoid these several extension patterns to be selected in table-gen selection in AMDGPUInstructionSelector::select(). I am not sure if there is way to stop inside selectImpl() for cases like trying to add a VGPR to SGPR COPY

@chenzheng1030 chenzheng1030 self-assigned this Apr 7, 2023
@chenzheng1030
Copy link
Collaborator Author

I sent https://reviews.llvm.org/D147780 for this issue.

gysit pushed a commit to nextsilicon/llvm-project that referenced this issue Apr 27, 2023
However the imported rules can not be used for now because Global ISel
selectImpl() seems has some bug/limitation to create a illegl COPY
from VGPR to SGPR. So currently workaround this by not auto selecting these
patterns.

Fixes llvm#61468

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D147780
flemairen6 pushed a commit to Xilinx/llvm-project that referenced this issue May 10, 2023
However the imported rules can not be used for now because Global ISel
selectImpl() seems has some bug/limitation to create a illegl COPY
from VGPR to SGPR. So currently workaround this by not auto selecting these
patterns.

Fixes llvm#61468

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D147780
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants