[AMDGPU][GlobalISel] illegal VGPR to SGPR copy #61468
@llvm/issue-subscribers-backend-amdgpu |
FYI @arsenm @Pierre-vh |
This is a situation where the register class cannot be inferred from the type and is ambiguous. This is a bad pattern change, which ideally TableGen would reject with an error. This really needs to use an explicit register class. Bool values in particular are tricky. |
Thanks. What register class should be set here for the
diff --git a/llvm/lib/Target/AMDGPU/SIInstructions.td b/llvm/lib/Target/AMDGPU/SIInstructions.td
index c10bbe7367a1..af6316a95629 100644
--- a/llvm/lib/Target/AMDGPU/SIInstructions.td
+++ b/llvm/lib/Target/AMDGPU/SIInstructions.td
@@ -2011,13 +2011,13 @@ def : GCNPat <
def : GCNPat <
(i32 (sext i1:$src0)),
(V_CNDMASK_B32_e64 /*src0mod*/(i32 0), /*src0*/(i32 0),
- /*src1mod*/(i32 0), /*src1*/(i32 -1), $src0)
+ /*src1mod*/(i32 0), /*src1*/(i32 -1), SSrc_i1:$src0)
>;
class Ext32Pat <SDNode ext> : GCNPat <
(i32 (ext i1:$src0)),
(V_CNDMASK_B32_e64 /*src0mod*/(i32 0), /*src0*/(i32 0),
- /*src1mod*/(i32 0), /*src1*/(i32 1), $src0)
+ /*src1mod*/(i32 0), /*src1*/(i32 1), SSrc_i1:$src0)
>;
def : Ext32Pat <zext>;
An illegal copy is still generated from the pattern below:
------>
|
It depends on the wave size: the class is SReg_32_XEXEC for wave32 and SReg_64_XEXEC for wave64, so we may need to duplicate the patterns. SSrc_i1 is SReg_1_XEXEC, which is the union of the two and is unallocatable. |
Thank you @arsenm. I made the following change, but the simple case above still produces an illegal copy.
diff --git a/llvm/lib/Target/AMDGPU/SIInstructions.td b/llvm/lib/Target/AMDGPU/SIInstructions.td
index c10bbe7367a1..39e6846dbd05 100644
--- a/llvm/lib/Target/AMDGPU/SIInstructions.td
+++ b/llvm/lib/Target/AMDGPU/SIInstructions.td
@@ -2008,20 +2008,39 @@ def : GCNPat <
(V_EXP_F32_e64 SRCMODS.NONE, (V_MUL_LEGACY_F32_e64 $src1_mods, $src1, SRCMODS.NONE, (V_LOG_F32_e64 $src0_mods, $src0), 0, 0))
>;
+let WaveSizePredicate = isWave32 in {
+def : GCNPat <
+ (i32 (sext i1:$src0)),
+ (V_CNDMASK_B32_e64 /*src0mod*/(i32 0), /*src0*/(i32 0),
+ /*src1mod*/(i32 0), /*src1*/(i32 -1), SReg_32_XEXEC:$src0)
+>;
+
+class Ext32Pat_B32 <SDNode ext> : GCNPat <
+ (i32 (ext i1:$src0)),
+ (V_CNDMASK_B32_e64 /*src0mod*/(i32 0), /*src0*/(i32 0),
+ /*src1mod*/(i32 0), /*src1*/(i32 1), SReg_32_XEXEC:$src0)
+>;
+
+def : Ext32Pat_B32 <zext>;
+def : Ext32Pat_B32 <anyext>;
+}
+
+let WaveSizePredicate = isWave64 in {
def : GCNPat <
(i32 (sext i1:$src0)),
(V_CNDMASK_B32_e64 /*src0mod*/(i32 0), /*src0*/(i32 0),
- /*src1mod*/(i32 0), /*src1*/(i32 -1), $src0)
+ /*src1mod*/(i32 0), /*src1*/(i32 -1), SReg_64_XEXEC:$src0)
>;
-class Ext32Pat <SDNode ext> : GCNPat <
+class Ext32Pat_B64 <SDNode ext> : GCNPat <
(i32 (ext i1:$src0)),
(V_CNDMASK_B32_e64 /*src0mod*/(i32 0), /*src0*/(i32 0),
- /*src1mod*/(i32 0), /*src1*/(i32 1), $src0)
+ /*src1mod*/(i32 0), /*src1*/(i32 1), SReg_64_XEXEC:$src0)
>;
-def : Ext32Pat <zext>;
-def : Ext32Pat <anyext>;
+def : Ext32Pat_B64 <zext>;
+def : Ext32Pat_B64 <anyext>;
+}
// The multiplication scales from [0,1) to the unsigned integer range,
// rounding down a bit to avoid unwanted overflow.
Even with the explicit register class, an illegal copy is still generated.
Seems for the case, RC |
My current guess is there is a copy from VGPR to VCC that is just being incorrectly selected. I think this pattern should be right. Was the copy inserted during selection or already present? |
The COPY is inserted during the GlobalISel instruction selection phase.
After regbank selection:
After instruction selection:
|
RegBankSelect should not have produced %58:vgpr(s1) = G_TRUNC %68:vgpr(s16). We do not want to see s1 VGPR assignments |
A reduced case:
Before Instruction Selection:
During Instruction Selection:
So after Instruction Selection:
So this issue only happens when we change the td pattern, and the VGPR to SGPR COPY is generated during instruction selection by the TableGen-generated selection framework. |
DAG-ISEL selects
DAG pattern:
So I tried to select the
I think this is expected: my change forces VGPR to SGPR for our motivating case, but for other cases that require VGPR(s1), the register class is no longer right after the change. So this method must be wrong. Code change I made in
|
I also tried to use
I could not find any other valid register class for the i1 type in the SI register info file. |
The td patterns should be right; I see them used correctly by DAG-ISel in many LIT tests. To fix this, my current plan is to prevent these extension patterns from being selected by the TableGen-generated selection in |
I sent https://reviews.llvm.org/D147780 for this issue. |
However, the imported rules cannot be used for now because GlobalISel selectImpl() seems to have a bug or limitation that creates an illegal COPY from VGPR to SGPR. So work around this for now by not auto-selecting these patterns. Fixes llvm#61468 Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D147780
Still from https://reviews.llvm.org/D141247, some unexpected changes:
td changes:
I get some lit failures, and some of them are not just codegen differences. For example, the case s_ssubsat_i128 in the file CodeGen/AMDGPU/GlobalISel/ssubsat.ll