-
Notifications
You must be signed in to change notification settings - Fork 13.5k
[AMDGPU] New intrinsic llvm.amdgcn.pops.exiting.wave.id #89612
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-backend-amdgpu @llvm/pr-subscribers-llvm-ir Author: Jay Foad (jayfoad) ChangesThis provides access to the special scalar source value Full diff: https://github.com/llvm/llvm-project/pull/89612.diff 4 Files Affected:
diff --git a/llvm/include/llvm/IR/IntrinsicsAMDGPU.td b/llvm/include/llvm/IR/IntrinsicsAMDGPU.td
index ee9a5d7a343980..1ec48650eba7a9 100644
--- a/llvm/include/llvm/IR/IntrinsicsAMDGPU.td
+++ b/llvm/include/llvm/IR/IntrinsicsAMDGPU.td
@@ -2485,6 +2485,9 @@ class AMDGPUGlobalLoadLDS : Intrinsic <
"", [SDNPMemOperand]>;
def int_amdgcn_global_load_lds : AMDGPUGlobalLoadLDS;
+def int_amdgcn_pops_exiting_wave_id :
+ DefaultAttrsIntrinsic<[llvm_i32_ty], [], [IntrNoMem, IntrSpeculatable]>;
+
//===----------------------------------------------------------------------===//
// GFX10 Intrinsics
//===----------------------------------------------------------------------===//
diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
index aa4ec785bf02a3..6f4906a696d3af 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
@@ -4825,11 +4825,15 @@ AMDGPURegisterBankInfo::getInstrMapping(const MachineInstr &MI) const {
OpdsMapping[2] = AMDGPU::getValueMapping(regBankID, OpSize);
break;
}
- case Intrinsic::amdgcn_s_bitreplicate:
+ case Intrinsic::amdgcn_s_bitreplicate: {
Register MaskReg = MI.getOperand(2).getReg();
unsigned MaskBank = getRegBankID(MaskReg, MRI, AMDGPU::SGPRRegBankID);
OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, 64);
OpdsMapping[2] = AMDGPU::getValueMapping(MaskBank, 32);
+ break;
+ }
+ case Intrinsic::amdgcn_pops_exiting_wave_id:
+ return getDefaultMappingSOP(MI);
}
break;
}
diff --git a/llvm/lib/Target/AMDGPU/SOPInstructions.td b/llvm/lib/Target/AMDGPU/SOPInstructions.td
index 0b7d45ee8c027d..4525c2345d9e45 100644
--- a/llvm/lib/Target/AMDGPU/SOPInstructions.td
+++ b/llvm/lib/Target/AMDGPU/SOPInstructions.td
@@ -1865,6 +1865,12 @@ let SubtargetPredicate = isNotGFX9Plus in {
def : GetFPModePat<fpmode_mask_gfx6plus>;
}
+let SubtargetPredicate = isGFX9GFX10 in
+def : GCNPat<
+ (int_amdgcn_pops_exiting_wave_id),
+ (S_MOV_B32 (i32 SRC_POPS_EXITING_WAVE_ID))
+>;
+
//===----------------------------------------------------------------------===//
// SOP2 Patterns
//===----------------------------------------------------------------------===//
diff --git a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.pops.exiting.wave.id.ll b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.pops.exiting.wave.id.ll
new file mode 100644
index 00000000000000..4927c2ffcdf30d
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.pops.exiting.wave.id.ll
@@ -0,0 +1,34 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 4
+; RUN: llc -global-isel=0 -mtriple=amdgcn -mcpu=gfx900 < %s | FileCheck %s -check-prefix=SDAG
+; RUN: llc -global-isel=1 -mtriple=amdgcn -mcpu=gfx900 < %s | FileCheck %s -check-prefix=GFX9-GISEL
+; RUN: llc -global-isel=0 -mtriple=amdgcn -mcpu=gfx1010 < %s | FileCheck %s -check-prefix=SDAG
+; RUN: llc -global-isel=1 -mtriple=amdgcn -mcpu=gfx1010 < %s | FileCheck %s -check-prefix=GFX10-GISEL
+
+define amdgpu_ps void @test(ptr addrspace(1) inreg %ptr) {
+; SDAG-LABEL: test:
+; SDAG: ; %bb.0:
+; SDAG-NEXT: s_mov_b32 s2, src_pops_exiting_wave_id
+; SDAG-NEXT: v_mov_b32_e32 v0, 0
+; SDAG-NEXT: v_mov_b32_e32 v1, s2
+; SDAG-NEXT: global_store_dword v0, v1, s[0:1]
+; SDAG-NEXT: s_endpgm
+;
+; GFX9-GISEL-LABEL: test:
+; GFX9-GISEL: ; %bb.0:
+; GFX9-GISEL-NEXT: s_mov_b32 s2, src_pops_exiting_wave_id
+; GFX9-GISEL-NEXT: v_mov_b32_e32 v0, s2
+; GFX9-GISEL-NEXT: v_mov_b32_e32 v1, 0
+; GFX9-GISEL-NEXT: global_store_dword v1, v0, s[0:1]
+; GFX9-GISEL-NEXT: s_endpgm
+;
+; GFX10-GISEL-LABEL: test:
+; GFX10-GISEL: ; %bb.0:
+; GFX10-GISEL-NEXT: s_mov_b32 s2, src_pops_exiting_wave_id
+; GFX10-GISEL-NEXT: v_mov_b32_e32 v1, 0
+; GFX10-GISEL-NEXT: v_mov_b32_e32 v0, s2
+; GFX10-GISEL-NEXT: global_store_dword v1, v0, s[0:1]
+; GFX10-GISEL-NEXT: s_endpgm
+ %id = call i32 @llvm.amdgcn.pops.exiting.wave.id()
+ store i32 %id, ptr addrspace(1) %ptr
+ ret void
+}
|
define amdgpu_ps void @test(ptr addrspace(1) inreg %ptr) { | ||
; SDAG-LABEL: test: | ||
; SDAG: ; %bb.0: | ||
; SDAG-NEXT: s_mov_b32 s2, src_pops_exiting_wave_id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO in a future patch: fold src_pops_exiting_wave_id into uses of s2. This might be tricky because the value of src_pops_exiting_wave_id is volatile.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. As long as there's only a single use it may be fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another possible, but disgustingly dirty approach would be to expose it not as a load, but rather directly as an integer addition to an operand 🙃 But that'd definitely add some excessive responsibilities to it, especially since there are other ways the wave ID wraparound can be handled mathematically such as comparisons, and I don't know how unfriendly to the rest of the architecture that would be.
@@ -2485,6 +2485,9 @@ class AMDGPUGlobalLoadLDS : Intrinsic < | |||
"", [SDNPMemOperand]>; | |||
def int_amdgcn_global_load_lds : AMDGPUGlobalLoadLDS; | |||
|
|||
def int_amdgcn_pops_exiting_wave_id : | |||
DefaultAttrsIntrinsic<[llvm_i32_ty], [], [IntrNoMem, IntrSpeculatable]>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this reads a volatile value it might need some more attributes - maybe IntrHasSideEffects
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't remember any of the POPS stuff, but perhaps just don't make it IntrNoMem
and IntrSpeculatable
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer to keep the IntrNoMem
because I'd like to get to a point where all intrinsics that touch memory also have memoperands in MIR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IntrInaccessibleMemOnly? We probably should have a PseudoSourceValue for all of those
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went with SideEffects since it seems like the most direct and obious ways to prevent CSE-like optimizations. I don't see any added benefit in creating a PseudoSourceValue for this.
let SubtargetPredicate = isGFX9GFX10 in | ||
def : GCNPat< | ||
(int_amdgcn_pops_exiting_wave_id), | ||
(S_MOV_B32_sideeffects (i32 SRC_POPS_EXITING_WAVE_ID)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it really need side effects? If it's directly reading a reserved register you can't do much of anything with it anyway. Reserved registers are approximately volatile
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It needs side effects to prevent tablegen from complaining about selecting a side-effecting intrinsic to a non-side-effecting instruction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
InaccessibleMemOnly should avoid that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But you're suggesting I should use InaccessibleMemOnly and give it a memoperand with a new PseudoSourceValue? I don't understand why that is preferable overall.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. It doesn't need a memory operand if it's a register dependency. You only need one for mayLoad=1 or mayStore=1 instructions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It isn't CSE'd with a possibly writing call in between. Is this supposed to act more like readcyclecounter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this supposed to act more like readcyclecounter?
Yes. It's a volatile value that can change at any time, even without an explicit write in the IR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess you just have to copy the attributes from readyclecounter, which is just DefaultAttrsIntrinsic
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's kind of what I've done already! readcyclecounter reads and writes mem. The new instrinsic also reads and writes mem. The only difference is that I have restricted it to inaccessible mem only, which I still think is reasonable (and could perhaps be applied to readcyclecounter too).
The current patch works for me. Are you asking for / suggesting any changes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then what would stop the MachineInstr getting CSEd, hoisted out of loops, rematerialized etc? We definitely need to prevent it from being moved past control flow so that you can write loops that poll the value.
The use of reserved physical register
That does not seem to work. See:
; SDAG-NEXT: s_mov_b32 s0, src_pops_exiting_wave_id |
%1:sreg_32 = S_MOV_B32 $src_pops_exiting_wave_id
is hoisted by early-machinelicm.
3a2f03a
to
53979d1
Compare
// Variant of S_MOV_B32 used for reading volatile source values like | ||
// SRC_POPS_EXITING_WAVE_ID. | ||
let mayLoad = 1, mayStore = 1, maybeAtomic = 0 in | ||
def S_MOV_B32_loadstore : SOP1_32 <"s_mov_b32">; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have not yet managed to prevent TableGen from complaining about selecting an IntrInaccessibleMemOnly intrinsic to an instruction without mayLoad/mayStore. The flag inference stuff in CodeGenDAGPatterns::InferFromPattern is just too complex.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we just delete it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather avoid the pseudo. I'd probably prefer to just manually select over having it. But really that flag inference stuff in TableGen is garbage. We at minimum need to fix the properties to exactly match the attributes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I updated the patch to select it in C++ (and rebased).
This provides access to the special scalar source value SRC_POPS_EXITING_WAVE_ID on GFX9 and GFX10.
53979d1
to
2dc23c1
Compare
@jayfoad This is failing on EXPENSIVE_CHECKS builds: https://lab.llvm.org/buildbot/#/builders/16/builds/65957 |
Thanks. It's a silly mistake. I will fix in a minute... |
Fixed by 4e0f8a4 |
This provides access to the special scalar source value
SRC_POPS_EXITING_WAVE_ID on GFX9 and GFX10.