[AMDGPU] Rework architected SGPRs implementation #79001
Rework the architected SGPRs implementation such that workgroup id values (which live in TTMP registers) are available in all functions and do not rely on calling allocateSystemSGPRs to set them up.
- if (!AMDGPU::isGraphics(CC) ||
-     (CC == CallingConv::AMDGPU_CS && ST.hasArchitectedSGPRs())) {
+ if (!AMDGPU::isGraphics(CC) || CC == CallingConv::AMDGPU_CS ||
+     ST.hasArchitectedSGPRs()) {
This change looks redundant, as this patch always allocates the TTMP* registers for subtargets with architected SGPRs enabled.
This was initially added with 2171f04.
You can revert this check to just !AMDGPU::isGraphics(CC).
But I need WorkGroupIDZ to be set correctly when architected SGPRs are enabled. It is used below, line 177.
- ; GCN: s_mov_b64 s[4:5], s[0:1]
- ; GCN: buffer_store_dword v{{[0-9]+}}, off, s[4:7], 0 offset:4
+ ; GCN: s_mov_b64 s[8:9], s[0:1]
+ ; GCN: buffer_store_dword v{{[0-9]+}}, off, s[8:11], 0 offset:4
Is this expected?
No, I guess I've broken something.
The following change? It now enables workgroup IDs for AMDGPU_CS always.
SIMachineFunctionInfo.cpp:110
- if (!AMDGPU::isGraphics(CC) ||
-     (CC == CallingConv::AMDGPU_CS && ST.hasArchitectedSGPRs())) {
+ if (!AMDGPU::isGraphics(CC) || CC == CallingConv::AMDGPU_CS ||
+     ST.hasArchitectedSGPRs()) {
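To make the behavioural difference between the two conditions quoted above concrete, here is a minimal standalone sketch. It does not use LLVM: the `CallConv` enum, `isGraphics`, `oldCondition`, and `newCondition` are hypothetical stand-ins, and the assumption that only the kernel convention is non-graphics is a simplification made for illustration.

```cpp
// Hypothetical stand-ins for the calling conventions discussed in this
// thread; these are not the real LLVM CallingConv values.
enum class CallConv { Kernel, ComputeShader, PixelShader };

// Assumption for this sketch: only Kernel is non-graphics, so the
// compute-shader convention (standing in for AMDGPU_CS) counts as graphics.
constexpr bool isGraphics(CallConv CC) { return CC != CallConv::Kernel; }

// Shape of the condition before the change: a compute shader qualifies
// only when architected SGPRs are available.
constexpr bool oldCondition(CallConv CC, bool HasArchitectedSGPRs) {
  return !isGraphics(CC) ||
         (CC == CallConv::ComputeShader && HasArchitectedSGPRs);
}

// Shape of the condition after the change: a compute shader always
// qualifies, and so does any convention once architected SGPRs are available.
constexpr bool newCondition(CallConv CC, bool HasArchitectedSGPRs) {
  return !isGraphics(CC) || CC == CallConv::ComputeShader ||
         HasArchitectedSGPRs;
}
```

Under these assumptions the two predicates disagree in exactly two cases: a compute shader without architected SGPRs, and a non-compute graphics shader with architected SGPRs. The first case is the one the comment about AMDGPU_CS points at.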
The WaveID support is missing. Hope that will be coming in a separate patch.
@@ -143,6 +143,11 @@ struct AMDGPUFunctionArgInfo {
   ArgDescriptor WorkGroupInfo;
   ArgDescriptor PrivateSegmentWaveByteOffset;
+
+  // System TTMPs.
+  ArgDescriptor ArchitectedWorkGroupIDX;
I don't think these really need to be tracked in ArgumentUsageInfo; they aren't arguments anymore.
OK. I started again from scratch: #79120
@@ -169,6 +169,15 @@ SIMachineFunctionInfo::SIMachineFunctionInfo(const Function &F,
     VGPRForAGPRCopy =
         AMDGPU::VGPR_32RegClass.getRegister(ST.getMaxNumVGPRs(F) - 1);
   }
+
+  if (STI->hasArchitectedSGPRs()) {
+    ArgInfo.ArchitectedWorkGroupIDX =
I think it's OK for the lowering to directly consume the hardcoded register numbers.
You mean not go via these ArgDescriptors? I'm only using them because they handle the shifting and masking.
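For readers unfamiliar with what "the shifting and masking" amounts to, the following is a minimal sketch of extracting a value that occupies only some bits of a 32-bit register. It is not LLVM's ArgDescriptor: `FieldDescriptor`, `countTrailingZeros`, and `extractField` are hypothetical names, and the layout in the usage note (two 16-bit values packed into one register) is an assumption chosen purely for illustration.

```cpp
#include <cstdint>

// Hypothetical descriptor for a value stored in part of a 32-bit register.
// Mask is assumed to be a contiguous run of set bits.
struct FieldDescriptor {
  uint32_t Mask;
};

// Number of zero bits below the lowest set bit; returns 0 for an empty mask.
constexpr unsigned countTrailingZeros(uint32_t V) {
  unsigned N = 0;
  while (V != 0 && (V & 1u) == 0) {
    V >>= 1;
    ++N;
  }
  return N;
}

// Keep only the bits selected by the mask, then shift them down so the
// field's lowest bit lands at bit 0.
constexpr uint32_t extractField(uint32_t RegValue, FieldDescriptor D) {
  return (RegValue & D.Mask) >> countTrailingZeros(D.Mask);
}
```

As a usage example under the assumed layout, a register holding 0x00030002 with one field in the low 16 bits and another in the high 16 bits gives `extractField(0x00030002u, FieldDescriptor{0x0000FFFFu}) == 2` and `extractField(0x00030002u, FieldDescriptor{0xFFFF0000u}) == 3`. A value that fills the whole register uses the mask 0xFFFFFFFF and comes back unchanged.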