Skip to content

[WIP][AMDGPU] Change CC_AMDGPU_Func to only use SGPR0 to SGPR27 for inreg argument passing #115753

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

shiltian
Copy link
Contributor

@shiltian shiltian commented Nov 11, 2024

In emitCSRSpillStores, a caller-saved SGPR is required to save exec, which
limits us to using SGPR0 through SGPR29. Currently, we assume that one is always
available; however, this isn’t always the case, as SGPR0 to SGPR29 are also used
for inreg argument passing.

This PR is trying to fix this issue by not using all caller-saved SGPRs for
inreg argument passing. This will make sure that we will always have at least
two SGPRs available when it needs CSR spilling.

Fixes #113782.

… `inreg` argument passing

In `emitCSRSpillStores`, a caller-saved SGPR is required to save `exec`, which
limits us to using SGPR0 through SGPR29
Currently, we assume that one is always available; however, this isn’t always
the case, as SGPR0 to SGPR29 are also used for inreg argument passing.

This PR is trying to fix this issue by not using all caller-saved SGPRs for
`inreg` argument passing. This will make sure that we will always have at least
two SGPRs available when it needs CSR spilling.

Fixes #113782.
Copy link
Contributor Author

This stack of pull requests is managed by Graphite. Learn more about stacking.

Join @shiltian and the rest of your teammates on Graphite Graphite

@llvmbot
Copy link
Member

llvmbot commented Nov 11, 2024

@llvm/pr-subscribers-backend-amdgpu

Author: Shilei Tian (shiltian)

Changes

In emitCSRSpillStores, a caller-saved SGPR is required to save exec, which
limits us to using SGPR0 through SGPR29
Currently, we assume that one is always available; however, this isn’t always
the case, as SGPR0 to SGPR29 are also used for inreg argument passing.

This PR is trying to fix this issue by not using all caller-saved SGPRs for
inreg argument passing. This will make sure that we will always have at least
two SGPRs available when it needs CSR spilling.

Fixes #113782.


Full diff: https://github.com/llvm/llvm-project/pull/115753.diff

2 Files Affected:

  • (modified) llvm/docs/AMDGPUUsage.rst (+2-2)
  • (modified) llvm/lib/Target/AMDGPU/AMDGPUCallingConv.td (+1-1)
diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst
index 5b83ea428c0bff..b17226ccde712b 100644
--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@@ -16545,7 +16545,7 @@ On entry to a function:
     :ref:`amdgpu-amdhsa-kernel-prolog-m0`.
 4.  The EXEC register is set to the lanes active on entry to the function.
 5.  MODE register: *TBD*
-6.  VGPR0-31 and SGPR4-29 are used to pass function input arguments as described
+6.  VGPR0-31 and SGPR4-27 are used to pass function input arguments as described
     below.
 7.  SGPR30-31 return address (RA). The code address that the function must
     return to when it completes. The value is undefined if the function is *no
@@ -16796,7 +16796,7 @@ The input and result arguments are assigned in order in the following manner:
     How are overly aligned structures allocated on the stack?
 
 * SGPR arguments are assigned to consecutive SGPRs starting at SGPR0 up to
-  SGPR29.
+  SGPR27.
 
   If there are more arguments than will fit in these registers, the remaining
   arguments are allocated on the stack in order on naturally aligned
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUCallingConv.td b/llvm/lib/Target/AMDGPU/AMDGPUCallingConv.td
index 80969fce3d77fb..1787fe76a31ea5 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUCallingConv.td
+++ b/llvm/lib/Target/AMDGPU/AMDGPUCallingConv.td
@@ -127,7 +127,7 @@ def CC_AMDGPU_Func : CallingConv<[
   CCIfType<[i8, i16], CCIfExtend<CCPromoteToType<i32>>>,
 
   CCIfInReg<CCIfType<[f32, i32, f16, i16, v2i16, v2f16, bf16, v2bf16] , CCAssignToReg<
-    !foreach(i, !range(0, 30), !cast<Register>("SGPR"#i))  // SGPR0-29
+    !foreach(i, !range(0, 28), !cast<Register>("SGPR"#i))  // SGPR0-27
   >>>,
 
   CCIfType<[i32, f32, i16, f16, v2i16, v2f16, i1, bf16, v2bf16], CCAssignToReg<

@shiltian
Copy link
Contributor Author

shiltian commented Nov 11, 2024

This PR is still a work in progress, as it encounters other invalid SGPR-to-VGPR copies. However, I’d like to gather early feedback on whether this is the right approach before investing significant effort into it.

@shiltian shiltian closed this Dec 13, 2024
@shiltian shiltian deleted the users/shiltian/keep-two-sgprs-for-csr-spill branch December 13, 2024 18:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[AMDGPU] No available SGPR for CSR spill stores
2 participants