[AMDGPU] Update base addr of dyn alloca considering GrowingUp stack #119822

easyonaadit · 2024-12-13T06:32:58Z

Currently, compiler calculates the base address of
dynamic sized stack object (alloca) as follows:

NewSP = Align(CurrSP + Size)
where Size = # of elements * wave size * alloca type
BaseAddr = NewSP
The alignment is computed as: AlignedAddr = Addr & ~(Alignment - 1)
Return the BaseAddr
This makes sense when stack is grows downwards.

AMDGPU stack grows upwards, the base address
needs to be aligned first and SP bump by required size later:

BaseAddr = Align(CurrSP)
NewSP = BaseAddr + Size
AlignedAddr = (Addr + (Alignment - 1)) & ~(Alignment - 1)
and returns the BaseAddr.

github-actions · 2024-12-13T06:33:19Z

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

llvmbot · 2024-12-13T06:33:54Z

@llvm/pr-subscribers-llvm-globalisel

@llvm/pr-subscribers-backend-amdgpu

Author: Aaditya (easyonaadit)

Changes

Currently, compiler calculates the base address of
dynamic sized stack object (alloca) by bumping SP
by required size and then aligns bumped SP:

NewSP = Align(CurrSP + Size)
where Size = # of elements * wave size * alloca type
BaseAddr = NewSP
The alignment is computed as: AlignedAddr = Addr & ~(Alignment - 1)
Return the BaseAddr, which makes sense when stack is growing downwards.

For AMDGPU which uses upward growing stack,
the base address has to be aligned first and SP
needs to be bumped later by required size:

BaseAddr = Align(CurrSP)
NewSP = BaseAddr + Size
AlignedAddr = (Addr + (Alignment - 1)) & ~(Alignment - 1)
and returns the BaseAddr.

Full diff: https://github.com/llvm/llvm-project/pull/119822.diff

2 Files Affected:

(modified) llvm/lib/Target/AMDGPU/SIISelLowering.cpp (+17-14)
(modified) llvm/test/CodeGen/AMDGPU/non-entry-alloca.ll (+28-36)

diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index fc8bbb154d035d..bee66f0f7fa2db 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -4002,8 +4002,9 @@ SDValue SITargetLowering::LowerCall(CallLoweringInfo &CLI,
                          InVals, /*IsThisReturn=*/false, SDValue());
 }
 
-// This is identical to the default implementation in ExpandDYNAMIC_STACKALLOC,
-// except for applying the wave size scale to the increment amount.
+// This is similar to the default implementation in ExpandDYNAMIC_STACKALLOC,
+// except for stack growth direction(default: downwards, AMDGPU: upwards) and
+// applying the wave size scale to the increment amount.
 SDValue SITargetLowering::lowerDYNAMIC_STACKALLOCImpl(SDValue Op,
                                                       SelectionDAG &DAG) const {
   const MachineFunction &MF = DAG.getMachineFunction();
@@ -4023,10 +4024,20 @@ SDValue SITargetLowering::lowerDYNAMIC_STACKALLOCImpl(SDValue Op,
   Chain = DAG.getCALLSEQ_START(Chain, 0, 0, dl);
 
   SDValue Size = Tmp2.getOperand(1);
-  SDValue SP = DAG.getCopyFromReg(Chain, dl, SPReg, VT);
-  Chain = SP.getValue(1);
+  SDValue SPOld = DAG.getCopyFromReg(Chain, dl, SPReg, VT);
+  Chain = SPOld.getValue(1);
   MaybeAlign Alignment = cast<ConstantSDNode>(Tmp3)->getMaybeAlignValue();
   const TargetFrameLowering *TFL = Subtarget->getFrameLowering();
+  Align StackAlign = TFL->getStackAlign();
+  if (Alignment && *Alignment > StackAlign) {
+    SDValue ScaledAlignment = DAG.getSignedConstant(
+        (uint64_t)Alignment->value() << Subtarget->getWavefrontSizeLog2(), dl,
+        VT);
+    SDValue StackAlignMask = DAG.getNode(ISD::SUB, dl, VT, ScaledAlignment,
+                                         DAG.getConstant(1, dl, VT));
+    Tmp1 = DAG.getNode(ISD::ADD, dl, VT, SPOld, StackAlignMask);
+    Tmp1 = DAG.getNode(ISD::AND, dl, VT, Tmp1, ScaledAlignment);
+  }
   unsigned Opc =
       TFL->getStackGrowthDirection() == TargetFrameLowering::StackGrowsUp
           ? ISD::ADD
@@ -4036,20 +4047,12 @@ SDValue SITargetLowering::lowerDYNAMIC_STACKALLOCImpl(SDValue Op,
       ISD::SHL, dl, VT, Size,
       DAG.getConstant(Subtarget->getWavefrontSizeLog2(), dl, MVT::i32));
 
-  Align StackAlign = TFL->getStackAlign();
-  Tmp1 = DAG.getNode(Opc, dl, VT, SP, ScaledSize); // Value
-  if (Alignment && *Alignment > StackAlign) {
-    Tmp1 = DAG.getNode(
-        ISD::AND, dl, VT, Tmp1,
-        DAG.getSignedConstant(-(uint64_t)Alignment->value()
-                                  << Subtarget->getWavefrontSizeLog2(),
-                              dl, VT));
-  }
+  Tmp1 = DAG.getNode(Opc, dl, VT, SPOld, ScaledSize); // Value
 
   Chain = DAG.getCopyToReg(Chain, dl, SPReg, Tmp1); // Output chain
   Tmp2 = DAG.getCALLSEQ_END(Chain, 0, 0, SDValue(), dl);
 
-  return DAG.getMergeValues({Tmp1, Tmp2}, dl);
+  return DAG.getMergeValues({SPOld, Tmp2}, dl);
 }
 
 SDValue SITargetLowering::LowerDYNAMIC_STACKALLOC(SDValue Op,
diff --git a/llvm/test/CodeGen/AMDGPU/non-entry-alloca.ll b/llvm/test/CodeGen/AMDGPU/non-entry-alloca.ll
index 85096eb63f46e1..0477d55e9baa36 100644
--- a/llvm/test/CodeGen/AMDGPU/non-entry-alloca.ll
+++ b/llvm/test/CodeGen/AMDGPU/non-entry-alloca.ll
@@ -30,15 +30,14 @@ define amdgpu_kernel void @kernel_non_entry_block_static_alloca_uniformly_reache
 ; MUBUF-NEXT:    s_cmp_lg_u32 s9, 0
 ; MUBUF-NEXT:    s_cbranch_scc1 .LBB0_3
 ; MUBUF-NEXT:  ; %bb.2: ; %bb.1
-; MUBUF-NEXT:    s_add_i32 s6, s32, 0x1000
-; MUBUF-NEXT:    s_lshl_b32 s7, s10, 2
-; MUBUF-NEXT:    s_mov_b32 s32, s6
+; MUBUF-NEXT:    s_mov_b32 s6, s32
 ; MUBUF-NEXT:    v_mov_b32_e32 v1, 0
-; MUBUF-NEXT:    v_mov_b32_e32 v2, s6
-; MUBUF-NEXT:    v_mov_b32_e32 v3, 1
+; MUBUF-NEXT:    v_mov_b32_e32 v2, 1
+; MUBUF-NEXT:    s_lshl_b32 s7, s10, 2
+; MUBUF-NEXT:    s_add_i32 s32, s6, 0x1000
+; MUBUF-NEXT:    buffer_store_dword v1, off, s[0:3], s6
+; MUBUF-NEXT:    buffer_store_dword v2, off, s[0:3], s6 offset:4
 ; MUBUF-NEXT:    s_add_i32 s6, s6, s7
-; MUBUF-NEXT:    buffer_store_dword v1, v2, s[0:3], 0 offen
-; MUBUF-NEXT:    buffer_store_dword v3, v2, s[0:3], 0 offen offset:4
 ; MUBUF-NEXT:    v_mov_b32_e32 v2, s6
 ; MUBUF-NEXT:    buffer_load_dword v2, v2, s[0:3], 0 offen
 ; MUBUF-NEXT:    s_load_dwordx2 s[4:5], s[4:5], 0x0
@@ -66,11 +65,11 @@ define amdgpu_kernel void @kernel_non_entry_block_static_alloca_uniformly_reache
 ; FLATSCR-NEXT:    s_cmp_lg_u32 s5, 0
 ; FLATSCR-NEXT:    s_cbranch_scc1 .LBB0_3
 ; FLATSCR-NEXT:  ; %bb.2: ; %bb.1
-; FLATSCR-NEXT:    s_add_i32 s2, s32, 0x1000
+; FLATSCR-NEXT:    s_mov_b32 s2, s32
 ; FLATSCR-NEXT:    v_mov_b32_e32 v1, 0
 ; FLATSCR-NEXT:    v_mov_b32_e32 v2, 1
 ; FLATSCR-NEXT:    s_lshl_b32 s3, s6, 2
-; FLATSCR-NEXT:    s_mov_b32 s32, s2
+; FLATSCR-NEXT:    s_add_i32 s32, s2, 0x1000
 ; FLATSCR-NEXT:    scratch_store_dwordx2 off, v[1:2], s2
 ; FLATSCR-NEXT:    s_add_i32 s2, s2, s3
 ; FLATSCR-NEXT:    scratch_load_dword v2, off, s2
@@ -131,16 +130,14 @@ define amdgpu_kernel void @kernel_non_entry_block_static_alloca_uniformly_reache
 ; MUBUF-NEXT:    s_cmp_lg_u32 s4, 0
 ; MUBUF-NEXT:    s_cbranch_scc1 .LBB1_2
 ; MUBUF-NEXT:  ; %bb.1: ; %bb.0
-; MUBUF-NEXT:    s_add_i32 s4, s32, 0x1000
-; MUBUF-NEXT:    s_and_b32 s4, s4, 0xfffff000
-; MUBUF-NEXT:    s_lshl_b32 s5, s5, 2
-; MUBUF-NEXT:    s_mov_b32 s32, s4
+; MUBUF-NEXT:    s_mov_b32 s4, s32
 ; MUBUF-NEXT:    v_mov_b32_e32 v1, 0
-; MUBUF-NEXT:    v_mov_b32_e32 v2, s4
-; MUBUF-NEXT:    v_mov_b32_e32 v3, 1
+; MUBUF-NEXT:    v_mov_b32_e32 v2, 1
+; MUBUF-NEXT:    s_lshl_b32 s5, s5, 2
+; MUBUF-NEXT:    s_add_i32 s32, s4, 0x1000
+; MUBUF-NEXT:    buffer_store_dword v1, off, s[0:3], s4
+; MUBUF-NEXT:    buffer_store_dword v2, off, s[0:3], s4 offset:4
 ; MUBUF-NEXT:    s_add_i32 s4, s4, s5
-; MUBUF-NEXT:    buffer_store_dword v1, v2, s[0:3], 0 offen
-; MUBUF-NEXT:    buffer_store_dword v3, v2, s[0:3], 0 offen offset:4
 ; MUBUF-NEXT:    v_mov_b32_e32 v2, s4
 ; MUBUF-NEXT:    buffer_load_dword v2, v2, s[0:3], 0 offen
 ; MUBUF-NEXT:    s_load_dwordx2 s[4:5], s[8:9], 0x0
@@ -165,12 +162,11 @@ define amdgpu_kernel void @kernel_non_entry_block_static_alloca_uniformly_reache
 ; FLATSCR-NEXT:    s_cmp_lg_u32 s0, 0
 ; FLATSCR-NEXT:    s_cbranch_scc1 .LBB1_2
 ; FLATSCR-NEXT:  ; %bb.1: ; %bb.0
-; FLATSCR-NEXT:    s_add_i32 s0, s32, 0x1000
 ; FLATSCR-NEXT:    v_mov_b32_e32 v1, 0
-; FLATSCR-NEXT:    s_and_b32 s0, s0, 0xfffff000
+; FLATSCR-NEXT:    s_mov_b32 s0, s32
 ; FLATSCR-NEXT:    v_mov_b32_e32 v2, 1
 ; FLATSCR-NEXT:    s_lshl_b32 s1, s1, 2
-; FLATSCR-NEXT:    s_mov_b32 s32, s0
+; FLATSCR-NEXT:    s_add_i32 s32, s0, 0x1000
 ; FLATSCR-NEXT:    scratch_store_dwordx2 off, v[1:2], s0
 ; FLATSCR-NEXT:    s_add_i32 s0, s0, s1
 ; FLATSCR-NEXT:    scratch_load_dword v2, off, s0
@@ -230,16 +226,15 @@ define void @func_non_entry_block_static_alloca_align4(ptr addrspace(1) %out, i3
 ; MUBUF-NEXT:    s_and_b64 exec, exec, vcc
 ; MUBUF-NEXT:    s_cbranch_execz .LBB2_3
 ; MUBUF-NEXT:  ; %bb.2: ; %bb.1
-; MUBUF-NEXT:    s_add_i32 s6, s32, 0x1000
+; MUBUF-NEXT:    s_mov_b32 s6, s32
 ; MUBUF-NEXT:    v_mov_b32_e32 v2, 0
-; MUBUF-NEXT:    v_mov_b32_e32 v3, s6
-; MUBUF-NEXT:    buffer_store_dword v2, v3, s[0:3], 0 offen
+; MUBUF-NEXT:    buffer_store_dword v2, off, s[0:3], s6
 ; MUBUF-NEXT:    v_mov_b32_e32 v2, 1
-; MUBUF-NEXT:    buffer_store_dword v2, v3, s[0:3], 0 offen offset:4
+; MUBUF-NEXT:    buffer_store_dword v2, off, s[0:3], s6 offset:4
 ; MUBUF-NEXT:    v_lshl_add_u32 v2, v4, 2, s6
 ; MUBUF-NEXT:    buffer_load_dword v2, v2, s[0:3], 0 offen
 ; MUBUF-NEXT:    v_and_b32_e32 v3, 0x3ff, v31
-; MUBUF-NEXT:    s_mov_b32 s32, s6
+; MUBUF-NEXT:    s_add_i32 s32, s6, 0x1000
 ; MUBUF-NEXT:    s_waitcnt vmcnt(0)
 ; MUBUF-NEXT:    v_add_u32_e32 v2, v2, v3
 ; MUBUF-NEXT:    global_store_dword v[0:1], v2, off
@@ -266,14 +261,14 @@ define void @func_non_entry_block_static_alloca_align4(ptr addrspace(1) %out, i3
 ; FLATSCR-NEXT:    s_and_b64 exec, exec, vcc
 ; FLATSCR-NEXT:    s_cbranch_execz .LBB2_3
 ; FLATSCR-NEXT:  ; %bb.2: ; %bb.1
-; FLATSCR-NEXT:    s_add_i32 s2, s32, 0x1000
+; FLATSCR-NEXT:    s_mov_b32 s2, s32
 ; FLATSCR-NEXT:    v_mov_b32_e32 v2, 0
 ; FLATSCR-NEXT:    v_mov_b32_e32 v3, 1
 ; FLATSCR-NEXT:    scratch_store_dwordx2 off, v[2:3], s2
 ; FLATSCR-NEXT:    v_lshl_add_u32 v2, v4, 2, s2
 ; FLATSCR-NEXT:    scratch_load_dword v2, v2, off
 ; FLATSCR-NEXT:    v_and_b32_e32 v3, 0x3ff, v31
-; FLATSCR-NEXT:    s_mov_b32 s32, s2
+; FLATSCR-NEXT:    s_add_i32 s32, s2, 0x1000
 ; FLATSCR-NEXT:    s_waitcnt vmcnt(0)
 ; FLATSCR-NEXT:    v_add_u32_e32 v2, v2, v3
 ; FLATSCR-NEXT:    global_store_dword v[0:1], v2, off
@@ -324,17 +319,15 @@ define void @func_non_entry_block_static_alloca_align64(ptr addrspace(1) %out, i
 ; MUBUF-NEXT:    s_and_saveexec_b64 s[4:5], vcc
 ; MUBUF-NEXT:    s_cbranch_execz .LBB3_2
 ; MUBUF-NEXT:  ; %bb.1: ; %bb.0
-; MUBUF-NEXT:    s_add_i32 s6, s32, 0x1000
-; MUBUF-NEXT:    s_and_b32 s6, s6, 0xfffff000
+; MUBUF-NEXT:    s_mov_b32 s6, s32
 ; MUBUF-NEXT:    v_mov_b32_e32 v2, 0
-; MUBUF-NEXT:    v_mov_b32_e32 v4, s6
-; MUBUF-NEXT:    buffer_store_dword v2, v4, s[0:3], 0 offen
+; MUBUF-NEXT:    buffer_store_dword v2, off, s[0:3], s6
 ; MUBUF-NEXT:    v_mov_b32_e32 v2, 1
-; MUBUF-NEXT:    buffer_store_dword v2, v4, s[0:3], 0 offen offset:4
+; MUBUF-NEXT:    buffer_store_dword v2, off, s[0:3], s6 offset:4
 ; MUBUF-NEXT:    v_lshl_add_u32 v2, v3, 2, s6
 ; MUBUF-NEXT:    buffer_load_dword v2, v2, s[0:3], 0 offen
 ; MUBUF-NEXT:    v_and_b32_e32 v3, 0x3ff, v31
-; MUBUF-NEXT:    s_mov_b32 s32, s6
+; MUBUF-NEXT:    s_add_i32 s32, s6, 0x1000
 ; MUBUF-NEXT:    s_waitcnt vmcnt(0)
 ; MUBUF-NEXT:    v_add_u32_e32 v2, v2, v3
 ; MUBUF-NEXT:    global_store_dword v[0:1], v2, off
@@ -358,15 +351,14 @@ define void @func_non_entry_block_static_alloca_align64(ptr addrspace(1) %out, i
 ; FLATSCR-NEXT:    s_and_saveexec_b64 s[0:1], vcc
 ; FLATSCR-NEXT:    s_cbranch_execz .LBB3_2
 ; FLATSCR-NEXT:  ; %bb.1: ; %bb.0
-; FLATSCR-NEXT:    s_add_i32 s2, s32, 0x1000
-; FLATSCR-NEXT:    s_and_b32 s2, s2, 0xfffff000
+; FLATSCR-NEXT:    s_mov_b32 s2, s32
 ; FLATSCR-NEXT:    v_mov_b32_e32 v4, 0
 ; FLATSCR-NEXT:    v_mov_b32_e32 v5, 1
 ; FLATSCR-NEXT:    scratch_store_dwordx2 off, v[4:5], s2
 ; FLATSCR-NEXT:    v_lshl_add_u32 v2, v3, 2, s2
 ; FLATSCR-NEXT:    scratch_load_dword v2, v2, off
 ; FLATSCR-NEXT:    v_and_b32_e32 v3, 0x3ff, v31
-; FLATSCR-NEXT:    s_mov_b32 s32, s2
+; FLATSCR-NEXT:    s_add_i32 s32, s2, 0x1000
 ; FLATSCR-NEXT:    s_waitcnt vmcnt(0)
 ; FLATSCR-NEXT:    v_add_u32_e32 v2, v2, v3
 ; FLATSCR-NEXT:    global_store_dword v[0:1], v2, off

arsenm

Are there new unit tests for dynamic alloca somewhere? We really need those. When this code was added, it was impossible to test since clang forbade C99 variable arrays, and __builtin_alloca was broken in all the GPU languages. That should work now, so we can write tests for it.

Does this also mean the default handling path is broken for the upward stack growth direction? Should that be fixed or turned into an assert?

llvm/lib/Target/AMDGPU/SIISelLowering.cpp

llvm/test/CodeGen/AMDGPU/non-entry-alloca.ll

llvm/lib/Target/AMDGPU/SIISelLowering.cpp

cdevadas · 2024-12-13T07:36:34Z

AMDGPU stack grows upwards, the base address
needs to be aligned first and SP bump by required size later:

The address computation should be retained for both directions. The Opc selection currently happens based on the Stack growth direction (as per the code below). The code would otherwise break suddenly if we decide to change the stack growth direction for any future architecture.
TFL->getStackGrowthDirection() == TargetFrameLowering::StackGrowsUp

arsenm · 2024-12-13T07:38:09Z

AMDGPU stack grows upwards, the base address
needs to be aligned first and SP bump by required size later:

The address computation should be retained for both directions. The Opc selection currently happens based on the Stack growth direction (as per the code below). The code would otherwise break suddenly if we decide to change the stack growth direction for any future architecture. TFL->getStackGrowthDirection() == TargetFrameLowering::StackGrowsUp

We won't decide to change the stack growth direction. We probably shouldn't be trying to handle grow down in any way

cdevadas · 2024-12-13T07:41:37Z

AMDGPU stack grows upwards, the base address
needs to be aligned first and SP bump by required size later:

The address computation should be retained for both directions. The Opc selection currently happens based on the Stack growth direction (as per the code below). The code would otherwise break suddenly if we decide to change the stack growth direction for any future architecture. TFL->getStackGrowthDirection() == TargetFrameLowering::StackGrowsUp

We won't decide to change the stack growth direction. We probably shouldn't be trying to handle grow down in any way

If that's the case why do we need this while selecting the opcode
https://github.com/llvm/llvm-project/pull/119822/files#diff-e78d2fbd64648d787707fd3d4e7e5b5d2f00fb9c09972937718a11237933c597R4042

arsenm · 2024-12-13T07:43:40Z

If that's the case why do we need this while selecting the opcode https://github.com/llvm/llvm-project/pull/119822/files#diff-e78d2fbd64648d787707fd3d4e7e5b5d2f00fb9c09972937718a11237933c597R4042

To make things more complicated than necessary. This should just assert

cdevadas · 2024-12-13T07:48:21Z

If that's the case why do we need this while selecting the opcode https://github.com/llvm/llvm-project/pull/119822/files#diff-e78d2fbd64648d787707fd3d4e7e5b5d2f00fb9c09972937718a11237933c597R4042

To make things more complicated than necessary. This should just assert

@easyonaadit fix it in a separate patch.

easyonaadit · 2024-12-13T11:16:23Z

If that's the case why do we need this while selecting the opcode https://github.com/llvm/llvm-project/pull/119822/files#diff-e78d2fbd64648d787707fd3d4e7e5b5d2f00fb9c09972937718a11237933c597R4042

To make things more complicated than necessary. This should just assert

@easyonaadit fix it in a separate patch.

yupp I will do that then.

llvm/lib/Target/AMDGPU/SIISelLowering.cpp

…20063) For #119822

…stack

llvm/lib/Target/AMDGPU/SIISelLowering.cpp

github-actions · 2024-12-19T06:39:32Z

✅ With the latest revision this PR passed the C/C++ code formatter.

easyonaadit · 2024-12-19T08:52:15Z

@s-barannikov hi, do you feel some more changes are required or may I go ahead and merge this?

easyonaadit · 2024-12-20T04:55:35Z

@s-barannikov hi, do you feel some more changes are required or may I go ahead and merge this?

I am merging it for now, let me know if any additional changes are required.

github-actions · 2024-12-20T04:57:46Z

@easyonaadit Congratulations on having your first Pull Request (PR) merged into the LLVM Project!

Your changes will be combined with recent changes from other authors, then tested by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR.

Please check whether problems have been caused by your change specifically, as the builds can include changes from many authors. It is not uncommon for your change to be included in a build that fails due to someone else's changes, or infrastructure issues.

How to do this, and the rest of the post-merge process, is covered in detail here.

If your change does cause a problem, it may be reverted, or you can revert it yourself. This is a normal part of LLVM development. You can fix your changes and open a new PR to merge them again.

If you don't get any reports, no action is required from you. Your changes are working as expected, well done!

…vm#120063) For llvm#119822

…lvm#119822) Currently, compiler calculates the base address of dynamic sized stack object (alloca) as follows: 1. `NewSP = Align(CurrSP + Size)` _where_ `Size = # of elements * wave size * alloca type` 2. `BaseAddr = NewSP` 3. The alignment is computed as: `AlignedAddr = Addr & ~(Alignment - 1)` 4. Return the `BaseAddr` This makes sense when stack is grows downwards. AMDGPU stack grows upwards, the base address needs to be aligned first and SP bump by required size later: 1. `BaseAddr = Align(CurrSP)` 2. `NewSP = BaseAddr + Size` 3. `AlignedAddr = (Addr + (Alignment - 1)) & ~(Alignment - 1)` 4. and returns the `BaseAddr`.

llvmbot added the backend:AMDGPU label Dec 13, 2024

pravinjagtap requested review from arsenm, jayfoad, cdevadas, shiltian and pravinjagtap December 13, 2024 06:40

arsenm reviewed Dec 13, 2024

View reviewed changes

llvm/lib/Target/AMDGPU/SIISelLowering.cpp Outdated Show resolved Hide resolved

llvm/lib/Target/AMDGPU/SIISelLowering.cpp Show resolved Hide resolved

llvm/test/CodeGen/AMDGPU/non-entry-alloca.ll Show resolved Hide resolved

arsenm reviewed Dec 13, 2024

View reviewed changes

llvm/lib/Target/AMDGPU/SIISelLowering.cpp Show resolved Hide resolved

This was referenced Dec 16, 2024

[NFC][AMDGPU] Pre-commit clang and llvm tests for dynamic allocas #120063

Merged

[AMDGPU] Assert if stack grows downwards. #119888

Merged

s-barannikov reviewed Dec 16, 2024

View reviewed changes

pravinjagtap pushed a commit that referenced this pull request Dec 18, 2024

[NFC][AMDGPU] Pre-commit clang and llvm tests for dynamic allocas (#1…

99c2e3b

…20063) For #119822

easyonaadit added 2 commits December 18, 2024 18:44

fixing base address of dynamically sized stack object for growing up …

2d06d3a

…stack

GlobalIsel code change, review comments

a7ca93a

easyonaadit force-pushed the refactored-code-dynamic-alloca branch from 29fa6ce to a7ca93a Compare December 18, 2024 15:41

llvmbot added the llvm:globalisel label Dec 18, 2024

s-barannikov reviewed Dec 18, 2024

View reviewed changes

llvm/lib/Target/AMDGPU/SIISelLowering.cpp Outdated Show resolved Hide resolved

s-barannikov reviewed Dec 18, 2024

View reviewed changes

llvm/lib/Target/AMDGPU/SIISelLowering.cpp Outdated Show resolved Hide resolved

Review comments pt.2

55f293d

easyonaadit force-pushed the refactored-code-dynamic-alloca branch from 8d90fcf to 55f293d Compare December 19, 2024 06:40

easyonaadit requested a review from s-barannikov December 19, 2024 06:59

easyonaadit requested a review from arsenm December 19, 2024 06:59

arsenm approved these changes Dec 19, 2024

View reviewed changes

lalaniket8 merged commit c760671 into llvm:main Dec 20, 2024
8 checks passed

easyonaadit mentioned this pull request Dec 26, 2024

Request Commit Access For easyonaadit #121143

Closed

searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request Mar 24, 2025

[NFC][AMDGPU] Pre-commit clang and llvm tests for dynamic allocas (ll…

de984a5

…vm#120063) For llvm#119822

[AMDGPU] Update base addr of dyn alloca considering GrowingUp stack #119822

[AMDGPU] Update base addr of dyn alloca considering GrowingUp stack #119822

Uh oh!

Conversation

easyonaadit commented Dec 13, 2024 • edited by pravinjagtap Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions bot commented Dec 13, 2024

Uh oh!

llvmbot commented Dec 13, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

arsenm left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

cdevadas commented Dec 13, 2024

Uh oh!

arsenm commented Dec 13, 2024

Uh oh!

cdevadas commented Dec 13, 2024

Uh oh!

arsenm commented Dec 13, 2024

Uh oh!

cdevadas commented Dec 13, 2024

Uh oh!

easyonaadit commented Dec 13, 2024

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

github-actions bot commented Dec 19, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

easyonaadit commented Dec 19, 2024

Uh oh!

easyonaadit commented Dec 20, 2024

Uh oh!

Uh oh!

github-actions bot commented Dec 20, 2024

Uh oh!

Uh oh!

easyonaadit commented Dec 13, 2024 •

edited by pravinjagtap

Loading

llvmbot commented Dec 13, 2024 •

edited

Loading

github-actions bot commented Dec 19, 2024 •

edited

Loading