-
Notifications
You must be signed in to change notification settings - Fork 13.6k
[RISC-V] Only emit multiples of 16 as immediate for cm.push #84935
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
An immediate that isn't a multiple of 16 is meaningless and we should always emit the next higher multiple. Otherwise the value is simply truncated of course which leads to clobbering CSR's on the stack.
@llvm/pr-subscribers-backend-risc-v Author: Nemanja Ivanovic (nemanjai) ChangesAn immediate that isn't a multiple of 16 is meaningless and we should always emit the next higher multiple. Otherwise the value is simply truncated of course which leads to clobbering CSR's on the stack. Full diff: https://github.com/llvm/llvm-project/pull/84935.diff 2 Files Affected:
diff --git a/llvm/lib/Target/RISCV/RISCVFrameLowering.cpp b/llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
index 8bac41372b5a83..4f3d915818d1ae 100644
--- a/llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
@@ -555,7 +555,7 @@ void RISCVFrameLowering::emitPrologue(MachineFunction &MF,
FirstFrameSetup->getOpcode() == RISCV::CM_PUSH) {
// Use available stack adjustment in push instruction to allocate additional
// stack space.
- uint64_t Spimm = std::min(StackSize, (uint64_t)48);
+ uint64_t Spimm = alignTo(std::min(StackSize, (uint64_t)48), 16);
FirstFrameSetup->getOperand(1).setImm(Spimm);
StackSize -= Spimm;
}
diff --git a/llvm/test/CodeGen/RISCV/zcmp-additional-stack.ll b/llvm/test/CodeGen/RISCV/zcmp-additional-stack.ll
index e5c2e0180ee0a6..738d1b07e2210c 100644
--- a/llvm/test/CodeGen/RISCV/zcmp-additional-stack.ll
+++ b/llvm/test/CodeGen/RISCV/zcmp-additional-stack.ll
@@ -3,7 +3,8 @@
define ptr @func(ptr %s, i32 %_c, ptr %incdec.ptr, i1 %0, i8 %conv14) #0 {
; RV32-LABEL: func:
; RV32: # %bb.0: # %entry
-; RV32-NEXT: cm.push {ra, s0-s1}, -24
+; RV32-NEXT: cm.push {ra, s0-s1}, -32
+; RV32-NEXT: addi sp, sp, 8
; RV32-NEXT: .cfi_def_cfa_offset 24
; RV32-NEXT: .cfi_offset ra, -12
; RV32-NEXT: .cfi_offset s0, -8
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CM_PUSH
LGTM. Do we need to do the same for CM_POP
in emitEpilogue
?
FirstFrameSetup->getOperand(1).setImm(Spimm); | ||
StackSize -= Spimm; | ||
StackSize -= std::min(StackSize, Spimm); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this mean we can move the sp by more than StackSize? Doesn't that make the position of sp out of sync with the rest of the code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, if the amount of space in the offset for the cm.push
is more than the additional stack space required, then yes we could. I don't follow why this is a problem though. Or maybe I am misunderstanding what the StackSize
variable actually represents - the way I have interpreted it is "The amount of stack space required for parameters and local objects (not including the space to spill CSR's)". Of course, if my interpretation is correct, I'd certainly make the argument that the variable is poorly named, but hey.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can move the sp by more than StackSize here, and if it does, the stack pointer will be readjusted in the following if statement where StackSize != 0
.
Argh! Yes indeed! I don't know why I missed that (other than I am porting this from a downstream fix I did which is on LLVM 17 where the fix is in a separate function that computes the immediate and is called in both places). |
@@ -3,7 +3,7 @@ | |||
define ptr @func(ptr %s, i32 %_c, ptr %incdec.ptr, i1 %0, i8 %conv14) #0 { | |||
; RV32-LABEL: func: | |||
; RV32: # %bb.0: # %entry | |||
; RV32-NEXT: cm.push {ra, s0-s1}, -24 | |||
; RV32-NEXT: cm.push {ra, s0-s1}, -32 | |||
; RV32-NEXT: .cfi_def_cfa_offset 24 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aren't these cfi directives wrong, since the stack pointer was moved by 32 not 24?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They most certainly appear to be wrong.
…h/pop. This an alternative to llvm#84935 to fix the miscompile, but not be optimal. The immediate for cm.push/pop must be a multiple of 16. For RVE, it might not be. It's not easy to increase the stack size without messing up cfa directives and maybe other things. This patch rounds the stack size down to a multiple of 16 before clamping it to 48. This causes an extra addi to be emitted to handle the remainder. Once this commited, I can commit llvm#84989 to add verification for these instructions being generated with valid offsets.
…h/pop. (#86073) This an alternative to #84935 to fix the miscompile, but not be optimal. The immediate for cm.push/pop must be a multiple of 16. For RVE, it might not be. It's not easy to increase the stack size without messing up cfa directives and maybe other things. This patch rounds the stack size down to a multiple of 16 before clamping it to 48. This causes an extra addi to be emitted to handle the remainder. Once this commited, I can commit #84989 to add verification for these instructions being generated with valid offsets.
An immediate that isn't a multiple of 16 is meaningless and we should always emit the next higher multiple. Otherwise the value is simply truncated of course which leads to clobbering CSR's on the stack.