[AArch64] Disable red-zone when lowering Q-reg copy through memory. #94962

Merged: 2 commits, Jun 11, 2024
10 changes: 9 additions & 1 deletion llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
@@ -431,8 +431,16 @@ bool AArch64FrameLowering::canUseRedZone(const MachineFunction &MF) const {
   const AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
   uint64_t NumBytes = AFI->getLocalStackSize();
 
+  // If neither NEON nor SVE is available, a COPY from one Q-reg to
+  // another requires a spill -> reload sequence. We can do that
+  // using a pre-decrementing store/post-incrementing load, but
+  // if we do so, we can't use the Red Zone.
+  bool LowerQRegCopyThroughMem = Subtarget.hasFPARMv8() &&
+                                 !Subtarget.isNeonAvailable() &&
+                                 !Subtarget.hasSVE();
+
   return !(MFI.hasCalls() || hasFP(MF) || NumBytes > RedZoneSize ||
-           getSVEStackSize(MF));
+           getSVEStackSize(MF) || LowerQRegCopyThroughMem);
 }

/// hasFP - Return true if the specified function should have a dedicated frame
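The final return expression in the hunk above can be read as a pure predicate over a handful of facts about the function and subtarget. The following is a minimal standalone sketch of that predicate, not LLVM code: `FrameFacts`, `lowersQRegCopyThroughMem`, and `canUseRedZoneModel` are hypothetical names, the value 128 for `RedZoneSize` is an assumption for illustration, and any checks that precede the lines shown in the hunk are not modeled.

```cpp
#include <cstdint>

// Hypothetical stand-in for the state the return expression consults.
// None of these names come from LLVM.
struct FrameFacts {
  bool HasCalls;           // models MFI.hasCalls()
  bool HasFP;              // models hasFP(MF): a frame pointer is required
  uint64_t LocalStackSize; // models AFI->getLocalStackSize()
  uint64_t SVEStackSize;   // models getSVEStackSize(MF)
  bool HasFPARMv8;         // models Subtarget.hasFPARMv8()
  bool NeonAvailable;      // models Subtarget.isNeonAvailable()
  bool HasSVE;             // models Subtarget.hasSVE()
};

// Assumed value, for illustration only.
constexpr uint64_t RedZoneSize = 128;

// FP registers exist, but neither NEON nor SVE can perform a Q-reg copy,
// so the copy has to go through a temporary stack slot.
bool lowersQRegCopyThroughMem(const FrameFacts &F) {
  return F.HasFPARMv8 && !F.NeonAvailable && !F.HasSVE;
}

// Mirrors the shape of the patched return expression.
bool canUseRedZoneModel(const FrameFacts &F) {
  return !(F.HasCalls || F.HasFP || F.LocalStackSize > RedZoneSize ||
           F.SVEStackSize != 0 || lowersQRegCopyThroughMem(F));
}
```

In this model, a leaf function with a small frame keeps the red zone when NEON is available, loses it when only scalar FP is present, and keeps it again when FP registers are disabled entirely, which is the case the new test covers.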
13 changes: 13 additions & 0 deletions llvm/test/CodeGen/AArch64/arm64-redzone.ll
@@ -16,3 +16,16 @@ define i32 @foo(i32 %a, i32 %b) nounwind ssp {
   %tmp2 = load i32, ptr %x, align 4
   ret i32 %tmp2
 }
+
+; We disable red-zone if NEON is not available because copies of Q-regs
+; require a spill/fill and dynamic stack allocation. But we only need to do
+; this when FP registers are enabled.
+define void @bar(fp128 %f) "target-features"="-fp-armv8" {
+; CHECK-LABEL: bar:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    stp x0, x1, [sp, #-16]
+; CHECK-NEXT:    ret
+  %ptr = alloca fp128
+  store fp128 %f, ptr %ptr
+  ret void
+}
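To see why a Q-reg copy through memory is incompatible with the red zone, it may help to simulate the stack. The sketch below is a toy model under stated assumptions: a byte array stands in for stack memory, `ToyStack` and all function names are hypothetical, and the copy is modeled as a 16-byte store at `sp - 16` with SP pre-decremented, followed by a load that restores SP. A local placed at `[sp, #-16]`, as in the `stp` that the test expects, occupies the same 16 bytes.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Toy model: a byte array stands in for stack memory and SP is an index
// into it. All names here are hypothetical.
struct ToyStack {
  std::array<uint8_t, 64> Mem{};
  std::size_t SP = 48;
};

// Red-zone style store: write below SP without moving SP.
void storeBelowSP(ToyStack &S, std::size_t Offset, const uint8_t *Src,
                  std::size_t N) {
  std::memcpy(&S.Mem[S.SP - Offset], Src, N);
}

// Models a pre-decrementing 16-byte store followed by a post-incrementing
// 16-byte load, i.e. a register copy that goes through the stack.
void copyQRegThroughMem(ToyStack &S, const uint8_t (&SrcQ)[16],
                        uint8_t (&DstQ)[16]) {
  S.SP -= 16;                          // pre-decrement SP
  std::memcpy(&S.Mem[S.SP], SrcQ, 16); // store the source register
  std::memcpy(DstQ, &S.Mem[S.SP], 16); // load into the destination register
  S.SP += 16;                          // post-increment SP
}

// Returns true if a 16-byte local kept in the red zone still holds its
// value after a Q-reg copy through memory.
bool redZoneLocalSurvives() {
  ToyStack S;
  uint8_t Local[16];
  uint8_t SrcQ[16];
  uint8_t DstQ[16] = {};
  std::memset(Local, 0xAA, sizeof(Local));
  std::memset(SrcQ, 0x55, sizeof(SrcQ));

  storeBelowSP(S, 16, Local, sizeof(Local)); // local lives at [sp - 16, sp)
  copyQRegThroughMem(S, SrcQ, DstQ);         // the copy reuses the same bytes

  return std::memcmp(&S.Mem[S.SP - 16], Local, sizeof(Local)) == 0;
}
```

In this model `redZoneLocalSurvives()` returns false: the copy itself succeeds, but the red-zone local has been overwritten, which is the hazard the patch avoids by refusing the red zone whenever such copies can occur.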