Skip to content

Update preserve_most to treat XMM registers like C #73866

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Dec 4, 2023

Conversation

sethbrenith
Copy link
Contributor

As scottmcm described, the preserve_most calling convention, as currently implemented, is a bad fit for Windows on x64. The intent of preserve_most is "to make the code in the caller as unintrusive as possible", but preserve_most causes the caller to spill and restore ten SIMD registers. It would be preferable to make preserve_most treat the XMM registers however the C calling convention does on the target operating system.

This is a breaking change, but the documentation indicates that preserve_most is still experimental, so I believe that ABI compatibility is not yet a requirement.

@llvmbot
Copy link
Member

llvmbot commented Nov 29, 2023

@llvm/pr-subscribers-backend-x86

@llvm/pr-subscribers-llvm-ir

Author: Seth Brenith (sethbrenith)

Changes

As scottmcm described, the preserve_most calling convention, as currently implemented, is a bad fit for Windows on x64. The intent of preserve_most is "to make the code in the caller as unintrusive as possible", but preserve_most causes the caller to spill and restore ten SIMD registers. It would be preferable to make preserve_most treat the XMM registers however the C calling convention does on the target operating system.

This is a breaking change, but the documentation indicates that preserve_most is still experimental, so I believe that ABI compatibility is not yet a requirement.


Full diff: https://github.com/llvm/llvm-project/pull/73866.diff

4 Files Affected:

  • (modified) llvm/docs/LangRef.rst (+3-2)
  • (modified) llvm/lib/Target/X86/X86CallingConv.td (+3)
  • (modified) llvm/lib/Target/X86/X86RegisterInfo.cpp (+3-2)
  • (added) llvm/test/CodeGen/X86/preserve_mostcc64_win.ll (+91)
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index e448c5ed5c5d947..29257933cc62b5b 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -362,8 +362,9 @@ added in the future:
 
     - On X86-64 the callee preserves all general purpose registers, except for
       R11 and return registers, if any. R11 can be used as a scratch register.
-      Floating-point registers (XMMs/YMMs) are not preserved and need to be
-      saved by the caller.
+      The treatment of floating-point registers (XMMs/YMMs) matches the OS's C
+      calling convention: on most platforms, they are not preserved and need to
+      be saved by the caller, but on Windows, xmm6-xmm15 are preserved.
 
     - On AArch64 the callee preserve all general purpose registers, except X0-X8
       and X16-X18.
diff --git a/llvm/lib/Target/X86/X86CallingConv.td b/llvm/lib/Target/X86/X86CallingConv.td
index 27e4fe0cb7a02fa..16014d6a2f60241 100644
--- a/llvm/lib/Target/X86/X86CallingConv.td
+++ b/llvm/lib/Target/X86/X86CallingConv.td
@@ -1151,6 +1151,9 @@ def CSR_64_CXX_TLS_Darwin_ViaCopy : CalleeSavedRegs<(sub CSR_64_TLS_Darwin, RBP)
 def CSR_64_RT_MostRegs : CalleeSavedRegs<(add CSR_64, RAX, RCX, RDX, RSI, RDI,
                                               R8, R9, R10)>;
 
+def CSR_Win64_RT_MostRegs : CalleeSavedRegs<(add CSR_64_RT_MostRegs,
+                                                 (sequence "XMM%u", 6, 15))>;
+
 // All registers - except r11 and return registers.
 def CSR_64_RT_AllRegs     : CalleeSavedRegs<(add CSR_64_RT_MostRegs,
                                                  (sequence "XMM%u", 0, 15))>;
diff --git a/llvm/lib/Target/X86/X86RegisterInfo.cpp b/llvm/lib/Target/X86/X86RegisterInfo.cpp
index 379a9d448a963ca..8e162d40c2b0efa 100644
--- a/llvm/lib/Target/X86/X86RegisterInfo.cpp
+++ b/llvm/lib/Target/X86/X86RegisterInfo.cpp
@@ -310,7 +310,8 @@ X86RegisterInfo::getCalleeSavedRegs(const MachineFunction *MF) const {
       return CSR_64_AllRegs_AVX_SaveList;
     return CSR_64_AllRegs_SaveList;
   case CallingConv::PreserveMost:
-    return CSR_64_RT_MostRegs_SaveList;
+    return IsWin64 ? CSR_Win64_RT_MostRegs_SaveList :
+                     CSR_64_RT_MostRegs_SaveList;
   case CallingConv::PreserveAll:
     if (HasAVX)
       return CSR_64_RT_AllRegs_AVX_SaveList;
@@ -431,7 +432,7 @@ X86RegisterInfo::getCallPreservedMask(const MachineFunction &MF,
       return CSR_64_AllRegs_AVX_RegMask;
     return CSR_64_AllRegs_RegMask;
   case CallingConv::PreserveMost:
-    return CSR_64_RT_MostRegs_RegMask;
+    return IsWin64 ? CSR_Win64_RT_MostRegs_RegMask : CSR_64_RT_MostRegs_RegMask;
   case CallingConv::PreserveAll:
     if (HasAVX)
       return CSR_64_RT_AllRegs_AVX_RegMask;
diff --git a/llvm/test/CodeGen/X86/preserve_mostcc64_win.ll b/llvm/test/CodeGen/X86/preserve_mostcc64_win.ll
new file mode 100644
index 000000000000000..7042a2e77da1b6a
--- /dev/null
+++ b/llvm/test/CodeGen/X86/preserve_mostcc64_win.ll
@@ -0,0 +1,91 @@
+; RUN: sed -e "s/RETTYPE/void/;s/RETVAL//" %s | llc -mtriple=x86_64-win32 -mcpu=corei7 | FileCheck --check-prefixes=ALL,VOID %s
+; RUN: sed -e "s/RETTYPE/i32/;s/RETVAL/undef/" %s | llc -mtriple=x86_64-win32 -mcpu=corei7 | FileCheck --check-prefixes=ALL,INT %s
+; RUN: sed -e "s/RETTYPE/\{i64\,i64\}/;s/RETVAL/undef/" %s | llc -mtriple=x86_64-win32 -mcpu=corei7 | FileCheck --check-prefixes=ALL,INT128 %s
+
+; Every GPR should be saved, except r11 and return registers.
+; XMM registers 6-15 should also be saved.
+define preserve_mostcc RETTYPE @preserve_mostcc1(i64, i64, double, double) nounwind {
+entry:
+;ALL-LABEL:   preserve_mostcc1
+;ALL:         pushq %r10
+;ALL-NEXT:    pushq %r9
+;ALL-NEXT:    pushq %r8
+;ALL-NEXT:    pushq %rdi
+;ALL-NEXT:    pushq %rsi
+;VOID-NEXT:   pushq %rdx
+;INT-NEXT:    pushq %rdx
+;INT128-NOT:  pushq %rdx
+;ALL-NEXT:    pushq %rcx
+;VOID-NEXT:   pushq %rax
+;INT-NOT:     pushq %rax
+;INT128-NOT:  pushq %rax
+;ALL-NEXT:    pushq %rbp
+;ALL-NEXT:    pushq %r15
+;ALL-NEXT:    pushq %r14
+;ALL-NEXT:    pushq %r13
+;ALL-NEXT:    pushq %r12
+;ALL-NEXT:    pushq %rbx
+;ALL:         movaps %xmm15
+;ALL-NEXT:    movaps %xmm14
+;ALL-NEXT:    movaps %xmm13
+;ALL-NEXT:    movaps %xmm12
+;ALL-NEXT:    movaps %xmm11
+;ALL-NEXT:    movaps %xmm10
+;ALL-NEXT:    movaps %xmm9
+;ALL-NEXT:    movaps %xmm8
+;ALL-NEXT:    movaps %xmm7
+;ALL-NEXT:    movaps %xmm6
+;ALL-NOT:     movaps %xmm5
+;ALL-NOT:     movaps %xmm4
+;ALL-NOT:     movaps %xmm3
+;ALL-NOT:     movaps %xmm2
+;ALL-NOT:     movaps %xmm1
+;ALL-NOT:     movaps %xmm0
+;ALL-NOT:     movaps {{.*}} %xmm0
+;ALL-NOT:     movaps {{.*}} %xmm1
+;ALL-NOT:     movaps {{.*}} %xmm2
+;ALL-NOT:     movaps {{.*}} %xmm3
+;ALL-NOT:     movaps {{.*}} %xmm4
+;ALL-NOT:     movaps {{.*}} %xmm5
+;ALL:         movaps {{.*}} %xmm6
+;ALL-NEXT:    movaps {{.*}} %xmm7
+;ALL-NEXT:    movaps {{.*}} %xmm8
+;ALL-NEXT:    movaps {{.*}} %xmm9
+;ALL-NEXT:    movaps {{.*}} %xmm10
+;ALL-NEXT:    movaps {{.*}} %xmm11
+;ALL-NEXT:    movaps {{.*}} %xmm12
+;ALL-NEXT:    movaps {{.*}} %xmm13
+;ALL-NEXT:    movaps {{.*}} %xmm14
+;ALL-NEXT:    movaps {{.*}} %xmm15
+;ALL:         popq    %rbx
+;ALL-NEXT:    popq    %r12
+;ALL-NEXT:    popq    %r13
+;ALL-NEXT:    popq    %r14
+;ALL-NEXT:    popq    %r15
+;ALL-NEXT:    popq    %rbp
+;VOID-NEXT:   popq    %rax
+;INT-NOT:     popq    %rax
+;INT128-NOT:  popq    %rax
+;ALL-NEXT:    popq    %rcx
+;VOID-NEXT:   popq    %rdx
+;INT-NEXT:    popq    %rdx
+;INT128-NOT:  popq    %rdx
+;ALL-NEXT:    popq    %rsi
+;ALL-NEXT:    popq    %rdi
+;ALL-NEXT:    popq    %r8
+;ALL-NEXT:    popq    %r9
+;ALL-NEXT:    popq    %r10
+  call void asm sideeffect "", "~{rax},~{rbx},~{rcx},~{rdx},~{rsi},~{rdi},~{r8},~{r9},~{r10},~{r11},~{r12},~{r13},~{r14},~{r15},~{rbp},~{xmm0},~{xmm1},~{xmm2},~{xmm3},~{xmm4},~{xmm5},~{xmm6},~{xmm7},~{xmm8},~{xmm9},~{xmm10},~{xmm11},~{xmm12},~{xmm13},~{xmm14},~{xmm15}"()
+  ret RETTYPE RETVAL
+}
+
+; Make sure XMMs are not saved before the call
+declare preserve_mostcc RETTYPE @foo(i64, i64, double, double)
+define void @preserve_mostcc2() nounwind {
+entry:
+;ALL-LABEL: preserve_mostcc2
+;ALL-NOT:   movaps
+;ALL-NOT:   {{.*xmm[0-1,4-9].*}}
+  call preserve_mostcc RETTYPE @foo(i64 1, i64 2, double 3.0, double 4.0)
+  ret void
+}

Copy link

github-actions bot commented Nov 29, 2023

✅ With the latest revision this PR passed the C/C++ code formatter.

Copy link
Collaborator

@rnk rnk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@sethbrenith
Copy link
Contributor Author

Thanks for reviewing, @rnk! Would you please advise on the next steps, since I'm unfamiliar with the code review process for LLVM? A few specific questions:

  • Would you like me to wait for reviews from others?
  • How can I trigger a new buildkite run? Other pull requests from about the same time encountered the same failure, so I assume it's not related to my change.
  • Who completes pull requests? It looks like I don't have permission to do so.

@rnk rnk merged commit f6c7bae into llvm:main Dec 4, 2023
@rnk
Copy link
Collaborator

rnk commented Dec 4, 2023

I'm comfortable approving and merging this pull request, so I went ahead and did that.

For LLVM committers, I would generally prefer to let them be the ones to push the merge button, so I didn't do that earlier.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants