[clang] Pass fp128 indirectly and return in xmm0 on Windows #115052

Merged
merged 2 commits into from
Mar 6, 2025

Conversation

tgross35
Copy link
Contributor

@tgross35 tgross35 commented Nov 5, 2024

Clang currently passes and returns __float128 in vector registers on
MinGW targets, which is LLVM's default ABI for fp128. However, the
Windows x86-64 calling convention documentation
(https://learn.microsoft.com/en-us/cpp/build/x64-calling-convention?view=msvc-170)
states the following:

__m128 types, arrays, and strings are never passed by immediate
value. Instead, a pointer is passed to memory allocated by the
caller. Structs and unions of size 8, 16, 32, or 64 bits, and __m64
types, are passed as if they were integers of the same size. Structs
or unions of other sizes are passed as a pointer to memory allocated
by the caller. For these aggregate types passed as a pointer,
including __m128, the caller-allocated temporary memory must be
16-byte aligned.

Based on the above it sounds like __float128 should be passed
indirectly. Thus, change f128 passing to use the stack and make the
return in xmm0 explicit. This is identical to i128, and the argument
passing is the same as GCC.
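
To make "passed indirectly" concrete, here is a minimal sketch of what the caller and callee do once the argument has been lowered. It is only an illustration: `val128` is a hypothetical 16-byte stand-in for `__float128` so the example builds on any compiler, and `callee_lowered` / `call_by_value` are made-up names, not anything from Clang or LLVM.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 16-byte, 16-byte-aligned stand-in for __float128. */
typedef struct {
    _Alignas(16) uint64_t lo;
    uint64_t hi;
} val128;

/* Lowered callee: it receives a pointer to a caller-allocated copy of the
 * argument and may modify that copy freely. */
uint64_t callee_lowered(val128 *arg) {
    arg->lo += 1; /* only the caller's temporary changes */
    return arg->lo ^ arg->hi;
}

/* Lowered caller for a by-value call `callee(v)`: copy `v` into a
 * 16-byte-aligned temporary and pass the temporary's address. */
uint64_t call_by_value(const val128 *v) {
    val128 tmp = *v; /* caller-allocated temporary */
    return callee_lowered(&tmp);
}
```

Because the callee only ever sees the temporary, by-value semantics are preserved even though a pointer is what travels in the argument register.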

Regarding return values, the documentation states:

A scalar return value that can fit into 64 bits, including the __m64
type, is returned through RAX. Non-scalar types including floats,
doubles, and vector types such as __m128, __m128i, __m128d are
returned in XMM0.

This makes it sound like it should be acceptable to return __float128
in xmm0; however, GCC returns __float128 on the stack. The above ABI
statement, as well as consistency with i128 (which is returned in xmm0),
means that it would likely be better for GCC to change its return ABI to
match Clang rather than the other way around, so that portion is left
as-is.

Clang's MSVC targets do not support __float128 or _Float128, but
these changes would also apply there if it is eventually enabled.

With #128848, which should land around the same time, LLVM will also
implement this ABI, so it is not technically necessary for Clang to make
a change here as well. This is still done in order to be consistent with
other types, and to allow calling convention-aware optimizations at all
available optimization layers (@rnk mentioned possible reuse of stack
arguments). An added benefit is readability of the LLVM IR, since it more
accurately reflects what the lowered assembly does.

Copy link

github-actions bot commented Nov 5, 2024

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@tgross35
Copy link
Contributor Author

tgross35 commented Nov 5, 2024

Cc @beetrees and @wesleywiser

See also some context discussion at the Rust Zulip https://rust-lang.zulipchat.com/#narrow/channel/131828-t-compiler/topic/MSVC.20.60f16.60.20and.20.60f128.60.20ABI

@llvmbot llvmbot added clang Clang issues not falling into any other category backend:X86 clang:codegen IR generation bugs: mangling, exceptions, etc. labels Nov 5, 2024
@llvmbot
Copy link
Member

llvmbot commented Nov 5, 2024

@llvm/pr-subscribers-clang-codegen
@llvm/pr-subscribers-backend-x86

@llvm/pr-subscribers-clang

Author: Trevor Gross (tgross35)

Changes

Clang currently passes and returns __float128 in vector registers on MinGW targets. However, the Windows x86-64 calling convention documentation (https://learn.microsoft.com/en-us/cpp/build/x64-calling-convention?view=msvc-170) states the following:

> __m128 types, arrays, and strings are never passed by immediate value. Instead, a pointer is passed to memory allocated by the caller. Structs and unions of size 8, 16, 32, or 64 bits, and __m64 types, are passed as if they were integers of the same size. Structs or unions of other sizes are passed as a pointer to memory allocated by the caller. For these aggregate types passed as a pointer, including __m128, the caller-allocated temporary memory must be 16-byte aligned.

Based on the above it sounds like __float128 should be passed indirectly; this is what MinGW GCC already does, so change Clang to match. Passing by value causes problems with varargs. E.g. the below completes successfully when built with GCC but has a runtime crash when built with Clang:

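#include <stdarg.h>

void va_f128(int count, ...) {
    va_list args;
    va_start(args, count);
    __float128 val = va_arg(args, __float128);
    (void)val; /* silence the unused-variable warning */
    va_end(args);
}

int main() {
    /* Pass an actual __float128; a plain 0.0 would be passed as a double. */
    va_f128(0, (__float128)0.0);
}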

This patch fixes the above. It also resolves crashes when calling GCC-built f128 libcalls.
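
As a rough model of why varargs care about this, the sketch below emulates reading an indirectly passed 16-byte argument from a Win64-style variadic area, where every argument occupies one 8-byte slot. The names `val128`, `slot_cursor`, and `read_indirect_128` are hypothetical and exist only for this illustration; real `va_arg` lowering is done by the compiler.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 16-byte stand-in for __float128. */
typedef struct {
    _Alignas(16) unsigned char bytes[16];
} val128;

/* Model of a variadic argument area made of 8-byte slots. */
typedef struct {
    const uint64_t *next;
} slot_cursor;

/* Read one indirectly passed 16-byte argument: the slot holds a pointer to
 * the value, not the value itself. */
val128 read_indirect_128(slot_cursor *ap) {
    uintptr_t addr = (uintptr_t)*ap->next++; /* first load: the pointer */
    val128 out;
    memcpy(&out, (const void *)addr, sizeof out); /* second load: the value */
    return out;
}
```

If the caller instead placed the value in a vector register while the callee read a pointer from the slot, the two sides would disagree, which is one plausible reading of the crash described above.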

Regarding return values, the documentation states:

> A scalar return value that can fit into 64 bits, including the __m64 type, is returned through RAX. Non-scalar types including floats, doubles, and vector types such as __m128, __m128i, __m128d are returned in XMM0.

This makes it sound like it should be acceptable to return __float128 in XMM0. However, GCC returns __float128 on the stack, so do the same here to be consistent.

Clang's MSVC targets do not support __float128 or _Float128, but these changes would also apply there if it is eventually enabled.


Full diff: https://github.com/llvm/llvm-project/pull/115052.diff

2 Files Affected:

  • (modified) clang/lib/CodeGen/Targets/X86.cpp (+5)
  • (added) clang/test/CodeGen/win64-fp128.c (+20)
diff --git a/clang/lib/CodeGen/Targets/X86.cpp b/clang/lib/CodeGen/Targets/X86.cpp
index 7f73bf2a65266e..16656be14d8353 100644
--- a/clang/lib/CodeGen/Targets/X86.cpp
+++ b/clang/lib/CodeGen/Targets/X86.cpp
@@ -3367,6 +3367,11 @@ ABIArgInfo WinX86_64ABIInfo::classify(QualType Ty, unsigned &FreeSSERegs,
       return ABIArgInfo::getDirect(llvm::FixedVectorType::get(
           llvm::Type::getInt64Ty(getVMContext()), 2));
 
+    case BuiltinType::Float128:
+      // f128 is too large to fit in integer registers so the Windows ABI
+      // requires it be passed on the stack. GCC does the same.
+      return ABIArgInfo::getIndirect(Align, /*ByVal=*/false);
+
     default:
       break;
     }
diff --git a/clang/test/CodeGen/win64-fp128.c b/clang/test/CodeGen/win64-fp128.c
new file mode 100644
index 00000000000000..bfb903709397c3
--- /dev/null
+++ b/clang/test/CodeGen/win64-fp128.c
@@ -0,0 +1,20 @@
+// RUN: %clang_cc1 -triple x86_64-windows-gnu -emit-llvm -o - %s \
+// RUN:    | FileCheck %s --check-prefix=GNU64
+// __float128 is unsupported on MSVC
+
+__float128 fp128_ret(void) { return 0; }
+// GNU64: define dso_local void @fp128_ret(ptr dead_on_unwind noalias writable sret(fp128) align 16 %agg.result)
+
+__float128 fp128_args(__float128 a, __float128 b) { return a * b; }
+// GNU64: define dso_local void @fp128_args(ptr dead_on_unwind noalias writable sret(fp128) align 16 %agg.result, ptr noundef %0, ptr noundef %1)
+
+void fp128_vararg(int a, ...) {
+  // GNU64-LABEL: define dso_local void @fp128_vararg
+  __builtin_va_list ap;
+  __builtin_va_start(ap, a);
+  __float128 i = __builtin_va_arg(ap, __float128);
+  // movaps  xmm0, xmmword ptr [rax]
+  // GNU64: load ptr, ptr
+  // GNU64: load fp128, ptr
+  __builtin_va_end(ap);
+}
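
The `sret(fp128)` signatures in the GNU64 check lines of this revision of the diff mean the value is returned through a hidden pointer (the PR was later changed to return in xmm0 instead). A minimal sketch of that lowering, using a hypothetical `val128` stand-in and made-up function names:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 16-byte stand-in for __float128. */
typedef struct {
    _Alignas(16) uint64_t lo;
    uint64_t hi;
} val128;

/* Source-level view: the function returns the value directly. */
val128 make_value(void) {
    val128 v = { 0x1234, 0x5678 };
    return v;
}

/* Lowered view, analogous to
 *   define void @fp128_ret(ptr sret(fp128) align 16 %agg.result)
 * The caller allocates the result slot and passes its address. */
void make_value_sret(val128 *result) {
    result->lo = 0x1234;
    result->hi = 0x5678;
}
```

Both forms produce the same value; they differ only in who owns the storage and which register carries what, which is exactly the part two compilers have to agree on.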

@nikic nikic requested review from rnk and mstorsjo November 5, 2024 19:53
@tgross35
Copy link
Contributor Author

tgross35 commented Nov 5, 2024

It seems like arguments aren't actually getting passed indirectly to libcalls. Simple test program:

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

union ty128 {
  /* x86-64 is little-endian, so the low half comes first in memory. */
  struct { uint64_t lo, hi; } u64x2;
  __float128 f128;
};

void f128_mul(__float128 a, __float128 b) {
    union ty128 cvt;
    cvt.f128 = a * b;
    printf("0x%016" PRIx64 "%016" PRIx64 "\n", cvt.u64x2.hi, cvt.u64x2.lo);
}

int main() {
    __float128 fa, fb;

    fa = 122134.345678901234;
    fb = 78.9012345678901234;

    f128_mul(fa, fb);
}

Checking right before the call to __multf3: when built with GCC, rcx points to a return slot on the stack, and rdx and r8 point to the two float values (0x400f... and 0x4005...):

(gdb) x/2gx $rcx
0x7ffffcb30:    0x00000007ffffcbf0      0x00000007ffffcc85
(gdb) x/2gx $rdx
0x7ffffcb20:    0xb000000000000000      0x400fdd16587e6997
(gdb) x/2gx $r8
0x7ffffcb10:    0x8000000000000000      0x40053b9add3c0c73

Stdout correctly prints 0x401626158d507d67aaa9b63c8017f400.

Built using this patch, it looks like Clang is putting the float arguments in $rcx and $rdx and not leaving a return slot pointer:

(gdb) x/2gx $rcx
0x5ffe70:       0xb000000000000000      0x400fdd16587e6997
(gdb) x/2gx $rdx
0x5ffe60:       0x8000000000000000      0x40053b9add3c0c73
(gdb) x/2gx $r8
0x622720:       0x0000000000622988      0x00000000006229a7

And the output is meaningless.

Is libcall ABI handled elsewhere?
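
For anyone reading the register dumps above, here is a small sketch that converts a binary128 value, given as the two 64-bit halves gdb prints (low half first, high half second), into an approximate double. `f128_halves_to_double` is a hypothetical helper written for this comment; it assumes a normal number whose exponent fits in a double and simply truncates the low mantissa bits.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Approximate a binary128 value as a double.
 * Layout of the high half: 1 sign bit, 15 exponent bits (bias 16383),
 * then the top 48 of the 112 mantissa bits.
 * Assumption: normal input, exponent within double range. */
double f128_halves_to_double(uint64_t hi, uint64_t lo) {
    uint64_t sign = hi >> 63;
    int64_t exp = (int64_t)((hi >> 48) & 0x7fff) - 16383; /* unbias */
    /* Keep the top 52 mantissa bits: 48 from hi, 4 from lo. */
    uint64_t mant52 = ((hi & 0xffffffffffffULL) << 4) | (lo >> 60);
    uint64_t bits = (sign << 63) | ((uint64_t)(exp + 1023) << 52) | mant52;
    double out;
    memcpy(&out, &bits, sizeof out);
    return out;
}
```

Feeding it the pairs from the dumps, e.g. `f128_halves_to_double(0x400fdd16587e6997, 0xb000000000000000)`, gives back roughly the 122134.34... input, which is a quick way to confirm which register points at which operand.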

@nikic
Copy link
Contributor

nikic commented Nov 5, 2024

@tgross35 Libcalls are handled by the backend. I think you'd have to modify the CC_X86_Win64_C / RetCC_X86_Win64_C calling conventions in https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/X86/X86CallingConv.td for that.

@rnk
Copy link
Collaborator

rnk commented Nov 5, 2024

To fix things for libcalls, you might have to make other backend changes. Gnarly things like X86TargetLowering::LowerWin64_i128OP in X86ISelLowering.cpp come to mind.

@@ -3367,6 +3367,11 @@ ABIArgInfo WinX86_64ABIInfo::classify(QualType Ty, unsigned &FreeSSERegs,
return ABIArgInfo::getDirect(llvm::FixedVectorType::get(
Copy link
Collaborator

This comment right here implies GCC returns i128 in XMM0. Are we sure f128 will always be passed indirectly? Going with a straightforward interpretation of the ABI as documented by Microsoft implies it should be passed indirectly.

Copy link
Contributor Author

GCC currently always passes and returns f128 indirectly on MinGW. Reading https://learn.microsoft.com/en-us/cpp/build/x64-calling-convention?view=msvc-170 makes me think that it would actually be more accurate to pass indirectly but return in XMM0, similar to what i128 is doing - I brought this up in my most recent comment at https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115054 (disregard the top post, I was originally asking GCC to pass in XMM which wasn't correct).

What would be best to do here?

@tgross35 tgross35 force-pushed the windows-f128-abi branch 2 times, most recently from 01658e5 to e73d83e Compare February 26, 2025 09:00
tgross35 added a commit to tgross35/llvm-project that referenced this pull request Feb 26, 2025
LLVM expects `__float128` to be both passed and returned in xmm
registers on Windows. However, this disagrees with the Windows x86-64
calling convention [1], which indicates values larger than 64 bits
should be passed indirectly.

Update LLVM's libcall calling convention to pass `fp128` indirectly.
Returning in xmm0 is unchanged since this seems like a reasonable
extrapolation of the ABI.

With this change, the calling convention for `i128` and `f128` becomes
the same.

This corresponds to the frontend change in [2].

[1]: https://learn.microsoft.com/en-us/cpp/build/x64-calling-convention?view=msvc-170
[2]: llvm#115052
@tgross35 tgross35 force-pushed the windows-f128-abi branch 2 times, most recently from 07e0c4d to 06ce65a Compare February 26, 2025 21:58
tgross35 added a commit to tgross35/llvm-project that referenced this pull request Feb 28, 2025
LLVM currently expects `__float128` to be both passed and returned in
xmm registers on Windows. However, this disagrees with the Windows
x86-64 calling convention [1], which indicates values larger than 64
bits should be passed indirectly.

Update LLVM's libcall calling convention to pass `fp128` indirectly.
Returning in xmm0 is unchanged since this seems like a reasonable
extrapolation of the ABI. With this patch, the calling convention for
`i128` and `f128` is the same.

GCC passes `__float128` indirectly, which this also matches. However, it
also returns indirectly, which is not done here. I intend to attempt a
GCC change to also return in `xmm0`, given the consistency with `i128`.

This corresponds to the frontend change in [2], see more details there.

[1]: https://learn.microsoft.com/en-us/cpp/build/x64-calling-convention?view=msvc-170
[2]: llvm#115052
tgross35 added a commit to tgross35/llvm-project that referenced this pull request Feb 28, 2025
LLVM currently expects `__float128` to be both passed and returned in
xmm registers on Windows. However, this disagrees with the Windows
x86-64 calling convention [1], which indicates values larger than 64
bits should be passed indirectly.

Update LLVM's libcall calling convention to pass `fp128` indirectly.
Returning in xmm0 is unchanged since this seems like a reasonable
extrapolation of the ABI. With this patch, the calling convention for
`i128` and `f128` is the same.

GCC passes `__float128` indirectly, which this also matches. However, it
also returns indirectly, which is not done here. I intend to attempt a
GCC change to also return in `xmm0` rather than making that change here,
given the consistency with `i128`.

This corresponds to the frontend change in [2], see more details there.

[1]: https://learn.microsoft.com/en-us/cpp/build/x64-calling-convention?view=msvc-170
[2]: llvm#115052
@tgross35 tgross35 changed the title [clang] Always pass fp128 arguments indirectly on Windows [clang] Always pass fp128 arguments indirectly on Windows Feb 28, 2025
@tgross35 tgross35 changed the title [clang] Always pass fp128 arguments indirectly on Windows [clang] Pass fp128 indirectly and return in xmm0 on Windows Feb 28, 2025
tgross35 added a commit to tgross35/llvm-project that referenced this pull request Feb 28, 2025
LLVM currently expects `__float128` to be both passed and returned in
xmm registers on Windows. However, this disagrees with the Windows
x86-64 calling convention [1], which indicates values larger than 64
bits should be passed indirectly.

Update LLVM's default calling convention to pass `fp128` indirectly.
Returning in xmm0 is unchanged since this seems like a reasonable
extrapolation of the ABI. With this patch, the calling convention for
`i128` and `f128` is the same.

GCC passes `__float128` indirectly, which this also matches. However, it
also returns indirectly, which is not done here. I intend to attempt a
GCC change to also return in `xmm0` rather than making that change here,
given the consistency with `i128`.

This corresponds to the frontend change in [2], see more details there.

[1]: https://learn.microsoft.com/en-us/cpp/build/x64-calling-convention?view=msvc-170
[2]: llvm#115052

(cherry picked from commit b6c7942)
bors added a commit to rust-lang-ci/rust that referenced this pull request Feb 28, 2025
…, r=<try>

[do not merge] Windows f128 abi experiment

Running tests with llvm/llvm-project#115052 and llvm/llvm-project#128848.

r? `@ghost`

try-job: dist-x86_64-msvc
try-job: dist-x86_64-mingw
try-job: x86_64-msvc-1
try-job: x86_64-msvc-2
try-job: x86_64-mingw-1
try-job: x86_64-mingw-2
try-job: x86_64-mingw-3
bors added a commit to rust-lang-ci/rust that referenced this pull request Feb 28, 2025
…, r=<try>

[do not merge] Windows f128 abi experiment

Running tests with llvm/llvm-project#115052 and llvm/llvm-project#128848.

r? `@ghost`

try-job: dist-x86_64-msvc
try-job: dist-x86_64-mingw
try-job: x86_64-msvc-1
try-job: x86_64-msvc-2
try-job: x86_64-mingw-1
try-job: x86_64-mingw-2
Add a test based on `win64-i128.c` with the current behavior of
`__float128`.
@tgross35 tgross35 force-pushed the windows-f128-abi branch 2 times, most recently from d695313 to 4f07687 Compare February 28, 2025 10:17
@tgross35
Copy link
Contributor Author

I updated this PR to pass indirectly and make the xmm0 return explicit, identical to i128. This does not match GCC exactly since GCC's __float128 returns on the stack, which unfortunately means cross-implementation compatibility is still broken. I don't think that needs to block the changes to Clang here, however; I plan to send patches to GCC to return in xmm0, which will match the behavior in this PR.

This should be ready for a review. Libcall changes are in #128848, that should land at the same time.

@tgross35 tgross35 requested a review from rnk February 28, 2025 23:08
@rnk
Copy link
Collaborator

rnk commented Mar 1, 2025

If you take the two PRs together, we're going to implement the indirection logic at both the frontend and backend level. If we do it in the backend, I think it's not technically necessary to do it in the frontend, but it is consistent with how we handle all other types. I'd like to see that explained in the commit message: why should this be in the frontend as well?

To answer that question, there are tradeoffs, but it's generally good for memory optimizations to be able to observe ABI details. For example, the backend lowering creates extra stack objects if we need to pass the same fp128 argument twice to two functions, whereas mid-level optimizations might be able to remove that.

Otherwise, I think this is good to go. I suppose Rust folks are watching this PR, otherwise, I'd say loop them in.

@tgross35
Copy link
Contributor Author

tgross35 commented Mar 1, 2025

Thanks for answering that question, I wouldn't have had a good answer outside of consistency. Does "mid-level optimizations" refer to optimizations done in Clang rather than in LLVM?

I updated the message. Somebody will need to land this for me; the two commits should land separately (the first is NFC).

> Otherwise, I think this is good to go. I suppose Rust folks are watching this PR, otherwise, I'd say loop them in.

For future reference that's mostly me for f16/f128, I'll update our frontend after this lands.

Clang currently passes and returns `__float128` in vector registers on
MinGW targets, which is LLVM's default ABI for `fp128`. However, the
Windows x86-64 calling convention [1] states the following:

    __m128 types, arrays, and strings are never passed by immediate
    value. Instead, a pointer is passed to memory allocated by the
    caller. Structs and unions of size 8, 16, 32, or 64 bits, and __m64
    types, are passed as if they were integers of the same size. Structs
    or unions of other sizes are passed as a pointer to memory allocated
    by the caller. For these aggregate types passed as a pointer,
    including __m128, the caller-allocated temporary memory must be
    16-byte aligned.

Based on the above it sounds like `__float128` should be passed
indirectly. Thus, change `f128` passing to use the stack and make the
return in xmm0 explicit. This is identical to `i128`, and the argument
passing is the same as GCC.

Regarding return values, the documentation states:

    A scalar return value that can fit into 64 bits, including the __m64
    type, is returned through RAX. Non-scalar types including floats,
    doubles, and vector types such as __m128, __m128i, __m128d are
    returned in XMM0.

This makes it sound like it should be acceptable to return `__float128`
in xmm0; however, GCC returns `__float128` on the stack. The above ABI
statement, as well as consistency with `i128` (which is returned in xmm0),
means that it would likely be better for GCC to change its return ABI to
match Clang rather than the other way around, so that portion is left
as-is.

Clang's MSVC targets do not support `__float128` or `_Float128`, but
these changes would also apply there if it is eventually enabled.

With [2], which should land around the same time, LLVM will also
implement this ABI, so it is not technically necessary for Clang to make
a change here as well. This is still done in order to be consistent with
other types, and to allow calling convention-aware optimizations at all
available optimization layers (rnk mentioned possible reuse of stack
arguments). An added benefit is readability of the LLVM IR, since it more
accurately reflects what the lowered assembly does.

[1]: https://learn.microsoft.com/en-us/cpp/build/x64-calling-convention?view=msvc-170
[2]: llvm#128848
@tgross35
Copy link
Contributor Author

tgross35 commented Mar 6, 2025

@rnk (or anyone) would you be able to land this?

@phoebewang phoebewang merged commit b941d90 into llvm:main Mar 6, 2025
11 checks passed
Copy link

github-actions bot commented Mar 6, 2025

@tgross35 Congratulations on having your first Pull Request (PR) merged into the LLVM Project!

Your changes will be combined with recent changes from other authors, then tested by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR.

Please check whether problems have been caused by your change specifically, as the builds can include changes from many authors. It is not uncommon for your change to be included in a build that fails due to someone else's changes, or infrastructure issues.

How to do this, and the rest of the post-merge process, is covered in detail here.

If your change does cause a problem, it may be reverted, or you can revert it yourself. This is a normal part of LLVM development. You can fix your changes and open a new PR to merge them again.

If you don't get any reports, no action is required from you. Your changes are working as expected, well done!

@tgross35 tgross35 deleted the windows-f128-abi branch March 6, 2025 09:48
tgross35 added a commit to tgross35/llvm-project that referenced this pull request Mar 6, 2025
LLVM currently expects `__float128` to be both passed and returned in
xmm registers on Windows. However, this disagrees with the Windows
x86-64 calling convention [1], which indicates values larger than 64
bits should be passed indirectly.

Update LLVM's default Windows calling convention to pass `fp128`
indirectly. Returning in xmm0 is unchanged since this seems like a
reasonable extrapolation of the ABI. With this patch, the calling
convention for `i128` and `f128` is the same.

GCC passes `__float128` indirectly, which this also matches. However, it
also returns indirectly, which is not done here. I intend to attempt a
GCC change to also return in `xmm0` rather than making that change here,
given the consistency with `i128`.

This corresponds to the frontend change in [2], see more details there.

[1]: https://learn.microsoft.com/en-us/cpp/build/x64-calling-convention?view=msvc-170
[2]: llvm#115052
phoebewang pushed a commit that referenced this pull request Mar 6, 2025
LLVM currently expects `__float128` to be both passed and returned in
xmm registers on Windows. However, this disagrees with the Windows
x86-64 calling convention [1], which indicates values larger than 64
bits should be passed indirectly.

Update LLVM's default Windows calling convention to pass `fp128`
indirectly. Returning in xmm0 is unchanged since this seems like a
reasonable extrapolation of the ABI. With this patch, the calling
convention for `i128` and `f128` is the same.

GCC passes `__float128` indirectly, which this also matches. However, it
also returns indirectly, which is not done here. I intend to attempt a
GCC change to also return in `xmm0` rather than making that change here,
given the consistency with `i128`.

This corresponds to the frontend change in [2], see more details there.

[1]:
https://learn.microsoft.com/en-us/cpp/build/x64-calling-convention?view=msvc-170
[2]: #115052
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Mar 6, 2025
LLVM currently expects `__float128` to be both passed and returned in
xmm registers on Windows. However, this disagrees with the Windows
x86-64 calling convention [1], which indicates values larger than 64
bits should be passed indirectly.

Update LLVM's default Windows calling convention to pass `fp128`
indirectly. Returning in xmm0 is unchanged since this seems like a
reasonable extrapolation of the ABI. With this patch, the calling
convention for `i128` and `f128` is the same.

GCC passes `__float128` indirectly, which this also matches. However, it
also returns indirectly, which is not done here. I intend to attempt a
GCC change to also return in `xmm0` rather than making that change here,
given the consistency with `i128`.

This corresponds to the frontend change in [2], see more details there.

[1]:
https://learn.microsoft.com/en-us/cpp/build/x64-calling-convention?view=msvc-170
[2]: llvm/llvm-project#115052
jph-13 pushed a commit to jph-13/llvm-project that referenced this pull request Mar 21, 2025
)

Clang currently passes and returns `__float128` in vector registers on
MinGW targets, which is LLVM's default ABI for `fp128`. However, the
Windows x86-64 calling convention [1] states the following:

    __m128 types, arrays, and strings are never passed by immediate
    value. Instead, a pointer is passed to memory allocated by the
    caller. Structs and unions of size 8, 16, 32, or 64 bits, and __m64
    types, are passed as if they were integers of the same size. Structs
    or unions of other sizes are passed as a pointer to memory allocated
    by the caller. For these aggregate types passed as a pointer,
    including __m128, the caller-allocated temporary memory must be
    16-byte aligned.

Based on the above it sounds like `__float128` should be passed
indirectly. Thus, change `f128` passing to use the stack and make the
return in xmm0 explicit. This is identical to `i128`, and the argument
passing is the same as GCC.

Regarding return values, the documentation states:

    A scalar return value that can fit into 64 bits, including the __m64
    type, is returned through RAX. Non-scalar types including floats,
    doubles, and vector types such as __m128, __m128i, __m128d are
    returned in XMM0.

This makes it sound like it should be acceptable to return `__float128`
in xmm0; however, GCC returns `__float128` on the stack. The above ABI
statement, as well as consistency with `i128` (which is returned in xmm0),
means that it would likely be better for GCC to change its return ABI to
match Clang rather than the other way around, so that portion is left
as-is.

Clang's MSVC targets do not support `__float128` or `_Float128`, but
these changes would also apply there if it is eventually enabled.

With [2], which should land around the same time, LLVM will also
implement this ABI, so it is not technically necessary for Clang to make
a change here as well. This is still done in order to be consistent with
other types, and to allow calling convention-aware optimizations at all
available optimization layers (@rnk mentioned possible reuse of stack
arguments). An added benefit is readability of the LLVM IR, since it more
accurately reflects what the lowered assembly does.

[1]:
https://learn.microsoft.com/en-us/cpp/build/x64-calling-convention?view=msvc-170
[2]: llvm#128848
jph-13 pushed a commit to jph-13/llvm-project that referenced this pull request Mar 21, 2025
LLVM currently expects `__float128` to be both passed and returned in
xmm registers on Windows. However, this disagrees with the Windows
x86-64 calling convention [1], which indicates values larger than 64
bits should be passed indirectly.

Update LLVM's default Windows calling convention to pass `fp128`
indirectly. Returning in xmm0 is unchanged since this seems like a
reasonable extrapolation of the ABI. With this patch, the calling
convention for `i128` and `f128` is the same.

GCC passes `__float128` indirectly, which this also matches. However, it
also returns indirectly, which is not done here. I intend to attempt a
GCC change to also return in `xmm0` rather than making that change here,
given the consistency with `i128`.

This corresponds to the frontend change in [2], see more details there.

[1]:
https://learn.microsoft.com/en-us/cpp/build/x64-calling-convention?view=msvc-170
[2]: llvm#115052