-
Notifications
You must be signed in to change notification settings - Fork 13.4k
[clang] Pass fp128 indirectly and return in xmm0 on Windows #115052
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Thank you for submitting a Pull Request (PR) to the LLVM Project! This PR will be automatically labeled and the relevant teams will be notified. If you wish to, you can add reviewers by using the "Reviewers" section on this page. If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers. If you have further questions, they may be answered by the LLVM GitHub User Guide. You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums. |
Cc @beetrees and @wesleywiser See also some context discussion at the Rust Zulip https://rust-lang.zulipchat.com/#narrow/channel/131828-t-compiler/topic/MSVC.20.60f16.60.20and.20.60f128.60.20ABI |
@llvm/pr-subscribers-clang-codegen @llvm/pr-subscribers-clang Author: Trevor Gross (tgross35) ChangesClang currently passes and returns > __m128 types, arrays, and strings are never passed by immediate value. Instead, a pointer is passed to memory allocated by the caller. Structs and unions of size 8, 16, 32, or 64 bits, and __m64 types, are passed as if they were integers of the same size. Structs or unions of other sizes are passed as a pointer to memory allocated by the caller. For these aggregate types passed as a pointer, including __m128, the caller-allocated temporary memory must be 16-byte aligned. Based on the above it sounds like void va_f128(int count, ...) {
va_list args;
va_start(args, count);
__float128 val = va_arg(args, __float128);
va_end(args);
}
int main() {
va_f128(0, 0.0);
} This patch fixes the above. It also resolves crashes when calling GCC-built f128 libcalls. Regarding return values, the documentation states: > A scalar return value that can fit into 64 bits, including the __m64 type, is returned through RAX. Non-scalar types including floats, doubles, and vector types such as __m128, __m128i, __m128d are returned in XMM0. This makes it sound like it should be acceptable to return Clang's MSVC targets do not support Full diff: https://github.com/llvm/llvm-project/pull/115052.diff 2 Files Affected:
diff --git a/clang/lib/CodeGen/Targets/X86.cpp b/clang/lib/CodeGen/Targets/X86.cpp
index 7f73bf2a65266e..16656be14d8353 100644
--- a/clang/lib/CodeGen/Targets/X86.cpp
+++ b/clang/lib/CodeGen/Targets/X86.cpp
@@ -3367,6 +3367,11 @@ ABIArgInfo WinX86_64ABIInfo::classify(QualType Ty, unsigned &FreeSSERegs,
return ABIArgInfo::getDirect(llvm::FixedVectorType::get(
llvm::Type::getInt64Ty(getVMContext()), 2));
+ case BuiltinType::Float128:
+ // f128 is too large to fit in integer registers so the Windows ABI
+ // require it be passed on the stack. GCC does the same.
+ return ABIArgInfo::getIndirect(Align, /*ByVal=*/false);
+
default:
break;
}
diff --git a/clang/test/CodeGen/win64-fp128.c b/clang/test/CodeGen/win64-fp128.c
new file mode 100644
index 00000000000000..bfb903709397c3
--- /dev/null
+++ b/clang/test/CodeGen/win64-fp128.c
@@ -0,0 +1,20 @@
+// RUN: %clang_cc1 -triple x86_64-windows-gnu -emit-llvm -o - %s \
+// RUN: | FileCheck %s --check-prefix=GNU64
+// __float128 is unsupported on MSVC
+
+__float128 fp128_ret(void) { return 0; }
+// GNU64: define dso_local void @fp128_ret(ptr dead_on_unwind noalias writable sret(fp128) align 16 %agg.result)
+
+__float128 fp128_args(__float128 a, __float128 b) { return a * b; }
+// GNU64: define dso_local void @fp128_args(ptr dead_on_unwind noalias writable sret(fp128) align 16 %agg.result, ptr noundef %0, ptr noundef %1)
+
+void fp128_vararg(int a, ...) {
+ // GNU64-LABEL: define dso_local void @fp128_vararg
+ __builtin_va_list ap;
+ __builtin_va_start(ap, a);
+ __float128 i = __builtin_va_arg(ap, __float128);
+ // movaps xmm0, xmmword ptr [rax]
+ // GNU64: load ptr, ptr
+ // GNU64: load fp128, ptr
+ __builtin_va_end(ap);
+}
|
bacafd9
to
96f754f
Compare
It seems like arguments aren't actually getting passed indirectly to libcalls. Simple test program #include <stdint.h>
#include <stdio.h>
union ty128 {
struct { uint64_t hi, lo; } u64x2;
__float128 f128;
};
void f128_add(__float128 a, __float128 b) {
union ty128 cvt;
cvt.f128 = a * b;
printf("0x%016llx%016llx\n", cvt.u64x2.lo, cvt.u64x2.hi);
}
int main() {
__float128 fa, fb;
fa = 122134.345678901234;
fb = 78.9012345678901234;
f128_add(fa, fb);
} Checking right before the call to
Stdout correctly prints Built using this patch, it looks like Clang is putting the float arguments in
And output is meaningless. Is libcall ABI handled elsewhere? |
@tgross35 Libcalls are handled by the backend. I think you'd have to modify the CC_X86_Win64_C / RetCC_X86_Win64_C calling conventions in https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/X86/X86CallingConv.td for that. |
To fix things for libcalls, you might have to make other backend changes. Gnarly things like |
@@ -3367,6 +3367,11 @@ ABIArgInfo WinX86_64ABIInfo::classify(QualType Ty, unsigned &FreeSSERegs, | |||
return ABIArgInfo::getDirect(llvm::FixedVectorType::get( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment right here implies GCC returns i128 in XMM0. Are we sure f128 will always be passed indirectly? Going with a straightforward interpretation of the ABI as documented by Microsoft implies it should be passed indirectly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GCC currently always passes and returns f128
indirectly on MinGW. Reading https://learn.microsoft.com/en-us/cpp/build/x64-calling-convention?view=msvc-170 makes me think that it would actually be more accurate to pass indirectly but return in XMM0, similar to what i128
is doing - I brought this up in my most recent comment at https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115054 (disregard the top post, I was originally asking GCC to pass in XMM which wasn't correct).
What would be best to do here?
01658e5
to
e73d83e
Compare
LLVM expects `__float128` to be both passed and returned in xmm registers on Windows. However, this disagrees with the Windowx x86-64 calling convention [1], which indicates values larger than 64 bits should be passed indirectly. Update LLVM's libcall calling convention to pass `fp128` directly. Returning in xmm0 is unchanged since this seems like a reasonable extrapolation of the ABI. With this change, the calling convention for `i128` and `f128` becomes the same. This corresponds to the frontend change in [2]. [1]: https://learn.microsoft.com/en-us/cpp/build/x64-calling-convention?view=msvc-170 [2]: llvm#115052
e73d83e
to
8b646bf
Compare
LLVM expects `__float128` to be both passed and returned in xmm registers on Windows. However, this disagrees with the Windowx x86-64 calling convention [1], which indicates values larger than 64 bits should be passed indirectly. Update LLVM's libcall calling convention to pass `fp128` directly. Returning in xmm0 is unchanged since this seems like a reasonable extrapolation of the ABI. With this change, the calling convention for `i128` and `f128` becomes the same. This corresponds to the frontend change in [2]. [1]: https://learn.microsoft.com/en-us/cpp/build/x64-calling-convention?view=msvc-170 [2]: llvm#115052
07e0c4d
to
06ce65a
Compare
LLVM currently expects `__float128` to be both passed and returned in xmm registers on Windows. However, this disagrees with the Windows x86-64 calling convention [1], which indicates values larger than 64 bits should be passed indirectly. Update LLVM's libcall calling convention to pass `fp128` directly. Returning in xmm0 is unchanged since this seems like a reasonable extrapolation of the ABI. With this patch, the calling convention for `i128` and `f128` is the same. GCC passes `__float128` indirectly, which this also matches. However, it also returns indirectly, which is not done here. I intend to attempt a GCC change to also return in `xmm0`, given the consistency with `i128`. This corresponds to the frontend change in [2], see more details there. [1]: https://learn.microsoft.com/en-us/cpp/build/x64-calling-convention?view=msvc-170 [2]: llvm#115052
LLVM currently expects `__float128` to be both passed and returned in xmm registers on Windows. However, this disagrees with the Windows x86-64 calling convention [1], which indicates values larger than 64 bits should be passed indirectly. Update LLVM's libcall calling convention to pass `fp128` directly. Returning in xmm0 is unchanged since this seems like a reasonable extrapolation of the ABI. With this patch, the calling convention for `i128` and `f128` is the same. GCC passes `__float128` indirectly, which this also matches. However, it also returns indirectly, which is not done here. I intend to attempt a GCC change to also return in `xmm0` rather than making that change here, given the consistency with `i128`. This corresponds to the frontend change in [2], see more details there. [1]: https://learn.microsoft.com/en-us/cpp/build/x64-calling-convention?view=msvc-170 [2]: llvm#115052
06ce65a
to
a95f02c
Compare
fp128
arguments indirectly on Windows
a95f02c
to
afaf2cb
Compare
fp128
arguments indirectly on WindowsLLVM currently expects `__float128` to be both passed and returned in xmm registers on Windows. However, this disagrees with the Windows x86-64 calling convention [1], which indicates values larger than 64 bits should be passed indirectly. Update LLVM's default calling convention to pass `fp128` directly. Returning in xmm0 is unchanged since this seems like a reasonable extrapolation of the ABI. With this patch, the calling convention for `i128` and `f128` is the same. GCC passes `__float128` indirectly, which this also matches. However, it also returns indirectly, which is not done here. I intend to attempt a GCC change to also return in `xmm0` rather than making that change here, given the consistency with `i128`. This corresponds to the frontend change in [2], see more details there. [1]: https://learn.microsoft.com/en-us/cpp/build/x64-calling-convention?view=msvc-170 [2]: llvm#115052 (cherry picked from commit b6c7942)
…, r=<try> [do not merge] Windows f128 abi experiment Running tests with llvm/llvm-project#115052 and llvm/llvm-project#128848. r? `@ghost` try-job: dist-x86_64-msvc try-job: dist-x86_64-mingw try-job: x86_64-msvc-1 try-job: x86_64-msvc-2 try-job: x86_64-mingw-1 try-job: x86_64-mingw-2 try-job: x86_64-mingw-3
…, r=<try> [do not merge] Windows f128 abi experiment Running tests with llvm/llvm-project#115052 and llvm/llvm-project#128848. r? `@ghost` try-job: dist-x86_64-msvc try-job: dist-x86_64-mingw try-job: x86_64-msvc-1 try-job: x86_64-msvc-2 try-job: x86_64-mingw-1 try-job: x86_64-mingw-2
Add a test based on `win64-i128.c` with the current behavior of `__float128`.
d695313
to
4f07687
Compare
I updated this PR to pass indirectly and make the xmm0 return explicit, identical to This should be ready for a review. Libcall changes are in #128848, that should land at the same time. |
…, r=<try> [do not merge] Windows f128 abi experiment Running tests with llvm/llvm-project#115052 and llvm/llvm-project#128848. r? `@ghost` try-job: dist-x86_64-msvc try-job: dist-x86_64-mingw try-job: x86_64-msvc-1 try-job: x86_64-msvc-2 try-job: x86_64-mingw-1 try-job: x86_64-mingw-2
If you take the two PRs together, we're going to implement the indirection logic at both the frontend and backend level. If we do it in the backend, I think it's technically necessary to do it in the frontend, but it is consistent with how we handle all other types. I'd like to see that explained in the commit message: why should this be in the frontend as well? To answer that question, there are tradeoffs, but it's generally good for memory optimizations to be able to observe ABI details. For example, the backend lowering creates extra stack objects if we need to pass the same fp128 argument twice to two functions, whereas mid-level optimizations might be able to remove that. Otherwise, I think this is good to go. I suppose Rust folks are watching this PR, otherwise, I'd say loop them in. |
4f07687
to
6712444
Compare
Thanks for answering that question, I wouldn't have had a good answer outside of consistency. Does mid-level optimizations refer to optimizations done in Clang rather than in LLVM? I updated the message. Somebody will need to land this for me, the two commits should come separate (first is NFC).
For future reference that's mostly me for f16/f128, I'll update our frontend after this lands. |
Clang currently passes and returns `__float128` in vector registers on MinGW targets, which is LLVM's default ABI for `fp128`. However, the Windows x86-64 calling convention [1] states the following: __m128 types, arrays, and strings are never passed by immediate value. Instead, a pointer is passed to memory allocated by the caller. Structs and unions of size 8, 16, 32, or 64 bits, and __m64 types, are passed as if they were integers of the same size. Structs or unions of other sizes are passed as a pointer to memory allocated by the caller. For these aggregate types passed as a pointer, including __m128, the caller-allocated temporary memory must be 16-byte aligned. Based on the above it sounds like `__float128` should be passed indirectly. Thus, change `f128` passing to use the stack and make the return in xmm0 explicit. This is the identical to `i128`, and passing is the same as GCC. Regarding return values, the documentation states: A scalar return value that can fit into 64 bits, including the __m64 type, is returned through RAX. Non-scalar types including floats, doubles, and vector types such as __m128, __m128i, __m128d are returned in XMM0. This makes it sound like it should be acceptable to return `__float128` in xmm0; however, GCC returns `__float128` on the stack. That above ABI statement as well as consistency with `i128` (which is returned in xmm0) mean that it would likely be better for GCC to change its return ABI to match Clang rather than the other way around, so that portion is left as-is. Clang's MSVC targets do not support `__float128` or `_Float128`, but these changes would also apply there if it is eventually enabled. With [2] which should land around the same time, LLVM will also implement this ABI so it is not technically necessary for Clang to make a change here as well. This is sill done in order to be consistent with other types, and to allow calling convention-aware optimizations at all available optimization layers (rnk mentioned possible reuse of stack arguments). An added benefit is readibility of the LLVM IR since it more accurately reflects what the lowered assembly does. [1]: https://learn.microsoft.com/en-us/cpp/build/x64-calling-convention?view=msvc-170 [2]: llvm#128848
6712444
to
db66320
Compare
@rnk (or anyone) would you be able to land this? |
@tgross35 Congratulations on having your first Pull Request (PR) merged into the LLVM Project! Your changes will be combined with recent changes from other authors, then tested by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR. Please check whether problems have been caused by your change specifically, as the builds can include changes from many authors. It is not uncommon for your change to be included in a build that fails due to someone else's changes, or infrastructure issues. How to do this, and the rest of the post-merge process, is covered in detail here. If your change does cause a problem, it may be reverted, or you can revert it yourself. This is a normal part of LLVM development. You can fix your changes and open a new PR to merge them again. If you don't get any reports, no action is required from you. Your changes are working as expected, well done! |
LLVM currently expects `__float128` to be both passed and returned in xmm registers on Windows. However, this disagrees with the Windows x86-64 calling convention [1], which indicates values larger than 64 bits should be passed indirectly. Update LLVM's default Windows calling convention to pass `fp128` directly. Returning in xmm0 is unchanged since this seems like a reasonable extrapolation of the ABI. With this patch, the calling convention for `i128` and `f128` is the same. GCC passes `__float128` indirectly, which this also matches. However, it also returns indirectly, which is not done here. I intend to attempt a GCC change to also return in `xmm0` rather than making that change here, given the consistency with `i128`. This corresponds to the frontend change in [2], see more details there. [1]: https://learn.microsoft.com/en-us/cpp/build/x64-calling-convention?view=msvc-170 [2]: llvm#115052
LLVM currently expects `__float128` to be both passed and returned in xmm registers on Windows. However, this disagrees with the Windows x86-64 calling convention [1], which indicates values larger than 64 bits should be passed indirectly. Update LLVM's default Windows calling convention to pass `fp128` directly. Returning in xmm0 is unchanged since this seems like a reasonable extrapolation of the ABI. With this patch, the calling convention for `i128` and `f128` is the same. GCC passes `__float128` indirectly, which this also matches. However, it also returns indirectly, which is not done here. I intend to attempt a GCC change to also return in `xmm0` rather than making that change here, given the consistency with `i128`. This corresponds to the frontend change in [2], see more details there. [1]: https://learn.microsoft.com/en-us/cpp/build/x64-calling-convention?view=msvc-170 [2]: #115052
LLVM currently expects `__float128` to be both passed and returned in xmm registers on Windows. However, this disagrees with the Windows x86-64 calling convention [1], which indicates values larger than 64 bits should be passed indirectly. Update LLVM's default Windows calling convention to pass `fp128` directly. Returning in xmm0 is unchanged since this seems like a reasonable extrapolation of the ABI. With this patch, the calling convention for `i128` and `f128` is the same. GCC passes `__float128` indirectly, which this also matches. However, it also returns indirectly, which is not done here. I intend to attempt a GCC change to also return in `xmm0` rather than making that change here, given the consistency with `i128`. This corresponds to the frontend change in [2], see more details there. [1]: https://learn.microsoft.com/en-us/cpp/build/x64-calling-convention?view=msvc-170 [2]: llvm/llvm-project#115052
) Clang currently passes and returns `__float128` in vector registers on MinGW targets, which is LLVM's default ABI for `fp128`. However, the Windows x86-64 calling convention [1] states the following: __m128 types, arrays, and strings are never passed by immediate value. Instead, a pointer is passed to memory allocated by the caller. Structs and unions of size 8, 16, 32, or 64 bits, and __m64 types, are passed as if they were integers of the same size. Structs or unions of other sizes are passed as a pointer to memory allocated by the caller. For these aggregate types passed as a pointer, including __m128, the caller-allocated temporary memory must be 16-byte aligned. Based on the above it sounds like `__float128` should be passed indirectly. Thus, change `f128` passing to use the stack and make the return in xmm0 explicit. This is the identical to `i128`, and passing is the same as GCC. Regarding return values, the documentation states: A scalar return value that can fit into 64 bits, including the __m64 type, is returned through RAX. Non-scalar types including floats, doubles, and vector types such as __m128, __m128i, __m128d are returned in XMM0. This makes it sound like it should be acceptable to return `__float128` in xmm0; however, GCC returns `__float128` on the stack. That above ABI statement as well as consistency with `i128` (which is returned in xmm0) mean that it would likely be better for GCC to change its return ABI to match Clang rather than the other way around, so that portion is left as-is. Clang's MSVC targets do not support `__float128` or `_Float128`, but these changes would also apply there if it is eventually enabled. With [2] which should land around the same time, LLVM will also implement this ABI so it is not technically necessary for Clang to make a change here as well. This is sill done in order to be consistent with other types, and to allow calling convention-aware optimizations at all available optimization layers (@rnk mentioned possible reuse of stack arguments). An added benefit is readibility of the LLVM IR since it more accurately reflects what the lowered assembly does. [1]: https://learn.microsoft.com/en-us/cpp/build/x64-calling-convention?view=msvc-170 [2]: llvm#128848
LLVM currently expects `__float128` to be both passed and returned in xmm registers on Windows. However, this disagrees with the Windows x86-64 calling convention [1], which indicates values larger than 64 bits should be passed indirectly. Update LLVM's default Windows calling convention to pass `fp128` directly. Returning in xmm0 is unchanged since this seems like a reasonable extrapolation of the ABI. With this patch, the calling convention for `i128` and `f128` is the same. GCC passes `__float128` indirectly, which this also matches. However, it also returns indirectly, which is not done here. I intend to attempt a GCC change to also return in `xmm0` rather than making that change here, given the consistency with `i128`. This corresponds to the frontend change in [2], see more details there. [1]: https://learn.microsoft.com/en-us/cpp/build/x64-calling-convention?view=msvc-170 [2]: llvm#115052
Clang currently passes and returns
__float128
in vector registers onMinGW targets, which is LLVM's default ABI for
fp128
. However, theWindows x86-64 calling convention 1 states the following:
Based on the above it sounds like
__float128
should be passedindirectly. Thus, change
f128
passing to use the stack and make thereturn in xmm0 explicit. This is the identical to
i128
, and passing isthe same as GCC.
Regarding return values, the documentation states:
This makes it sound like it should be acceptable to return
__float128
in xmm0; however, GCC returns
__float128
on the stack. That above ABIstatement as well as consistency with
i128
(which is returned in xmm0)mean that it would likely be better for GCC to change its return ABI to
match Clang rather than the other way around, so that portion is left
as-is.
Clang's MSVC targets do not support
__float128
or_Float128
, butthese changes would also apply there if it is eventually enabled.
With 2 which should land around the same time, LLVM will also
implement this ABI so it is not technically necessary for Clang to make
a change here as well. This is sill done in order to be consistent with
other types, and to allow calling convention-aware optimizations at all
available optimization layers (@rnk mentioned possible reuse of stack
arguments). An added benefit is readibility of the LLVM IR since it more
accurately reflects what the lowered assembly does.