[clang] -ffast-math in 19.1.0 prevents function from returning intended __m128 bitmask #118152
Comments
+1, we were also affected by this. https://godbolt.org/z/7v7hb8E9q is a minimized example of what happened.
CC @andykaylor
If it has a floating-point type, then yes, it is not allowed to contain a NaN or Inf under finite-math-only. I don't know how __m128 is defined or intended to be used, but if this is wrong, it should not be considered a floating-point type.
It should not be considered a floating-point type, but rather a multi-purpose type that can contain floating-point values or bit masks. The most common 32-bit bit patterns used with the bitwise instructions are 0, 0xFFFFFFFF, and 0x80000000, the latter being used to twiddle the sign bit. It occurs to me that with "fast math", all 3 of these values are potentially unreliable because of …
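To make the three patterns concrete, here is a small C sketch (bits_to_float, lane_is_nan and lane_is_negative_zero are hypothetical helper names) that reinterprets each 32-bit lane as an IEEE-754 single-precision float: 0x00000000 reads as +0.0f, 0xFFFFFFFF as a NaN, and 0x80000000 as -0.0f. So one of the three patterns is a NaN, which finite-math-only assumes never occurs, and another is a negative zero, whose sign no-signed-zeros assumptions may ignore.

```c
#include <math.h>   /* isnan, signbit */
#include <stdint.h> /* uint32_t */
#include <string.h> /* memcpy */

/* Reinterpret a 32-bit pattern as an IEEE-754 float without
   violating strict aliasing. */
static float bits_to_float(uint32_t bits) {
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Nonzero if the lane pattern is a NaN when read as float
   (e.g. 0xFFFFFFFF, the "true" lane of an SSE comparison). */
static int lane_is_nan(uint32_t bits) {
    return isnan(bits_to_float(bits)) != 0;
}

/* Nonzero if the lane pattern is -0.0f (0x80000000, the sign-bit mask). */
static int lane_is_negative_zero(uint32_t bits) {
    float f = bits_to_float(bits);
    return f == 0.0f && signbit(f) != 0;
}
```

This check should itself be built without -ffast-math: under -ffinite-math-only, a compiler may fold isnan() to false, which is the same class of assumption discussed in this thread.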
I would like to point out that the most common technique affected by this change is the one in "x86 - Fastest way to compute absolute value using SSE - Stack Overflow". It will no longer work as of clang 19. The workaround is to use …
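For reference, the absolute-value trick from that Stack Overflow question typically looks like the sketch below (abs_ps_sign_mask is a hypothetical name; this illustrates the affected technique, not the workaround the comment goes on to describe). The mask _mm_set1_ps(-0.0f) has only the sign bit set in each lane, and _mm_andnot_ps(a, b) computes (~a) & b, so every lane of x keeps all of its bits except the sign bit.

```c
#include <xmmintrin.h> /* SSE: __m128, _mm_set1_ps, _mm_andnot_ps */

/* |x| for four packed floats by clearing each lane's sign bit.
   _mm_set1_ps(-0.0f) puts 0x80000000 in every lane, and
   _mm_andnot_ps(mask, x) computes (~mask) & x. */
static __m128 abs_ps_sign_mask(__m128 x) {
    const __m128 sign_mask = _mm_set1_ps(-0.0f);
    return _mm_andnot_ps(sign_mask, x);
}
```

The mask lanes here are -0.0f, a value that no-signed-zeros assumptions may treat like +0.0f; the common variant that instead ANDs with 0x7FFFFFFF uses a lane pattern that is a NaN when read as float.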
As I wrote in the original description, there are intrinsics, like comparisons, that emit NaN … The current symptom can be found in implementations like the ones slipher and I showed above, which use constant non-finite values as inputs to the intrinsics. However, the question can be extended to how future optimizations may handle NaNs produced by SSE intrinsics. If such output may also be affected, it may be wise to advise against using select SSE intrinsics in combination with … A crude example: given two … https://godbolt.org/z/1eqMWWWKE As the inputs to … Regards,
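A minimal sketch of how a comparison result is normally consumed, assuming plain SSE and hypothetical helper names (select_max_ps, mask_lane_bits): _mm_cmpgt_ps yields all-ones lanes (0xFFFFFFFF, a NaN bit pattern when read as float) where a > b and all-zero lanes elsewhere, and the mask then selects between a and b with _mm_and_ps, _mm_andnot_ps and _mm_or_ps. SSE already provides _mm_max_ps for this particular case; the point is only that the NaN-patterned mask is an intended intermediate value, not a floating-point result.

```c
#include <stdint.h>    /* uint32_t */
#include <string.h>    /* memcpy */
#include <xmmintrin.h> /* SSE: _mm_cmpgt_ps, _mm_and_ps, _mm_andnot_ps, _mm_or_ps */

/* Lane-wise max(a, b) built from a comparison mask and bitwise selection. */
static __m128 select_max_ps(__m128 a, __m128 b) {
    const __m128 mask = _mm_cmpgt_ps(a, b);   /* all ones where a > b */
    return _mm_or_ps(_mm_and_ps(mask, a),     /* keep a where a > b   */
                     _mm_andnot_ps(mask, b)); /* keep b elsewhere     */
}

/* Raw 32-bit pattern of lane i (0..3) of a vector, copied bit for bit. */
static uint32_t mask_lane_bits(__m128 mask, int i) {
    float lanes[4];
    uint32_t bits;
    _mm_storeu_ps(lanes, mask);
    memcpy(&bits, &lanes[i], sizeof bits);
    return bits;
}
```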
The header file defines __m128 as a vector of floats, and clang chooses to lower intrinsics to target-independent IR as often as possible for optimization purposes, so applying the no-NaNs and no-infinities rules is a natural consequence. However, there is some question as to whether people using intrinsics actually want the fast-math flags to apply to them. My experience is that some do and some don't. In clang, we do apply fast-math to the intrinsics. There are some problems with that. If you don't want fast-math flags applied to the intrinsics, you could do something like this: …
There are some other problems with that. I have said recently that as our optimizations based on the no-NaNs and no-infinities options become more aggressive, these options will become less useful to a broader group of people. For that reason, I recently changed the …
Yes, that's true, but of course … If you want the raw bitwise functionality of …
Apologies for my oversight regarding signed zero and its handling in … So the use of … Therefore, for the currently available clang 19 (and for future versions, if the change is here to stay), to ensure that an arbitrary bitmask is handled correctly, I guess we should be using …
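One possible way to keep a sign-bit constant away from floating-point semantics, offered as an assumption rather than as the approach the comment goes on to recommend, is to build the mask as an integer vector (SSE2) and apply it through the bit-preserving cast intrinsics. The hypothetical negate_ps_int_mask below flips each lane's sign by XOR-ing 0x80000000 created with _mm_set1_epi32, instead of using the float constant -0.0f.

```c
#include <emmintrin.h> /* SSE2: __m128i, _mm_set1_epi32, _mm_xor_si128, casts */
#include <stdint.h>    /* INT32_MIN == bit pattern 0x80000000 */

/* Negate four packed floats by flipping the sign bit in the integer domain.
   _mm_castps_si128/_mm_castsi128_ps only change the static type; the mask
   never exists as a float constant. */
static __m128 negate_ps_int_mask(__m128 x) {
    const __m128i sign_bit = _mm_set1_epi32(INT32_MIN);
    return _mm_castsi128_ps(_mm_xor_si128(_mm_castps_si128(x), sign_bit));
}
```

Whether a given clang version then leaves such values untouched under -ffast-math is not verified here; the cast intrinsics generate no instructions, so this version costs nothing compared with the float-mask form.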
There are a few separate, but not entirely orthogonal, issues going on here.
The change that really broke this code is that we now have …
I think the way to go is to stop treating the type __m128 as a floating-point type eligible for fast-math attributes/flags. IIRC, I originally wanted to check whether the original source type is FP, rather than the IR type, but I don't remember where that ended up.
The problem with that is that it would also block FMA formation in the case Joshua cited in his second bullet. It seems to me that the problem is with intrinsics like _mm_and_ps that want to treat the values as integers even though they accept arguments that say the values are floating-point. I suspect there are integer equivalents that could be used, with an intermediate cast intrinsic, in all such cases. Could we deprecate the "bitwise floating-point" intrinsics? Maybe change the header definitions to macros that invoke the equivalent cast and integer intrinsics? CC: @phoebewang
I believe deprecating them outright is a bit too extreme, since the intrinsics function correctly without …
Unfortunately, although the float bitwise intrinsics were defined in the first SSE, … (see "x86 intrinsics list | Microsoft Learn"). So a straight replacement of the float intrinsics with integer equivalents will result in an error when the option …
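The integer intrinsics such as _mm_and_si128, and the cast intrinsics _mm_castps_si128/_mm_castsi128_ps, belong to SSE2, while _mm_and_ps is SSE, so any replacement would need a fallback. A sketch under that assumption, using a hypothetical helper name and GCC/clang's __SSE2__ predefined macro (MSVC signals SSE2 support differently, e.g. via _M_X64 or _M_IX86_FP):

```c
#include <xmmintrin.h> /* SSE: __m128, _mm_and_ps */
#ifdef __SSE2__
#include <emmintrin.h> /* SSE2: _mm_and_si128 and the cast intrinsics */
#endif

/* Hypothetical helper: bitwise AND of two __m128 values. With SSE2, the AND
   is done on the integer view of the bits; on SSE-only targets it falls back
   to _mm_and_ps, since the integer intrinsics are not available there. */
static inline __m128 bitwise_and_ps(__m128 a, __m128 b) {
#ifdef __SSE2__
    return _mm_castsi128_ps(
        _mm_and_si128(_mm_castps_si128(a), _mm_castps_si128(b)));
#else
    return _mm_and_ps(a, b);
#endif
}
```

Both branches compute the same bits; the SSE-only fallback is still exposed to whatever fast-math treatment the float intrinsic receives.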
It's ugly, but I suppose this only applies to nofpclass, and not to the fast-math flags.
How about something in between? #118603
Turns out float_control doesn't work for it: https://godbolt.org/z/7bEEMYvxW
The problem I noticed is that using … A potential fix for that is to make …
Thanks for pointing that out! I think introducing an attribute like that is a big hammer, and I don't think it's what we want. If … I think the proper way to fix this is to make …
I would agree with this, but it doesn't seem to be limited to nofpclass. The float_control pragma misses some other attributes as well ("approx-func-fp-math", for example).
Hello.
A function like the one in the sample below may return an unexpected __m128 value when compiled using clang 19.1.0 with -ffast-math -O2.
Sample code at Compiler Explorer: https://godbolt.org/z/xG5r1n7xY, demonstrating different results for a lambda return and an immediate assignment.
Workaround: remove -ffast-math or -O2, or add -fno-finite-math-only.
I can speculate that this is because the i32 values set in the sample function are NaN when interpreted as float, and the -ffast-math optimizer discarded such a return value as invalid. However, the __m128 value was intended for use in bitwise operation intrinsics like _mm_and_ps(), not as a value in floating-point calculations.
This was permitted in clang 18, but no longer is as of clang 19.1.0. Was this behavior change intended?
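Since the sample itself is only linked, here is a hypothetical C function in the same spirit (make_lane_mask and apply_lane_mask are illustrative names, not the code from the report): set lanes are 0xFFFFFFFF, which is a NaN when read as float, and the result is returned as __m128 for later use with _mm_and_ps.

```c
#include <emmintrin.h> /* SSE2: _mm_setr_epi32, _mm_castsi128_ps; SSE: _mm_and_ps */

/* Build a lane mask: lane i is 0xFFFFFFFF (NaN as float) if keep_i is
   nonzero, 0x00000000 otherwise. Returned as __m128 for bitwise use. */
static __m128 make_lane_mask(int keep0, int keep1, int keep2, int keep3) {
    return _mm_castsi128_ps(_mm_setr_epi32(keep0 ? -1 : 0, keep1 ? -1 : 0,
                                           keep2 ? -1 : 0, keep3 ? -1 : 0));
}

/* Zero every lane of v whose mask lane is all zeros. */
static __m128 apply_lane_mask(__m128 v, __m128 mask) {
    return _mm_and_ps(v, mask);
}
```

Built without -ffast-math, this behaves as a plain bit mask; the report is that with clang 19.1.0 and -ffast-math -O2, returning such a NaN-patterned __m128 from a function may not yield the intended bits.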
If so, is it no longer possible, from clang 19 onward, to guarantee the assignment and return of an arbitrary __m128 value, which may contain NaN or INFINITY, with -ffast-math enabled? In addition to the bitwise intrinsics, comparison intrinsics like _mm_cmpgt_ps() use 0xFFFFFFFF as the truth value, so handling non-finite values is necessary for multiple intrinsics by definition.
Or is it simply not recommended to use these intrinsics with -ffast-math, or more specifically -ffinite-math-only? Using the -ffast-math -fno-finite-math-only combination is a valid workaround, but it affects the entire compilation, potentially preventing other, uninvolved code from benefiting from finite-math-only optimizations.
Regards,