Description
Hi, using this sample code:
#include <cfenv>
#include <cstdio>
struct bc_t
{
float minDist, maxDist;
float hudSize;
float a1, a2, a3 = 0.0f;
float minSize;
float b1, b2, b3 = 0.0f;
float maxSize;
float c1, c2, c3 = 0.0f;
};
int main(int, char**)
{
feenableexcept(FE_DIVBYZERO);
bc_t bc;
sscanf("1 1 1", "%f %f %f", &bc.hudSize, &bc.minSize, &bc.maxSize);
bc.minDist = bc.hudSize / bc.maxSize;
bc.maxDist = bc.hudSize / bc.minSize;
printf("%f\n", bc.minDist);
return 0;
}
By compiling it with -O2
, the compiled code raises some division by zero
exception when executed:
Process 3247973 stopped
* thread #1, name = 'a.out', stop reason = signal SIGFPE: floating point divide by zero
frame #0: 0x00005555555551c5 a.out`main + 101
a.out`main:
-> 0x5555555551c5 <+101>: divps %xmm2, %xmm0
0x5555555551c8 <+104>: movlps %xmm0, (%rsp)
0x5555555551cc <+108>: cvtss2sd %xmm0, %xmm0
0x5555555551d0 <+112>: leaq 0xe3c(%rip), %rdi
Process 3249378 stopped
* thread #1, name = 'a.out', stop reason = signal SIGFPE: floating point divide by zero
frame #0: 0x00005555555551c5 a.out`main((null)=<unavailable>, (null)=<unavailable>) at a.cpp:20:26
17 feenableexcept(FE_DIVBYZERO);
18 bc_t bc;
19 sscanf("1 1 1", "%f %f %f", &bc.hudSize, &bc.maxSize, &bc.minSize);
-> 20 bc.minDist = bc.hudSize / bc.maxSize;
21 bc.maxDist = bc.hudSize / bc.minSize;
22 printf("%f\n", bc.minDist);
23 return 0;
My guess is that the SSE optimized code raises division by zero exceptions from unused fields of the xmm registers, despite we discard those results.
The requirement of -O2
requiring either -ftrapping-math
or -ffast-math
is curious since -ffast-math
sets -fno-trapping-math
.
Also, I thought -ftrapping-math
was the default, and that -fno-trapping-math
is part of -ffast-math
, but just using -O2
behaves like if -fno-trapping-math
is used, so it behaves like if part of -ffast-math
was enabled anyway… But maybe -ffast-math
just modifies other behaviors that make -ftrapping-math
or -fno-trapping-math
irrelevant.
Using the Godbolt's compiler explorer (here Clang 19.1.0), it only works with -O2 -ffast-math
(or depreacted -Ofast
):
compiler flags | status |
---|---|
✅️ | |
-O0 |
✅️ |
-Os |
❌️ |
-Os -fno-trapping-math |
❌️ |
-Os -ftrapping-math |
✅️ |
-O1 |
✅️ |
-O2 |
❌️ |
-O2 -fno-trapping-math |
❌️ |
-O2 -ftrapping-math |
✅️ |
-ffast-math |
✅️ |
-O2 -ffast-math |
✅️ |
-O2 -ffast-math -ftrapping-math |
✅️ |
-O2 -ffast-math -fno-trapping-math |
✅️ |
-Ofast |
✅️ |
See: https://godbolt.org/z/395c8nMef
And (with -ftrapping-math
added): https://godbolt.org/z/czbGTjs6E
And (with -fno-trapping-math
added): https://godbolt.org/z/zYr5Pba3W
With 32-bit i686 I reproduce the bug when using SSE but get no bug when not using SSE.
compiler flags | -m32 -msse |
-m32 -mno-sse |
---|---|---|
✅️ | ✅️ | |
-Os |
❌️ | ✅️ |
-O1 |
✅️ | ✅️ |
-O2 |
❌️ | ✅️ |
-ffast-math |
✅️ | ✅️ |
-O2 -ffast-math |
✅️ | ✅️ |
See 32-bit i686 with SSE I get the same failure: https://godbolt.org/z/199eqhzGh
And 32-bit i686 build without SSE I get no failure: https://godbolt.org/z/sfd9TsTj4
On my end on Ubuntu 24.04 with amd64, I get same results with clang 19.1.5.
On Ubuntu 24.04 with amd64 and different versions of the clang compiler, I only get it working with clang 13 and 14, every later version breaks it:
-O0 |
-O1 |
-O2 |
-O2 -ftrapping-math |
-ffast-math |
-O2 -ffast-math |
|
---|---|---|---|---|---|---|
clang 13.0.1 | ✅️ | ✅️ | ✅️ | ✅️ | ✅️ | ✅️ |
clang 14.0.6 | ✅️ | ✅️ | ✅️ | ✅️ | ✅️ | ✅️ |
clang 15.0.7 | ✅️ | ✅️ | ❌️ | ✅️ | ✅️ | ✅️ |
clang 16.0.6 | ✅️ | ✅️ | ❌️ | ✅️ | ✅️ | ✅️ |
clang 17.0.6 | ✅️ | ✅️ | ❌️ | ✅️ | ✅️ | ✅️ |
clang 18.1.3 | ✅️ | ✅️ | ❌️ | ✅️ | ✅️ | ✅️ |
clang 19.1.5 | ✅️ | ✅️ | ❌️ | ✅️ | ✅️ | ✅️ |
On GCC 14.02 I get none of those issues (no one false positive division by zero
error is raised whatever the compiler flags being used: https://godbolt.org/z/rxMWM68Mc
Note: Disabling SSE on amd64 just produces garbage computation (1.0/1.0
gives 0.0
), but I don't know if that makes sense to disable SSE on amd64. GCC produces the same garbage (1.0/1.0
giving 0.0
). Though I'm surprised to not get a warning if that's not legit to do, also I'm surprised the generated code runs if that's not legit to do. See: https://godbolt.org/z/77PYGTc5b (Clang) and: https://godbolt.org/z/cvW4Er3Kd (GCC).