Skip to content

[clang] -O2 requires -ftrapping-math or -ffast-math on amd64 to avoid false positives with feenableexcept(FE_DIVBYZERO) #118265

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
illwieckz opened this issue Dec 2, 2024 · 10 comments
Labels
floating-point Floating-point math llvm:optimizations question A question, not bug report. Check out https://llvm.org/docs/GettingInvolved.html instead!

Comments

@illwieckz
Copy link
Contributor

illwieckz commented Dec 2, 2024

Hi, using this sample code:

#include <cfenv>
#include <cstdio>

struct bc_t
{
	float minDist, maxDist;
	float hudSize;
	float a1, a2, a3 = 0.0f;
	float minSize;
	float b1, b2, b3 = 0.0f;
	float maxSize;
	float c1, c2, c3 = 0.0f;
};

int main(int, char**)
{
	feenableexcept(FE_DIVBYZERO);
	bc_t bc;
	sscanf("1 1 1", "%f %f %f", &bc.hudSize, &bc.minSize, &bc.maxSize);
	bc.minDist = bc.hudSize / bc.maxSize;
	bc.maxDist = bc.hudSize / bc.minSize;
	printf("%f\n", bc.minDist);
	return 0;
}

By compiling it with -O2, the compiled code raises some division by zero exception when executed:

Process 3247973 stopped
* thread #1, name = 'a.out', stop reason = signal SIGFPE: floating point divide by zero
    frame #0: 0x00005555555551c5 a.out`main + 101
a.out`main:
->  0x5555555551c5 <+101>: divps  %xmm2, %xmm0
    0x5555555551c8 <+104>: movlps %xmm0, (%rsp)
    0x5555555551cc <+108>: cvtss2sd %xmm0, %xmm0
    0x5555555551d0 <+112>: leaq   0xe3c(%rip), %rdi

Process 3249378 stopped
* thread #1, name = 'a.out', stop reason = signal SIGFPE: floating point divide by zero
    frame #0: 0x00005555555551c5 a.out`main((null)=<unavailable>, (null)=<unavailable>) at a.cpp:20:26
   17  		feenableexcept(FE_DIVBYZERO);
   18  		bc_t bc;
   19  		sscanf("1 1 1", "%f %f %f", &bc.hudSize, &bc.maxSize, &bc.minSize);
-> 20  		bc.minDist = bc.hudSize / bc.maxSize;
   21  		bc.maxDist = bc.hudSize / bc.minSize;
   22  		printf("%f\n", bc.minDist);
   23  		return 0;

My guess is that the SSE optimized code raises division by zero exceptions from unused fields of the xmm registers, despite we discard those results.

The requirement of -O2 requiring either -ftrapping-math or -ffast-math is curious since -ffast-math sets -fno-trapping-math.

Also, I thought -ftrapping-math was the default, and that -fno-trapping-math is part of -ffast-math, but just using -O2 behaves like if -fno-trapping-math is used, so it behaves like if part of -ffast-math was enabled anyway… But maybe -ffast-math just modifies other behaviors that make -ftrapping-math or -fno-trapping-math irrelevant.

Using the Godbolt's compiler explorer (here Clang 19.1.0), it only works with -O2 -ffast-math (or depreacted -Ofast):

compiler flags status
✅️
-O0 ✅️
-Os ❌️
-Os -fno-trapping-math ❌️
-Os -ftrapping-math ✅️
-O1 ✅️
-O2 ❌️
-O2 -fno-trapping-math ❌️
-O2 -ftrapping-math ✅️
-ffast-math ✅️
-O2 -ffast-math ✅️
-O2 -ffast-math -ftrapping-math ✅️
-O2 -ffast-math -fno-trapping-math ✅️
-Ofast ✅️

See: https://godbolt.org/z/395c8nMef
And (with -ftrapping-math added): https://godbolt.org/z/czbGTjs6E
And (with -fno-trapping-math added): https://godbolt.org/z/zYr5Pba3W

With 32-bit i686 I reproduce the bug when using SSE but get no bug when not using SSE.

compiler flags -m32 -msse -m32 -mno-sse
✅️ ✅️
-Os ❌️ ✅️
-O1 ✅️ ✅️
-O2 ❌️ ✅️
-ffast-math ✅️ ✅️
-O2 -ffast-math ✅️ ✅️

See 32-bit i686 with SSE I get the same failure: https://godbolt.org/z/199eqhzGh
And 32-bit i686 build without SSE I get no failure: https://godbolt.org/z/sfd9TsTj4

On my end on Ubuntu 24.04 with amd64, I get same results with clang 19.1.5.

On Ubuntu 24.04 with amd64 and different versions of the clang compiler, I only get it working with clang 13 and 14, every later version breaks it:

-O0 -O1 -O2 -O2 -ftrapping-math -ffast-math -O2 -ffast-math
clang 13.0.1 ✅️ ✅️ ✅️ ✅️ ✅️ ✅️
clang 14.0.6 ✅️ ✅️ ✅️ ✅️ ✅️ ✅️
clang 15.0.7 ✅️ ✅️ ❌️ ✅️ ✅️ ✅️
clang 16.0.6 ✅️ ✅️ ❌️ ✅️ ✅️ ✅️
clang 17.0.6 ✅️ ✅️ ❌️ ✅️ ✅️ ✅️
clang 18.1.3 ✅️ ✅️ ❌️ ✅️ ✅️ ✅️
clang 19.1.5 ✅️ ✅️ ❌️ ✅️ ✅️ ✅️

On GCC 14.02 I get none of those issues (no one false positive division by zero error is raised whatever the compiler flags being used: https://godbolt.org/z/rxMWM68Mc


Note: Disabling SSE on amd64 just produces garbage computation (1.0/1.0 gives 0.0), but I don't know if that makes sense to disable SSE on amd64. GCC produces the same garbage (1.0/1.0 giving 0.0). Though I'm surprised to not get a warning if that's not legit to do, also I'm surprised the generated code runs if that's not legit to do. See: https://godbolt.org/z/77PYGTc5b (Clang) and: https://godbolt.org/z/cvW4Er3Kd (GCC).

@github-actions github-actions bot added the clang Clang issues not falling into any other category label Dec 2, 2024
@illwieckz
Copy link
Contributor Author

@andykaylor here:

you said:

In clang, we do apply fast-math to the intriniscs. There are some problems with that. […]

Does clang also applies some implicit optimizations to generated intrinsics as well (like -fno-trapping-math), optimizations that would not be used when not generating intrinsics?

By generated intrinsics I mean clang deciding to use divss on some xmm register when doing a single division on standalone floats like the user writing a = b / c; without the user using any explicit function from immintrin.h.

@illwieckz
Copy link
Contributor Author

This is something I found when tracking down #118152 on my end by fiddling with compiler flags:

I was hoping that using feenableexcept() would help me catch some float computation errors that Clang 19 produces but Clang 18 doesn't, in hope to pinpoint some sample of code that reproduce #118152.

But I found this instead.

@illwieckz
Copy link
Contributor Author

So @slipher pointed out to me the usage of -fno-trapping-math is the default, and indeed I found this:

-f[no-]trapping-math

Control floating point exception behavior. -fno-trapping-math allows optimizations that assume that floating point operations cannot generate traps such as divide-by-zero, overflow and underflow.

  • The option -ftrapping-math behaves identically to -ffp-exception-behavior=strict.
  • The option -fno-trapping-math behaves identically to -ffp-exception-behavior=ignore. This is the default.

The quote is from: https://clang.llvm.org/docs/UsersManual.html

It means the default behavior makes feenableexcept() unusable on architectures featuring SSE.

But since it's a glibc function, maybe clang cannot do anything about it…

@EugeneZelenko EugeneZelenko added llvm:optimizations floating-point Floating-point math and removed clang Clang issues not falling into any other category labels Dec 2, 2024
@arsenm
Copy link
Contributor

arsenm commented Dec 2, 2024

This code doesn't have FENV_ACCESS on for most of the flags you demonstrated, meaning feenableexcept works or exceptions are observed. We REALLY need to have a warning if any of these functions are used without it enabled

@andykaylor
Copy link
Contributor

@andykaylor here:

you said:

In clang, we do apply fast-math to the intriniscs. There are some problems with that. […]

Does clang also applies some implicit optimizations to generated intrinsics as well (like -fno-trapping-math), optimizations that would not be used when not generating intrinsics?

By generated intrinsics I mean clang deciding to use divss on some xmm register when doing a single division on standalone floats like the user writing a = b / c; without the user using any explicit function from immintrin.h.

In that comment I was referring to the target-specific intrinsics declared in immintrin.h and the related headers that it includes. The use of divss is something else. That's just an implementation detail determined by the backend.

The problem you're running into here is that by default clang does not allow access to the floating-point environment. Unless you have doene something to enable access to the floating-point environment (such as #pragma FENV_ACCESS ON or compiled with -ffp-model=strict or -ftrapping-math) the optimizer will assume the default floating-point environment, meaning floating-point exceptions are unmasked and the default rounding mode is used.

This isn't really specific to SSE. The same assumptions are made for any target, but it may be exposed in different ways on different targets.

@jcranmer-intel
Copy link
Contributor

It means the default behavior makes feenableexcept() unusable on architectures featuring SSE.

But since it's a glibc function, maybe clang cannot do anything about it…

You need to use #pragma STDC FENV_ACCESS ON (or equivalent command-line flags) to use the function. Even in standard C, being in a non-default floating-point environment without setting that pragma is undefined behavior, and all compilers that I'm aware of retain that sense of you need to opt in to observing the floating-point environment.

@illwieckz
Copy link
Contributor Author

This code doesn't have FENV_ACCESS on for most of the flags you demonstrated, meaning feenableexcept works or exceptions are observed. We REALLY need to have a warning if any of these functions are used without it enabled

Oh! That's good to know! I have seen this pragma used in some macOS example but most Linux/BSD documentation seem to not mention it (example, example, example).

A warning would help a lot indeed.

I guess I got all the answers I needed.

@jcranmer-intel
Copy link
Contributor

gcc doesn't support the FENV_ACCESS pragma at all (requiring instead command-line flags to achieve the same effect), so that may bias some of the OS documentation away from mentioning it.

@illwieckz
Copy link
Contributor Author

gcc doesn't support the FENV_ACCESS pragma at all (requiring instead command-line flags to achieve the same effect), so that may bias some of the OS documentation away from mentioning it.

Oh, that explains why documentations I found mentioning this pragma were about macOS. So the command-line flags that achieve the same effect would be -ffp-model=strict or -ftrapping-math?

@jcranmer-intel
Copy link
Contributor

There's a few other flags that also have the practical effect of #pragma STDC FENV_ACCESS ON-- -frounding-math does so (though I wouldn't count on it continuing to do so, once we eventually get static rounding mode intrinsics), and -ffp-exception-behavior=maytrap|strict similarly does so.

@EugeneZelenko EugeneZelenko added the question A question, not bug report. Check out https://llvm.org/docs/GettingInvolved.html instead! label Jan 21, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
floating-point Floating-point math llvm:optimizations question A question, not bug report. Check out https://llvm.org/docs/GettingInvolved.html instead!
Projects
None yet
Development

No branches or pull requests

5 participants