Skip to content

Missed optimization: inadequate float range propagation #70985

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
bbarenblat opened this issue Nov 1, 2023 · 8 comments
Open

Missed optimization: inadequate float range propagation #70985

bbarenblat opened this issue Nov 1, 2023 · 8 comments

Comments

@bbarenblat
Copy link
Contributor

At 604eff6, clang++ -target x86_64-linux-gnu -O3 compiles

// Copyright 2023 Google LLC
// SPDX-License-Identifier: Apache-2.0
int ga, gb;
void F(double x) {
  if (x > 5) ++ga;
  if (x > 10) ++gb;
}

to

.LCPI0_0:
        .quad   0x4014000000000000              # double 5
.LCPI0_1:
        .quad   0x4024000000000000              # double 10
_Z1Fd:  ucomisd xmm0, qword ptr [rip + .LCPI0_0]
        ja      .LBB0_1
        ucomisd xmm0, qword ptr [rip + .LCPI0_1]  # ***
        ja      .LBB0_3                           # ***
.LBB0_4:
        ret
.LBB0_1:
        inc     dword ptr [rip + ga]
        ucomisd xmm0, qword ptr [rip + .LCPI0_1]
        jbe     .LBB0_4
.LBB0_3:
        inc     dword ptr [rip + gb]
        ret
ga:     .long   0
gb:     .long   0

The conditional branch marked with # *** will never be taken – in the domain of doubles, !(x > 5) always implies !(x > 10). Telling the compiler this explicitly –

// Copyright 2023 Google LLC
// SPDX-License-Identifier: Apache-2.0
int ga, gb;
void F(double x) {
  if (x > 5) {
    ++ga;
    if (x > 10) ++gb;
  }
}

– produces

.LCPI0_0:
        .quad   0x4014000000000000              # double 5
.LCPI0_1:
        .quad   0x4024000000000000              # double 10
_Z1Fd:  ucomisd xmm0, qword ptr [rip + .LCPI0_0]
        jbe     .LBB0_3
        inc     dword ptr [rip + ga]
        ucomisd xmm0, qword ptr [rip + .LCPI0_1]
        jbe     .LBB0_3
        inc     dword ptr [rip + gb]
.LBB0_3:
        ret
ga:     .long   0
gb:     .long   0

The tautological comparison gets emitted at -O1 through -O3 and -Og, though not at -Os or -Oz. -ffast-math doesn’t change the behavior.

GCC tip-of-tree emits assembly close to the second case in both cases.

Compiler Explorer for experimentation.

@arsenm
Copy link
Contributor

arsenm commented Nov 2, 2023

I believe ConstraintElimination could be taught about floating point compares cc @fhahn

@alexander-shaposhnikov
Copy link
Collaborator

ConstraintSystem (see e.g. https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/Analysis/ConstraintSystem.h#L57) was designed to work with integers, perhaps, it's nontrivial to make it work with floating point types. Not sure, it looks like for simple cases it's not hard to teach llvm::isImpliedByDomCondition & InstructionSimplify to handle them. cc: @nikic

@nikic
Copy link
Contributor

nikic commented Nov 2, 2023

I believe the proper place to handle this case would be in LVI/CVP. However, this would involve a substantial amount of effort, the first part of which would be to implement ConstantRangeFP, which can then be supported in ValueLattice and from there supported in LVI and SCCP.

Whether supporting this is actually worthwhile is a different question. I'd appreciate some real-world motivating examples that go beyond "GCC does this".

@dtcxzyw
Copy link
Member

dtcxzyw commented Nov 2, 2023

Related issue: #68301

I'd appreciate some real-world motivating examples that go beyond "GCC does this".

+1 :)

@fhahn
Copy link
Contributor

fhahn commented Nov 2, 2023

I believe ConstraintElimination could be taught about floating point compares cc @fhahn

For cases where it is safe to transfer to the integer domain it should be, but I am not sure what exactly the transfer conditions would have to be and how useful it would be in practice

@bbarenblat
Copy link
Contributor Author

I'd appreciate some real-world motivating examples that go beyond "GCC does this".

Oh, absolutely. I wasn’t trying to say “GCC does this so Clang should do it too”. This is definitely low-priority; I mostly wanted to bring it up in case it indicated a bug. :) It’s good to know that it’s not a bug and that it simply hasn’t been implemented. Thank you for looking into this!

As a motivating example, we do see code like this inside Google. The global variables are usually counters tracking how many requests have taken longer than certain time thresholds, and the floating-point value is usually the number of seconds a request took. (In an ideal world, nobody at Google would be tracking time using doubles, but a Google-sized codebase is far from ideal.)

That said, though, I am fully in favor of designating this a wish-list item. I discovered it during the course of debugging a different issue, thought, “Hmm, that’s weird,” and filed a bug. I’ve never seen this show up on a profile, nor do I expect to (the extraneous branch will never be taken, so it will be predicted very well). And I don’t think the extra ten bytes of binary is going to make a meaningful difference to anybody’s RAM requirements.

@jcranmer-intel
Copy link
Contributor

Whether supporting this is actually worthwhile is a different question. I'd appreciate some real-world motivating examples that go beyond "GCC does this".

A non-contrived example I can think of is eliminating range reduction checks (or domain checks in general) from math functions. Although I am dubious that it would provide much benefit, especially since such checks are likely to be extremely predictable.

@arsenm
Copy link
Contributor

arsenm commented Nov 3, 2023

A non-contrived example I can think of is eliminating range reduction checks (or domain checks in general) from math functions. Although I am dubious that it would provide much benefit, especially since such checks are likely to be extremely predictable.

What about GPUs where there is no prediction? I would like to eliminate range checks from the builtin math library functions

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

8 participants