Skip to content

Inefficient code generation for u8 -> boolean conversion if bounds check already applied #121673

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
orlp opened this issue Feb 27, 2024 · 3 comments
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-bug Category: This is a bug. C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such S-has-mcve Status: A Minimal Complete and Verifiable Example has been found for this issue T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@orlp
Copy link
Contributor

orlp commented Feb 27, 2024

This occurs both on the latest stable and nightly with both -C opt-level=2 and -C opt-level=3, and looking back some versions this has been going on for quite a while, so this is not a regression AFAIK. Consider the following piece of code:

#[no_mangle]
pub fn test(x: &u8) -> bool {
    let load = *x;
    if load <= 1 {
        return load == 1;
    }

    // Just some nonsense to keep the above branch.
    something_dynamic()
}

#[inline(never)]
#[cold]
fn something_dynamic() -> bool {
    std::env::var("dynamic").is_ok()
}

I would expect that in a release build, the above code loads the u8, checks if it is a legal value for a bool, and if so return it, otherwise going to something_dynamic(). The motivation for this piece of code is something similar to a OnceLock<bool> in an incredibly hot piece of code. Ideally the hot path only consists of a single mov, cmp and jmp. Compiled we see the following:

test:
        movzx   eax, byte ptr [rdi]
        cmp     al, 2
        jae     example::something_dynamic
        cmp     al, 1
        sete    al
        ret

example::something_dynamic:
        <omitted>

It appears that Rust always emits code for turning an u8 into a bool (cmp al, 1 followed by sete al), regardless of whether this is necessary. In this case, since we checked that the u8 is <= 1, this is not necessary at all. The same problem occurs on ARM:

_test:
        ldrb    w8, [x0]
        cmp     w8, #2
        b.hs    LBB0_2
        cmp     w8, #1
        cset    w0, eq
        ret
       
LBB0_2:
        b       example::something_dynamic

The problem even persists if we try to use transmute, or pointer casts to bypass the problem. All the following variants still cause a superfluous value test:

    if load <= 1 {
        return load == 1;
        // return unsafe { *(&load as *const u8 as *const bool) };
        // return unsafe { std::mem::transmute(load) };
        // return unsafe { std::mem::transmute_copy(&load) };
    }

The weirdest thing is that this pessimization only occurs when there is a bounds check already applied. The following function avoids a test, directly loading the byte:

#[no_mangle]
unsafe fn maybe_load(x: &u8, should_load: bool) -> bool {
    if should_load {
        return unsafe { std::mem::transmute_copy(x) };
    }

    something_dynamic()
}
maybe_load:
        test    esi, esi
        je      example::something_dynamic
        movzx   eax, byte ptr [rdi]
        ret

However, if you try to implement test in terms of maybe_load, it gets inlined and the useless cmp, sete is back.

@orlp orlp added the C-bug Category: This is a bug. label Feb 27, 2024
@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Feb 27, 2024
@jieyouxu jieyouxu added T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. S-has-mcve Status: A Minimal Complete and Verifiable Example has been found for this issue C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such labels Feb 27, 2024
@orlp
Copy link
Contributor Author

orlp commented Feb 27, 2024

I don't know if the Rust team can do something about this, perhaps it should be a LLVM bug. The following C++ program:

#include <cstdint>
#include <cstdlib>
#include <cstring>

__attribute__((noinline))
__attribute__((cold))
bool dynamic() {
    return std::getenv("foo") != 0;
}

bool test(uint8_t* x) {
    uint8_t load = *x;
    if (load <= 1) {
        bool ret;
        std::memcpy(&ret, &load, 1);
        return ret;
    }
    return dynamic();
}

compiles to the following with GCC 13

test(unsigned char*):
        movzx   eax, BYTE PTR [rdi]
        cmp     al, 1
        ja      .L4
        ret

but with Clang 17 we once again get

test(unsigned char*):                              # @test(unsigned char*)
        movzx   eax, byte ptr [rdi]
        cmp     al, 1
        ja      dynamic()                     # TAILCALL
        test    al, al
        setne   al
        ret

Perhaps someone could forward it to the right place?

@dianqk
Copy link
Member

dianqk commented Feb 28, 2024

Hmm, this probably needs to be handled in the LLVM backend.

@rustbot label +A-LLVM

@rustbot rustbot added the A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. label Feb 28, 2024
YanWQ-monad added a commit to YanWQ-monad/llvm-project that referenced this issue Mar 4, 2024
@saethlin saethlin removed the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Mar 5, 2024
@dianqk
Copy link
Member

dianqk commented Mar 9, 2024

Upstream issue: llvm/llvm-project#84605

YanWQ-monad added a commit to YanWQ-monad/llvm-project that referenced this issue Mar 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-bug Category: This is a bug. C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such S-has-mcve Status: A Minimal Complete and Verifiable Example has been found for this issue T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

5 participants