
Clang adds "noundef" annotation to char arguments #56551

Open
@RalfJung

Description


I think the way clang translates the following C code to LLVM IR is incorrect:

char id(char c) {
    return c;
}

void my_memcpy(char *src, char *dst, int n) {
    for (int i = 0; i < n; i++) {
        dst[i] = id(src[i]);
    }
}

The resulting IR defines id as @id(i8 noundef signext %0). The noundef attribute is what concerns me. With it, calling my_memcpy as follows is UB, since some of the bytes being copied are undef or poison (namely, the padding byte between f1 and f2):
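To make the concern concrete, here is a toy model in C of what the noundef rule implies for my_memcpy. It relies on the documented LLVM rule that passing undef or poison to a noundef parameter is immediate UB. Everything else is a hypothetical illustration: the abstract_byte type, the helper names, and the assumed layout of struct S (f1 at offset 0, one padding byte at offset 1, f2 at offsets 2 and 3, as on common ABIs) are not taken from clang or LLVM.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of an LLVM "abstract byte": either a concrete value
   or undef/poison (for example, an uninitialized padding byte). */
typedef struct {
    bool is_undef;
    unsigned char value; /* meaningful only when !is_undef */
} abstract_byte;

/* Models the noundef rule for id(): passing undef/poison to a noundef
   parameter is immediate UB. Returns true if the call would be UB. */
static bool call_id_noundef_is_ub(abstract_byte arg) {
    return arg.is_undef;
}

/* Copies n abstract bytes "through id()" like my_memcpy, and counts how
   many of those calls would be UB if id's parameter is noundef. */
size_t count_ub_calls(const abstract_byte *src, abstract_byte *dst, size_t n) {
    size_t ub_calls = 0;
    for (size_t i = 0; i < n; i++) {
        if (call_id_noundef_is_ub(src[i]))
            ub_calls++;
        /* Without noundef, an undef byte would simply be copied as undef. */
        dst[i] = src[i];
    }
    return ub_calls;
}
```

Under the assumed layout, copying the four bytes of struct S yields exactly one UB call, the one that passes the padding byte to id.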

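#include <stdint.h>

struct S {
    uint8_t f1;   /* offset 0 */
    uint16_t f2;  /* typically offset 2, so byte 1 is padding */
};

void testcase(void) {
    struct S s, s_copy;
    s.f1 = 0;
    s.f2 = 0;
    /* copies every byte of s, including the uninitialized padding byte */
    my_memcpy((char*)&s, (char*)&s_copy, sizeof(struct S));
}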
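Where exactly the padding sits is implementation-defined, so the sketch below computes it with offsetof rather than hard-coding it. It also performs the same char-typed byte copy as testcase, but inline and without calling id, and checks that the named fields survive. The helper names are hypothetical; the only guarantees relied on are that the first member is at offset 0 and that the members fit within sizeof(struct S).

```c
#include <stddef.h>
#include <stdint.h>

struct S {
    uint8_t f1;
    uint16_t f2;
};

/* Padding bytes between f1 and f2 (typically 1 where uint16_t is
   2-byte aligned). */
size_t s_padding_between_fields(void) {
    return offsetof(struct S, f2) - (offsetof(struct S, f1) + sizeof(uint8_t));
}

/* Trailing padding bytes after f2 (typically 0). */
size_t s_trailing_padding(void) {
    return sizeof(struct S) - (offsetof(struct S, f2) + sizeof(uint16_t));
}

/* Byte-wise copy at char type, padding included, without routing each
   byte through a function parameter. Returns 1 if both fields survive. */
int copy_preserves_fields(void) {
    struct S s, s_copy;
    s.f1 = 0;
    s.f2 = 0;
    const char *src = (const char *)&s;
    char *dst = (char *)&s_copy;
    for (size_t i = 0; i < sizeof(struct S); i++)
        dst[i] = src[i];
    return s_copy.f1 == 0 && s_copy.f2 == 0;
}
```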

If I understand the C standard correctly, this program (running testcase) is entirely well-defined. In particular, C17 6.2.6.1 §5 says (the key qualifier being "that does not have character type"):

Certain object representations need not represent a value of the object type. If the stored value of an object has such a representation and is read by an lvalue expression that does not have character type, the behavior is undefined.

But we are using a character type here. The standard also explicitly says in 6.2.6.2 that unsigned char and signed char have no padding bits, and plain char has the same representation as one of them (6.2.5 §15). So I don't think there is any room for UB to arise when copying arbitrary data (including uninitialized memory) at char type. Therefore clang should not add noundef to parameters of character type.
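The "no padding bits" point can be checked empirically for initialized data: every one of the 2^CHAR_BIT bit patterns, written into a char object and read back through a char lvalue, should come back unchanged. This is a minimal sketch assuming a two's-complement target (which C23 mandates); the function name is hypothetical, and it cannot observe undef or poison, which are LLVM-level notions rather than C ones.

```c
#include <limits.h>
#include <string.h>

/* Counts how many byte patterns survive a round trip through an object of
   type char: store the raw pattern with memcpy, read it at char type, and
   convert back to unsigned char. */
int char_roundtrip_count(void) {
    int count = 0;
    for (unsigned int v = 0; v <= UCHAR_MAX; v++) {
        unsigned char pattern = (unsigned char)v;
        char c;
        memcpy(&c, &pattern, 1);               /* store the bit pattern */
        unsigned char back = (unsigned char)c; /* read it at char type */
        if (back == pattern)
            count++;
    }
    return count;
}
```

On such a target the count equals UCHAR_MAX + 1, i.e. no byte pattern is an invalid char value.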

Furthermore, I am not entirely sure what the status of this proposal is, but if it has been accepted, then I am not sure that adding noundef to any other integer type is correct, either. That proposal states explicitly

None of the integral types have extraordinary values.

And at least for C++, https://eel.is/c++draft/basic.fundamental#4 has a note on padding in integer types stating

Padding bits have unspecified value, but cannot cause traps.

So, at least for C++, I cannot see a justification for why clang adds noundef to all integer types. For non-character integer types in C, the standard is not clear enough for me to be sure either way.

Cc @aqjune @nunoplopes


Labels: clang:codegen (IR generation bugs: mangling, exceptions, etc.)
