Description
I think the way clang translates the following C code to LLVM IR is incorrect:
    char id(char c) {
        return c;
    }

    void my_memcpy(char *src, char *dst, int n) {
        for (int i = 0; i < n; i++) {
            dst[i] = id(src[i]);
        }
    }
The resulting IR defines id as @id(i8 noundef signext %0). The noundef is what concerns me. With this translation, calling my_memcpy as follows leads to UB at the IR level, since some of the bytes being copied are undef or poison (namely, the padding bytes):
    #include <stdint.h>

    struct S {
        uint8_t f1;
        uint16_t f2;
    };

    void testcase() {
        struct S s, s_copy;
        s.f1 = 0;
        s.f2 = 0;
        my_memcpy((char*)&s, (char*)&s_copy, sizeof(struct S));
    }
If I understand the C standard correctly, this program (running testcase) is entirely well-defined. In particular, C17 6.2.6.1 §5 says (emphasis mine):
Certain object representations need not represent a value of the object type. If the stored value of an object has such a representation and is read by an lvalue expression that does not have character type, the behavior is undefined.
But we are using a character type here. The standard also explicitly says in 6.2.6.2 that char types have no padding bits. So I don't think there is any room for UB to arise when copying arbitrary data (including uninitialized memory) at char type. Therefore clang should not add noundef to parameters and return values of character type.
Furthermore, I am not entirely sure what the status of this proposal is, but if it has been accepted, then I am not sure that adding noundef to any other integer type is correct, either. That proposal states explicitly:
None of the integral types have extraordinary values.
And at least for C++, https://eel.is/c++draft/basic.fundamental#4 has a note on padding in integer types stating:
Padding bits have unspecified value, but cannot cause traps.
So, at least for C++, I cannot see a justification for why clang adds noundef to all integer types. For non-character integer types in C, the standard is not clear enough for me to be sure either way.