-
Notifications
You must be signed in to change notification settings - Fork 61
Description
Link: http://robbertkrebbers.nl/thesis.html
This PhD thesis attempts to formalize the sequential (single-threaded) part of the C11 standard. The core contribution is a memory model that can explain the behavior of mixing "low-level" (byte-wise) and "high-level" C -- which kinds of pointer arithmetic and pointer casts are allowed, what exactly happens with pointers to field of unions as the union is overwritten, and so on. Type-based alias analysis is formalized and proven correct. A few questions remain unanswered, mostly related to pointer-to-integer casts (also see #30).
Some examples discussed in the thesis include:
union int_or_short { int x; short y; } u = { .y = 3 };
int *p = &u.x;
// p points to the x variant of u
short *q = &u.y; // q points to the y variant of u
short z = *q;
// u has variant y and is accessed through y -> OK
*p = 10;
// u has variant y and is accessed through x -> bad
union int_or_short { int x; short y; } u = { .x = 3 };
printf("%d\n", u.y); // this is legal, because of type punning
union int_or_short { int x; short y; } u = { .x = 3 };
short *p = &u.y;
printf("%d\n", *p); // this is bad, type punning does not apply
Representation of pointers
Crucially, pointers in this model are not numbers. They are paths in a tree that define how to traverse memory, from the root of a block, through the fields of structs and unions and the indices in an array to the destination of the pointer.
See, for example, the image on page 72: The pointer to s.u.x[1]
is the path that starts at the root, picks the u
field of the struct, then the x
field of the union, then the index 1 in the array. This way, the actual layout of structs and unions can remain uncertain, and pointer arithmetic that goes beyond array bounds becomes UB.
Relevance for Rust
Hopefully, we won't have type-based alias analysis or type punning in Rust. Still, some parts of the model may remain relevant: We have to decide how much pointer arithmetic we allow. The exact layout of structs and enums is left to the compiler (and we want to have optimizations like representing Option<Box>
without a discriminant), so we probably want pointer arithmetic that makes assumptions about this layout to be UB. The tree-based addressing could be used to appropriately model this underspecification.
(Sorry my summary is not as good as the ones Niko wrote... )