Skip to content

The C standard formalized in Coq #142

@RalfJung

Description

@RalfJung

Link: http://robbertkrebbers.nl/thesis.html

This PhD thesis attempts to formalize the sequential (single-threaded) part of the C11 standard. The core contribution is a memory model that can explain the behavior of mixing "low-level" (byte-wise) and "high-level" C -- which kinds of pointer arithmetic and pointer casts are allowed, what exactly happens with pointers to field of unions as the union is overwritten, and so on. Type-based alias analysis is formalized and proven correct. A few questions remain unanswered, mostly related to pointer-to-integer casts (also see #30).

Some examples discussed in the thesis include:

union int_or_short { int x; short y; } u = { .y = 3 };
int *p = &u.x;
// p points to the x variant of u
short *q = &u.y; // q points to the y variant of u

short z = *q;
// u has variant y and is accessed through y -> OK
*p = 10;
// u has variant y and is accessed through x -> bad
union int_or_short { int x; short y; } u = { .x = 3 };
printf("%d\n", u.y); // this is legal, because of type punning
union int_or_short { int x; short y; } u = { .x = 3 };
short *p = &u.y;
printf("%d\n", *p); // this is bad, type punning does not apply

Representation of pointers

Crucially, pointers in this model are not numbers. They are paths in a tree that define how to traverse memory, from the root of a block, through the fields of structs and unions and the indices in an array to the destination of the pointer.
See, for example, the image on page 72: The pointer to s.u.x[1] is the path that starts at the root, picks the u field of the struct, then the x field of the union, then the index 1 in the array. This way, the actual layout of structs and unions can remain uncertain, and pointer arithmetic that goes beyond array bounds becomes UB.

Relevance for Rust

Hopefully, we won't have type-based alias analysis or type punning in Rust. Still, some parts of the model may remain relevant: We have to decide how much pointer arithmetic we allow. The exact layout of structs and enums is left to the compiler (and we want to have optimizations like representing Option<Box> without a discriminant), so we probably want pointer arithmetic that makes assumptions about this layout to be UB. The tree-based addressing could be used to appropriately model this underspecification.

(Sorry my summary is not as good as the ones Niko wrote... )

Metadata

Metadata

Assignees

No one assigned

    Labels

    C-related-workCategory: Discussion of related work (other languages, scientific publications, ...)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions