Skip to content

Commit 3a58226

Browse files
committed
[docs 3/5] Add per-directory high-level docs
This commit introduces a module.md file for several CProver directories. Each of these is turned into a page under the Modules section in the generated Doxygen documentation. The intention is that developers wishing to contribute to one specific aspect of CProver can get a high-level architectural overview of a particular directory; the documentation describes the input to and output from that directory, and introduces the main classes or entry points. By way of a "table of contents," the file cbmc/module.md contains a diagram describing how each of the directories is invoked by CBMC in order, and the nodes of the diagram hyperlink to the appropriate documentation. The intention is that developers wishing to contribute to CBMC as a whole can understand the entire process, from source files to bug reports and counterexample production. This documentation is derived from Mark Tuttle's notes on a talk given by Michael Tautschnig.
1 parent f5be7f1 commit 3a58226

File tree

7 files changed

+554
-2
lines changed

7 files changed

+554
-2
lines changed

doc/architectural/front-page.md

Lines changed: 13 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,8 +1,11 @@
11
CProver Documentation
22
=====================
33

4-
These pages contain both user tutorials and automatically-generated API
5-
documentation. Users can download CProver tools from the
4+
\author Kareem Khazem
5+
6+
These pages contain user tutorials, automatically-generated API
7+
documentation, and higher-level architectural overviews for the
8+
CProver codebase. Users can download CProver tools from the
69
<a href="http://www.cprover.org/">CProver website</a>; contributors
710
should use the <a href="https://github.com/diffblue/cbmc">repository</a>
811
hosted on GitHub.
@@ -21,4 +24,12 @@ hosted on GitHub.
2124
members in the search bar at top-right or use one of the links in the
2225
sidebar.
2326

27+
* For higher-level architectural information, each of the pages under
28+
the "Modules" link in the sidebar gives an overview of a directory in
29+
the CProver codebase.
30+
31+
* The \ref module_cbmc "CBMC guided tour" is a good start for new
32+
contributors to CBMC. It describes the stages through which CBMC
33+
transforms source files into bug reports and counterexamples, linking
34+
to the relevant documentation for each stage.
2435
\defgroup module_hidden _hidden

src/ansi-c/module.md

Lines changed: 114 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,114 @@
1+
\ingroup module_hidden
2+
\defgroup module_ansi-c ANSI-C Language Front-end
3+
4+
\author Kareem Khazem
5+
6+
\section preprocessing Preprocessing & Parsing
7+
8+
In the \ref ansi-c and \ref java_bytecode directories
9+
10+
**Key classes:**
11+
* \ref languaget and its subclasses
12+
* ansi_c_parse_treet
13+
14+
\dot
15+
digraph G {
16+
node [shape=box];
17+
rankdir="LR";
18+
1 [shape=none, label=""];
19+
2 [label="preprocessing & parsing"];
20+
3 [shape=none, label=""];
21+
1 -> 2 [label="Command line options, file names"];
22+
2 -> 3 [label="Parse tree"];
23+
}
24+
\enddot
25+
26+
27+
28+
---
29+
\section type-checking Type-checking
30+
31+
In the \ref ansi-c and \ref java_bytecode directories.
32+
33+
**Key classes:**
34+
* \ref languaget and its subclasses
35+
* \ref irept
36+
* \ref irep_idt
37+
* \ref symbolt
38+
* symbol_tablet
39+
40+
\dot
41+
digraph G {
42+
node [shape=box];
43+
rankdir="LR";
44+
1 [shape=none, label=""];
45+
2 [label="type checking"];
46+
3 [shape=none, label=""];
47+
1 -> 2 [label="Parse tree"];
48+
2 -> 3 [label="Symbol table"];
49+
}
50+
\enddot
51+
52+
This stage generates a symbol table, mapping identifiers to symbols;
53+
\ref symbolt "symbols" are tuples of (value, type, location, flags).
54+
55+
This is a good point to introduce the \ref irept ("internal
56+
representation") class---the base type of many of CBMC's hierarchical
57+
data structures. In particular, \ref exprt "expressions",
58+
\ref typet "types" and \ref codet "statements" are all subtypes of
59+
\ref irept.
60+
An irep is a tree of ireps. A subtlety is that an irep is actually the
61+
root of _three_ (possibly empty) trees, i.e. it has three disjoint sets
62+
of children: \ref irept::get_sub() returns a list of children, and
63+
\ref irept::get_named_sub() and \ref irept::get_comments() each return an
64+
association from names to children. **Most clients never use these
65+
functions directly**, as subtypes of irept generally provide more
66+
descriptive functions. For example, the operands of an
67+
\ref exprt "expression" (\ref exprt::op0() "op0", op1 etc) are
68+
really that expression's children; the
69+
\ref code_assignt::lhs() "left-hand" and right-hand side of an
70+
\ref code_assignt "assignment" are the children of that assignment.
71+
The \ref irept::pretty() function provides a descriptive string
72+
representation of an irep.
73+
74+
\ref irep_idt "irep_idts" ("identifiers") are strings that use sharing
75+
to improve memory consumption. A common pattern is a map from irep_idts
76+
to ireps. A goto-program contains a single symbol table (with a single
77+
scope), meaning that the names of identifiers in the target program are
78+
lightly mangled in order to make them globally unique. If there is an
79+
identifier `foo` in the target program, the `name` field of `foo`'s
80+
\ref symbolt "symbol" in the goto-program will be
81+
* `foo` if it is global;
82+
* <code>bar\::foo</code> if it is a parameter to a function `bar()`;
83+
* <code>bar\::3\::foo</code> if it is a local variable in a function
84+
`bar()`, where `3` is a counter that is incremented every time a
85+
newly-scoped `foo` is encountered in that function.
86+
87+
The use of *sharing* to save memory is a pervasive design decision in
88+
the implementation of ireps and identifiers. Sharing makes equality
89+
comparisons fast (as there is no need to traverse entire trees), and
90+
this is especially important given the large number of map lookups
91+
throughout the codebase. More importantly, the use of sharing saves vast
92+
amounts of memory, as there is plenty of duplication within the
93+
goto-program data structures. For example, every statement, and every
94+
sub-expression of a statement, contains a \ref source_locationt
95+
that indicates the source file and location that it came from. Every
96+
symbol in every expression has a field indicating its type and location;
97+
etc. Although each of these are constructed as separate objects, the
98+
values that they eventually point to are shared throughout the codebase,
99+
decreasing memory consumption dramatically.
100+
101+
The Type Checking stage turns a parse tree into a
102+
\ref symbol_tablet "symbol table". In this context, the 'symbols'
103+
consist of code statements as well as what might more traditionally be
104+
called symbols. Thus, for example:
105+
* The statement `int foo = 11;` is converted into a symbol whose type is
106+
integer_typet and value is the \ref constant_exprt
107+
"constant expression" `11`; that symbol is stored in the symbol table
108+
using the mangled name of `foo` as the key;
109+
* The function definition `void foo(){ int x = 11; bar(); }` is
110+
converted into a symbol whose type is \ref code_typet (not to be
111+
confused with \ref typet or \ref codet!); the code_typet contains the
112+
parameter and return types of the function. The value of the symbol is
113+
the function's body (a \ref codet), and the symbol is stored in the
114+
symbol table with `foo` as the key.

src/cbmc/module.md

Lines changed: 47 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,47 @@
1+
\ingroup module_hidden
2+
\defgroup module_cbmc CBMC tour
3+
4+
\author Kareem Khazem
5+
6+
CBMC takes C code or a goto-binary as input and tries to emit traces of
7+
executions that lead to crashes or undefined behaviour. The diagram
8+
below shows the intermediate steps in this process.
9+
10+
11+
\dot
12+
digraph G {
13+
14+
rankdir="TB";
15+
node [shape=box, fontcolor=blue];
16+
17+
subgraph top {
18+
rank=same;
19+
1 -> 2 -> 3 -> 4;
20+
}
21+
22+
subgraph bottom {
23+
rank=same;
24+
5 -> 6 -> 7 -> 8 -> 9;
25+
}
26+
27+
/* shift bottom subgraph over */
28+
9 -> 1 [color=white];
29+
30+
4 -> 5;
31+
32+
1 [label="command line\nparsing" URL="\ref cbmc_parse_optionst"];
33+
2 [label="preprocessing,\nparsing" URL="\ref preprocessing"];
34+
3 [label="language\ntype-checking" URL="\ref type-checking"];
35+
4 [label="goto\nconversion" URL="\ref goto-conversion"];
36+
5 [label="instrumentation" URL="\ref instrumentation"];
37+
6 [label="symbolic\nexecution" URL="\ref symbolic-execution"];
38+
7 [label="SAT/SMT\nencoding" URL="\ref sat-smt-encoding"];
39+
8 [label="decision\nprocedure" URL="\ref decision-procedure"];
40+
9 [label="counter example\nproduction" URL="\ref counter-example-production"];
41+
}
42+
\enddot
43+
44+
The \ref cprover-manual "CProver Manual" describes CBMC from a user
45+
perspective. Each node in the diagram above links to the appropriate
46+
class or module documentation, describing that particular stage in the
47+
CBMC pipeline.

0 commit comments

Comments
 (0)