Skip to content

Commit fb6e66f

Browse files
committed
Improve goto-symex documentation
1 parent dbd6988 commit fb6e66f

File tree

1 file changed

+62
-24
lines changed

1 file changed

+62
-24
lines changed

src/goto-symex/README.md

Lines changed: 62 additions & 24 deletions
Original file line numberDiff line numberDiff line change
@@ -7,17 +7,12 @@
77

88
This directory contains a symbolic evaluation system for goto-programs.
99
This takes a goto-program and translates it to an equation system by
10-
traversing the program, branching and merging and unwinding loops as
11-
needed. Each reverse goto has a separate counter (the actual counting is
12-
handled by `cbmc`, see the `–unwind` and `–unwind-set` options). When a
13-
counter limit is reach, an assertion can be added to explicitly show
14-
when analysis is incomplete. The symbolic execution includes constant
15-
folding so loops that have a constant number of iterations will be
16-
handled completely (assuming the unwinding limit is sufficient).
10+
traversing the program, branching and merging and unwinding loops and recursion
11+
as needed.
1712

1813
The output of the symbolic execution is a system of equations; an object
19-
containing a list of `symex_target_elements`, each of which are
20-
equalities between `expr` expressions. See `symex_target_equation.h`.
14+
containing a list of \ref symex_target_equationt::SSA_stept structures, each of
15+
which are equalities between \ref exprt expressions.
2116
The output is in static, single assignment (SSA) form, which is *not*
2217
the case for goto-programs.
2318

@@ -50,44 +45,85 @@ The \ref bmct creates a goto_symext to symbolically execute the
5045
goto-program, thereby filling the equation, which can then be passed
5146
along to the SAT solver.
5247

53-
The symbolic execution state is maintained by goto_symex_statet. This is
54-
a memory-heavy data structure, so goto_symext creates it on-demand and
55-
lets it go out-of-scope as soon as possible. However, the process of
56-
symbolic execution fills out an additional \ref symbol_tablet
48+
The symbolic execution state is maintained by \ref goto_symex_statet. This
49+
contains an instance of \ref value_sett, used to track what objects pointers
50+
may refer to, and a constant propagator domain used to try to simplify
51+
assignments and (more usefully) resolve branch instructions.
52+
53+
This is a memory-heavy data structure, so goto_symext creates it on-demand and
54+
lets it go out-of-scope as soon as possible. The state is forked at control-
55+
flow divergences and may be joined at convergences, depending on the configured
56+
path exploration strategy.
57+
58+
The process of
59+
symbolic execution generates an additional \ref symbol_tablet
5760
"symbol table" of dynamically-created objects; this symbol table is
5861
needed when solving the equation. This symbol table must thus be
5962
exported out of the state before it is torn down; this is done through
6063
the parameter "`new_symbol_table`" passed as a reference to the various
6164
functions in goto_symext.
6265

63-
The main symbolic execution loop code is goto_symext::symex_step(). This
66+
The main symbolic execution loop code is \ref goto_symext::symex_step. This
6467
function case-switches over the type of the instruction that we're
6568
currently executing, and calls various other functions appropriate to
6669
the instruction type, i.e. goto_symext::symex_function_call() if the
6770
current instruction is a function call, goto_symext::symex_goto() if the
6871
current instruction is a goto, etc.
6972

73+
\subsection Loop and recursion unwinding
74+
75+
Each backwards goto and recursive call has a separate counter
76+
(handled by \ref cbmc or another driver program, see the `–unwind` and
77+
`–unwind-set` options). The symbolic execution includes constant
78+
folding so loops that have a constant / bounded number of iterations will often
79+
be handled completely (assuming the unwinding limit is sufficient).
80+
When an unwind or recursion limit is reached, an assertion can be added to
81+
explicitly show when analysis is incomplete.
82+
83+
\subsection \ref goto_symext::clean_expr
84+
85+
Before any expression is incorporated into the output equation set it is passed
86+
through \ref goto_symext::clean_expr and thus \ref goto_symext::dereference,
87+
whose primary purpose is to remove dereference operations. It achieves this
88+
using the \ref value_sett stored in \ref goto_symex_statet, replacing `*x` with
89+
a construct such as
90+
`x == &candidate1 ? candidate1 : x == &candidate2 ? candidate2 : x$object`
91+
92+
Note the `x$object` fallback candidate, which is known as a *failed symbol* or
93+
*failed object*, and represents the unknown object pointed to by `x` when
94+
neither of the candidates (`candidate1` and `candidate2` here) matched as
95+
expected. This is of course unsound, since `x$object` and `y$object` are always
96+
distinct, even if `x` and `y` might actually alias, but at least it ensures
97+
writes to and subsequent reads from `x` are related.
98+
99+
Despite its name, the \ref goto_symext::dereference function is also responsible
100+
for calling \ref goto_symex_statet::rename, which is responsible for both SSA
101+
renaming symbols and for applying constant propagation where possible.
102+
70103
\subsection symex-path Path exploration
71104

72105
By default, CBMC symbolically executes the entire program, building up
73106
an equation representing all instructions, which it then passes to the
74107
solver. Notably, it _merges_ paths that diverge due to a goto
75108
instruction, forming a constraint that represents having taken either of
76-
the two paths; the code for doing this is goto_symext::merge_goto().
77-
78-
CBMC can operate in a different mode when the `--paths` flag is passed
79-
on the command line. This disables path merging; instead, CBMC executes
80-
a single path at a time, calling the solver with the equation
109+
the two paths; the code for doing this is \ref goto_symext::merge_goto.
110+
111+
Symex can operate in a different mode when the `--paths` flag is passed
112+
in the \ref optionst object passed to its constructor (\ref cbmc passes this
113+
from the corresponding command-line option).
114+
This disables path merging; instead, symex executes
115+
a single path at a time, returning each one to its caller, which in the case of
116+
cbmc then calls its solver with the equation
81117
representing that path, then continues to execute other paths.
82118
The 'other paths' that would have been taken when the program branches
83-
are saved onto a workqueue so that CBMC can continue to execute the
84-
current path, and then later retrieve saved paths from the workqueue.
119+
are saved onto a workqueue so that the driver program can continue to execute
120+
the current path, and then later retrieve saved paths from the workqueue.
85121
Implementation-wise, \ref bmct passes a worklist to goto_symext in
86-
bmct::do_language_agnostic_bmc(). If path exploration is enabled,
122+
\ref bmct::do_language_agnostic_bmc. If path exploration is enabled,
87123
goto_symext will fill up the worklist whenever it encounters a branch,
88124
instead of merging the paths on the branch. After the initial symbolic
89-
execution (i.e. the first call to bmct::run() in
90-
bmct::do_language_agnostic_bmc()), \ref bmct continues popping the
125+
execution (i.e. the first call to \ref bmct::run in
126+
\ref bmct::do_language_agnostic_bmc), \ref bmct continues popping the
91127
worklist and executing untaken paths until the worklist empties. Note
92128
that this means that the default model-checking behaviour is a special
93129
case of path exploration: when model-checking, the initial symbolic
@@ -155,6 +191,8 @@ time we access x as we walk through the function.
155191

156192
In the \ref goto-symex directory.
157193

194+
195+
158196
**Key classes:**
159197
* symex_target_equationt
160198
* prop_convt

0 commit comments

Comments
 (0)