33
44\author Kareem Khazem
55
6+ \section symbolic-execution Symbolic Execution
7+
8+ In the \ref goto-symex directory.
9+
610** Key classes:**
711* goto_symex_statet
812* goto_symext
@@ -19,6 +23,115 @@ digraph G {
1923}
2024\enddot
2125
26+ \subsection symex-overview Overview
27+
28+ The \ref bmct class gets a reference to an \ref symex_target_equationt
29+ "equation" (initially, an empty list of \ref symex_target_equationt::SSA_stept
30+ "single-static assignment steps") and a goto-program from the frontend.
31+ The \ref bmct creates a goto_symext to symbolically execute the
32+ goto-program, thereby filling the equation, which can then be passed
33+ along to the SAT solver.
34+
35+ The symbolic execution state is maintained by goto_symex_statet. This is
36+ a memory-heavy data structure, so goto_symext creates it on-demand and
37+ lets it go out-of-scope as soon as possible. However, the process of
38+ symbolic execution fills out an additional \ref symbol_tablet
39+ "symbol table" of dynamically-created objects; this symbol table is
40+ needed when solving the equation. This symbol table must thus be
41+ exported out of the state before it is torn down; this is done through
42+ the parameter "` new_symbol_table ` " passed as a reference to the various
43+ functions in goto_symext.
44+
45+ The main symbolic execution loop code is goto_symext::symex_step(). This
46+ function case-switches over the type of the instruction that we're
47+ currently executing, and calls various other functions appropriate to
48+ the instruction type, i.e. goto_symext::symex_function_call() if the
49+ current instruction is a function call, goto_symext::symex_goto() if the
50+ current instruction is a goto, etc.
51+
52+ \subsection symex-path Path exploration
53+
54+ By default, CBMC symbolically executes the entire program, building up
55+ an equation representing all instructions, which it then passes to the
56+ solver. Notably, it _ merges_ paths that diverge due to a goto
57+ instruction, forming a constraint that represents having taken either of
58+ the two paths; the code for doing this is goto_symext::merge_goto().
59+
60+ CBMC can operate in a different mode when the ` --paths ` flag is passed
61+ on the command line. This disables path merging; instead, CBMC executes
62+ a single path at a time, calling the solver with the equation
63+ representing that path, then continues to execute other paths.
64+ The 'other paths' that would have been taken when the program branches
65+ are saved onto a workqueue so that CBMC can continue to execute the
66+ current path, and then later retrieve saved paths from the workqueue.
67+ Implementation-wise, \ref bmct passes a worklist to goto_symext in
68+ bmct::do_language_agnostic_bmc(). If path exploration is enabled,
69+ goto_symext will fill up the worklist whenever it encounters a branch,
70+ instead of merging the paths on the branch. After the initial symbolic
71+ execution (i.e. the first call to bmct::run() in
72+ bmct::do_language_agnostic_bmc()), \ref bmct continues popping the
73+ worklist and executing untaken paths until the worklist empties. Note
74+ that this means that the default model-checking behaviour is a special
75+ case of path exploration: when model-checking, the initial symbolic
76+ execution run does not add any paths to the workqueue but rather merges
77+ all the paths together, so the additional path-exploration loop is
78+ skipped over.
79+
80+ \subsection ssa-renaming SSA renaming levels
81+
82+ In goto-programs, variable names get a prefix to indicate their scope
83+ (like ` main::1::%foo ` or whatever). At symbolic execution level, variables
84+ also get a _ suffix_ because we’re doing single-static assignment. There
85+ are three “levels” of renaming. At Level 2, variables are renamed every
86+ time they are encountered in a function. At Level 1, variables are
87+ renamed every time the functions that contain those variables are
88+ called. At Level 0, variables are renamed every time a new thread
89+ containing those variables are spawned. We can inspect the renamed
90+ variable names with the –show-vcc flag, which prints a string with the
91+ following format: ` %%s!%%d@%%d#%%d ` . The string field is the variable name,
92+ and the numbers after the !, @, and # are the L0, L1, and L2 suffixes
93+ respectively. The following examples illustrate Level 1 and 2 renaming:
94+
95+ $ cat l1.c
96+ int main() {
97+ int x=7;
98+ x=8;
99+ assert(0);
100+ }
101+ $ cbmc --show-vcc l1.c
102+ ...
103+ {-12} x!0@1#2 == 7
104+ {-13} x!0@1#3 == 8
105+
106+ That is, the L0 names for both xs are 0; the L1 names for both xs are 1;
107+ but each occurrence of x within main() gets a fresh L2 suffix (2 and 3,
108+ respectively).
109+
110+ $ cat l2.c
111+ void foo(){
112+ int x=7;
113+ x=8;
114+ x=9;
115+ }
116+ int main(){
117+ foo();
118+ foo();
119+ assert(0);
120+ }
121+ $ cbmc --show-vcc l2.c
122+ ...
123+ {-12} x!0@1#2 == 7
124+ {-13} x!0@1#3 == 8
125+ {-14} x!0@1#4 == 9
126+ {-15} x!0@2#2 == 7
127+ {-16} x!0@2#3 == 8
128+ {-17} x!0@2#4 == 9
129+
130+ That is, every time we enter function foo, the L1 counter of x is
131+ incremented (from 1 to 2) and the L0 counter is reset (back to 2, after
132+ having been incremented up to 4). The L0 counter then increases every
133+ time we access x as we walk through the function.
134+
22135---
23136\section counter-example-production Counter Example Production
24137
0 commit comments