Skip to content

Soundness with __CPROVER_r_ok #8614

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
lks9 opened this issue Apr 2, 2025 · 4 comments
Closed

Soundness with __CPROVER_r_ok #8614

lks9 opened this issue Apr 2, 2025 · 4 comments

Comments

@lks9
Copy link
Contributor

lks9 commented Apr 2, 2025

For the following program, we get VERIFICATION_SUCCESFUL, although the assertion should give a FAILURE:

soundness_r_ok.c

#include <assert.h>
#include <limits.h>
#include <stdbool.h>

int sum(int s[], int size)
{
    // Assume the size is valid
    //__CPROVER_assume(size >= 0);
 
    __CPROVER_assume(size==3);

    // Assume that 's' has at least 'size' valid elements
    // This prevents out-of-bounds access
    __CPROVER_assume(__CPROVER_r_ok(s, size * sizeof(int)));

    int res = 0;
    int i = 0;

    // EDIT: Commenting out loop does not change the outcome
    do {
        res += s[i];
        i++;
    } while (i < size);

    assert(false);
    return res;
}

The assertion false should be reachable with the assumptions but we get:

$ cbmc soundness_r_ok.c --function sum
CBMC version 6.5.0 (cbmc-6.5.0-2-gcd39b3a97e) 64-bit x86_64 linux
...
[sum.assertion.1] line 23 assertion false: SUCCESS

** 0 of 13 failed (1 iterations)
VERIFICATION SUCCESSFUL

Expected:

[sum.assertion.1] line 23 assertion false: FAILURE
VERIFICATION FAILED

Background: With assume size==3 we check only one path of symbolic execution. My student @yosephinestwn is currently implementing an extension to CBMC, to verify just one path, we call that retracing (https://github.com/yosephinestwn/cbmc). To test the implementation, she started a version without the assume size==3 but with the newly implemented option --retrace 110 to follow the control flow trace. This control flow trace corresponds to the branching in the GOTO program and in our case, we have just one goto instruction IF sum::1::i < sum::size THEN GOTO 1. The --retrace 110 means the GOTO target branch is taken twice (the 11) and not taken for the third time (the 0), afterwards, assert false should give a verification failure. After trying to fix our implementation, we found out that the problem is already in the upstream CBMC with assume size==3.

@remi-delmas-3000
Copy link
Collaborator

remi-delmas-3000 commented Apr 2, 2025

Hi, thanks for reporting the issue.

From the perspective of CBMC, when symbolic execution uses sum directly as entry point, the pointer s is just undefined, since there is no explicit allocation that tells you what s is pointing to anywhere in the program.

The __CPROVER_r_ok predicate cannot be used to assume that some memory configuration exists, it can only be used to assert and check memory validity. CBMC is a verifier for closed programs and only understands explicit allocation + assignments.

You should instead use a top level harness that allocates memory explicitly:

#include <assert.h>
#include <limits.h>
#include <stdbool.h>

int sum(int s[], int size)
{
    int res = 0;
    int i = 0;

    do {
        res += s[i];
        i++;
    } while (i < size);

    assert(false);
    return res;
}

int main()
{
    int size;
    __CPROVER_assume(size==3);
    int s[size];
    int res = sum(s, size);
    return 0;
}

Then you get the expected results:

cbmc sum.c
CBMC version 6.4.1 (cbmc-6.4.1) 64-bit x86_64 macos
Type-checking sum
Generating GOTO Program
Adding CPROVER library (x86_64)
Removal of function pointers and virtual functions
Generic Property Instrumentation
Starting Bounded Model Checking
Passing problem to propositional reduction
converting SSA
Running propositional reduction
SAT checker: instance is SATISFIABLE
Running propositional reduction
SAT checker inconsistent: instance is UNSATISFIABLE

** Results:
sum.c function sum
[sum.overflow.1] line 11 arithmetic overflow on signed + in res + s[(signed long int)i]: FAILURE
[sum.pointer_dereference.1] line 11 dereference failure: pointer NULL in s[(signed long int)i]: SUCCESS
[sum.pointer_dereference.2] line 11 dereference failure: pointer invalid in s[(signed long int)i]: SUCCESS
[sum.pointer_dereference.3] line 11 dereference failure: deallocated dynamic object in s[(signed long int)i]: SUCCESS
[sum.pointer_dereference.4] line 11 dereference failure: dead object in s[(signed long int)i]: SUCCESS
[sum.pointer_dereference.5] line 11 dereference failure: pointer outside object bounds in s[(signed long int)i]: SUCCESS
[sum.pointer_dereference.6] line 11 dereference failure: invalid integer address in s[(signed long int)i]: SUCCESS
[sum.overflow.2] line 12 arithmetic overflow on signed + in i + 1: SUCCESS
[sum.assertion.1] line 15 assertion false: FAILURE

** 2 of 9 failed (2 iterations)
VERIFICATION FAILED

There is a tool in the CBMC toolkit called goto-harness that can generate such harnesses for you when the input types are simple.

Also, enforcing a specific path within the control flow graph using assumptions may still result in CBMC symbolically executing some or all of the other paths in the CFG of the program, since symbolic execution engine may not be able to prune the branches ruled out by the constraints (it does not do very elaborate reasoning besides constant propagation to prune branches). If some of the constraints you're adding assign constants to some variables I'd recommend to encode them using assignments instead, to be sure Symex can propagate them and effectively prune the branches. Another way would be to insert assume(false) in blocks you don't want to execute at all , since symbolic execution stops on assume false assume(false).

@lks9
Copy link
Contributor Author

lks9 commented Apr 2, 2025

Thanks for the quick reply!

Hi, thanks for reporting the issue.

From the perspective of CBMC, when symbolic execution uses sum directly as entry point, the pointer s is just undefined, since there is no explicit allocation that tells you what s is pointing to anywhere in the program.

The __CPROVER_r_ok predicate cannot be used to assume that some memory configuration exists, it can only be used to assert and check memory validity. CBMC is a verifier for closed programs and only understands explicit allocation + assignments.

You should instead use a top level harness that allocates memory explicitly:

int s[size];

Thank you, that makes sense! Hence, I will close the issue.

There is a tool in the CBMC toolkit called goto-harness that can generate such harnesses for you when the input types are simple.

Thanks! We will look into it.

Also, enforcing a specific path within the control flow graph using assumptions may still result in CBMC symbolically executing some or all of the other paths in the CFG of the program, since symbolic execution engine may not be able to prune the branches ruled out by the constraints (it does not do very elaborate reasoning besides constant propagation to prune branches). If some of the constraints you're adding assign constants to some variables I'd recommend to encode them using assignments instead, to be sure Symex can propagate them and effectively prune the branches. Another way would be to insert assume(false) in blocks you don't want to execute at all , since symbolic execution stops on assume false assume(false).

I have to say I did not quite get what you mean by assigning constants to some variables. I guess you are referring to #8428 where we used another approach to follow a control flow trace with assumptions. In this approach, a concrete control flow trace (with constant elements) is assigned once to an array variable at the beginning of the execution and then we use assumptions like you described to prune branches. I don't see how to turn these assumptions into assignments but if you have any intuition, I am quite open for suggestions!

With our new approach (https://github.com/yosephinestwn/cbmc), we do not use assumptions to prune branches at all. Instead, we modified the symbolic execution engine, to be specific, we created a new method

void goto_symext::symex_goto_retrace(statet &state, std::vector<int> trace) { ... }

as alternative to:

void goto_symext::symex_goto(statet &state) { ... }

With a control-flow trace is given to cbmc via cbmc --retrace ..., we use the new implementation. It was actually tricky to figure out how symex_goto works, because the upstream implementation is the worst spaghetti code I have seen so far. But once we got it, the implementation was straight forward.

Thus the execution engine follows only the respective paths and pruning is no longer needed. This makes it much much more efficient. The downside is that the control-flow trace is based on the GOTO program, not the C source code.

@lks9 lks9 closed this as completed Apr 2, 2025
@yosephinestwn
Copy link

Hello,

The last commit is made in retrace-execute-from-scratch branch (yosephinestwn@7931a3e), in case you want to see it. Also, thank you for the help :)

@remi-delmas-3000
Copy link
Collaborator

remi-delmas-3000 commented Apr 2, 2025

With our new approach (https://github.com/yosephinestwn/cbmc), we do not use assumptions to prune branches at all. Instead, we modified the symbolic execution engine, to be specific, we created a new method

Ok got it. So you're really controlling the symbolic execution to step through only a predefined sequence of basic blocks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants