Skip to content

Feature/restrict function pointer by name [depends-on: #5168] #5174

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions doc/cprover-manual/index.md
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,7 @@
[A Short Tutorial](cbmc/tutorial/),
[Loop Unwinding](cbmc/unwinding/),
[Assertion Checking](cbmc/assertions/),
[Restricting function pointers](cbmc/restrict-function-pointer/),
[Memory Analyzer](cbmc/memory-analyzer/),
[Program Harness](cbmc/goto-harness/)

Expand Down
173 changes: 173 additions & 0 deletions doc/cprover-manual/restrict-function-pointer.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,173 @@
[CPROVER Manual TOC](../../)

## Restricting function pointers

### Motivation

CBMC comes with a way to resolve calls to function pointers to direct function
calls. This is needed because symbolic execution itself can't handle calls to
function pointers. In practice, this looks something like this:

Given that there are functions with these signatures available in the program:

```
int f(int x);
int g(int x);
int h(int x);
```

And we have a call site like this:

```
typedef int(*fptr_t)(int x);
void call(fptr_t fptr) {
int r = fptr(10);
assert(r > 0);
}
```

Function pointer removal will turn this into code similar to this:

```
void call(fptr_t fptr) {
int r;
if(fptr == &f) {
r = f(10);
} else if(fptr == &g) {
r = g(10);
} else if(fptr == &h) {
r = h(10);
} else {
// sanity check
assert(false);
assume(false);
}
return r;
}
```

This works well enough for simple cases. However, this is a very simple
replacement only based on the signature of the function (and whether or not they
have their address taken somewhere in the program), so if there are many
functions matching a particular signature, or if some of these functions are
expensive in symex (e.g. functions with lots of loops or recursion), then this
can be a bit cumbersome - especially if we, as a user, already know that a
particular function pointer will only resolve to a single function or a small
set of functions. This is what the `--restrict-function-pointer` option allows.

### Example

Take the motivating example. Let us assume that we know for a fact that `call`
will always receive pointers to either `f` or `g` during actual executions of
the program, and symbolic execution for `h` is too expensive to simply ignore
the cost of its branch. For this, we will label the places in each function
where function pointers are being called, to this pattern:

```
<function-name>.function_pointer_call.<N>
```

where `N` is referring to which function call it is - so the first call to a
function pointer in a function will have `N=1`, the 5th `N=5` etc.

We can call `cbmc` with `--restrict-function-pointer
call.function_pointer_call.1/f,g`. This can be read as

> For the first call to a function pointer in the function `call`, assume that
> it can only be a call to `f` or `g`

The resulting code looks similar to the original example, except now there will
not be a call to `h`:

```
void call(fptr_t fptr) {
int r;
if(fptr == &f) {
r = f(10);
} else if(fptr == &g) {
r = g(10);
} else {
// sanity check
assert(false);
assume(false);
}
return r;
}
```

Another example: Imagine we have a simple virtual filesystem API and implementation
like this:

```
typedef struct filesystem_t filesystem_t;
struct filesystem_t {
int (*open)(filesystem_t *filesystem, const char* file_name);
};

int fs_open(filesystem_t *filesystem, const char* file_name) {
filesystem->open(filesystem, file_name);
}

int nullfs_open(filesystem_t *filesystem, const char* file_name) {
return -1;
}

filesystem_t nullfs_val = {.open = nullfs_open};
filesystem *const nullfs = &nullfs_val;

filesystem_t *get_fs_impl() {
// some fancy logic to determine
// which filesystem we're getting -
// in-memory, backed by a database, OS file system
// - but in our case, we know that
// it always ends up being nullfs
// for the cases we care about
return nullfs;
}
int main(void) {
filesystem_t *fs = get_fs_impl();
assert(fs_open(fs, "hello.txt") != -1);
}
```

In this case, the assumption is that *we* know that in our `main`, `fs` can be
nothing other than `nullfs`; But perhaps due to the logic being too complicated
symex ends up being unable to figure this out, so in the call to `fs_open()` we
end up branching on all functions matching the signature of
`filesystem_t::open`, which could be quite a few functions within the program.
Worst of all, if it's address is ever taken in the program, as far as the "dumb"
function pointer removal is concerned it could be `fs_open()` itself due to it
having a matching signature, leading to symex being forced to follow a
potentially infinite recursion until its unwind limit.

In this case we can again restrict the function pointer to the value which we
know it will have:

```
--restrict-function-pointer fs_open.function_pointer_call.1/nullfs_open
```

### Loading from file

If you have many places where you want to restrict function pointers, it'd be a
nuisance to have to specify them all on the command line. In these cases, you
can specify a file to load the restrictions from instead, via the
`--function-pointer-restrictions-file` option, which you can give the name of a
JSON file with this format:

```
{
"function_call_site_name": ["function1", "function2", ...],
...
}
```

**Note:** If you pass in multiple files, or a mix of files and command line
restrictions, the final restrictions will be a set union of all specified
restrictions.

**Note:** as of now, if something goes wrong during type checking (i.e. making
sure that all function pointer replacements refer to functions in the symbol
table that have the correct type), the error message will refer the command line
option `--restrict-function-pointer` regardless of whether the restriction in
question came from the command line or a file.
43 changes: 43 additions & 0 deletions regression/cbmc/restrict-function-pointer-by-name/test.c
Original file line number Diff line number Diff line change
@@ -0,0 +1,43 @@
#include <assert.h>

int f(void)
{
return 1;
}
int g(void)
{
return 2;
}
int h(void)
{
return 3;
}

typedef int (*fptr_t)(void);

void use_f(fptr_t fptr)
{
assert(fptr() == 2);
}

int nondet_int(void);

fptr_t g_fptr;
int main(void)
{
int cond = nondet_int();
if(cond)
{
g_fptr = f;
}
else
{
g_fptr = h;
}
int res = g_fptr();
assert(res == 1 || res == 3);
use_f(g);
fptr_t fptr = f;
assert(fptr() == 1);
return 0;
}
11 changes: 11 additions & 0 deletions regression/cbmc/restrict-function-pointer-by-name/test.desc
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
CORE
test.c
--restrict-function-pointer-by-name main::1::fptr/f --restrict-function-pointer-by-name use_f::fptr/g --restrict-function-pointer-by-name g_fptr/f,h
\[use_f.assertion.2\] line \d+ assertion fptr\(\) == 2: SUCCESS
\[use_f.assertion.1\] line \d+ dereferenced function pointer at use_f.function_pointer_call.1 must be g: SUCCESS
\[main.assertion.1\] line \d+ dereferenced function pointer at main.function_pointer_call.1 must be one of \[(h|f), (h|f)\]: SUCCESS
\[main.assertion.2\] line \d+ assertion res == 1 \|\| res == 3: SUCCESS
\[main.assertion.3\] line \d+ dereferenced function pointer at main.function_pointer_call.2 must be f: SUCCESS
\[main.assertion.4\] line \d+ assertion fptr\(\) == 1: SUCCESS
^EXIT=0$
^SIGNAL=0$
Original file line number Diff line number Diff line change
@@ -0,0 +1,59 @@
#include <assert.h>

typedef int (*fptr_t)(int);

fptr_t get_f(void);
fptr_t get_g(void);

void use_fg(int choice, fptr_t fptr, fptr_t gptr)
{
assert((choice ? fptr : gptr)(10) == 10 + choice);
}

// this is just here to distinguish the behavior from FP removal, which'd include g
int g_always_false_cond = 0;

int main(void)
{
use_fg(0, get_f(), get_g());
use_fg(1, get_f(), get_g());
}

int f(int x)
{
return x + 1;
}

int g(int x)
{
return x;
}

int h(int x)
{
return x / 2;
}

fptr_t get_f(void)
{
if(!g_always_false_cond)
{
return f;
}
else
{
return h;
}
}

fptr_t get_g(void)
{
if(!g_always_false_cond)
{
return g;
}
else
{
return h;
}
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
CORE
test.c
--restrict-function-pointer "use_fg.function_pointer_call.1/f,g"
\[use_fg.assertion.2\] line \d+ assertion \(choice \? fptr : gptr\)\(10\) == 10 \+ choice: SUCCESS
\[use_fg.assertion.1\] line \d+ dereferenced function pointer at use_fg.function_pointer_call.1 must be one of \[(f|g), (f|g)\]: SUCCESS
^EXIT=0$
^SIGNAL=0$
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
{
"use_fg.function_pointer_call.1": ["f"]
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,66 @@
#include <assert.h>

typedef int (*fptr_t)(int);

fptr_t get_f(void);

void use_f(fptr_t fptr)
{
assert(fptr(10) >= 10);
}

void select_f(void);
void select_g(void);
void select_h(void);

int main(void)
{
select_f();
use_f(get_f());
select_g();
use_f(get_f());
}

int f(int x)
{
return x + 1;
}
int g(int x)
{
return x;
}
int h(int x)
{
return x - 1;
}

int g_select_function = 0;

void select_f(void)
{
g_select_function = 0;
}
void select_g(void)
{
g_select_function = 1;
}
void select_h(void)
{
g_select_function = 2;
}

fptr_t get_f(void)
{
if(g_select_function == 0)
{
return f;
}
else if(g_select_function == 1)
{
return g;
}
else
{
return h;
}
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
CORE
test.c
--restrict-function-pointer use_f.function_pointer_call.1/f,g --show-goto-functions
f\(10\)
g\(10\)
^EXIT=0$
^SIGNAL=0$
--
h\(10\)
Loading