Skip to content

Add function pointer restriction option #5168

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

hannes-steffenhagen-diffblue
Copy link
Contributor

This adds a new option, --restrict-function-pointer, to cbmc. This lets a user
specify a list of possible pointer targets for calls to a particular function
pointer variable, rather than have remove_function_pointers guess possible
values. The intended purpose behind this is to prevent excessive symex time
wasted on exploring paths the user knows the program can never actually take.

  • Each commit message has a non-empty body, explaining why the change was made.
  • Methods or procedures I have added are documented, following the guidelines provided in CODING_STANDARD.md.
  • The feature or user visible behaviour I have added or modified has been documented in the User Guide in doc/cprover-manual/
  • Regression or unit tests are included, or existing tests cover the modified code (in this case I have detailed which ones those are in the commit message).
  • My commit message includes data points confirming performance improvements (if claimed).
  • My PR is restricted to a single feature or bugfix.
  • White-space or formatting changes outside the feature-related changed lines are in commits of their own.

@@ -78,6 +79,7 @@ Author: Daniel Kroening, [email protected]

#include <pointer-analysis/add_failed_symbols.h>

#include <goto-programs/simplify_function_pointers.h>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move above (to the rest of goto-program includes).

@@ -750,7 +752,8 @@ int cbmc_parse_optionst::get_goto_program(
return CPROVER_EXIT_SUCCESS;
}

if(cbmc_parse_optionst::process_goto_program(goto_model, options, log))
if(cbmc_parse_optionst::process_goto_program(
goto_model, options, log, cmdline))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't you just copy the cmdline part that you need into options instead of passing the whole thing?

}

template <typename GotoFunctionT, typename HandlerT>
void for_each_goto_location(GotoFunctionT &&goto_function, HandlerT handle)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's called for_each_goto_location but inside you iterate over instructions.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not against these but please note the amount of hatred that forall_goto_program_instructions et al. used to get and try to get some kind of agreement on whether we should be doing more or less of this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added this here for now, but I don't feel too strongly about this - I'll remove this if people don't like it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with it but I know it was a big source of argument at one point (I don't remember what was decided TBH) so be a little careful. I think it might be in the coding standard...

template <typename GotoFunctionT, typename Predicate, typename HandlerT>
void for_each_goto_location_if(
GotoFunctionT &&goto_function,
Predicate predicate,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PredicateT for consistency?

\*******************************************************************/

/// \file
/// Given goto functions and a list function parameters or globals
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a list of function parameters


#include "goto_model.h"

void simplify_function_pointers(goto_modelt &goto_model);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This probably deserves documenting as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, is there a test for this?

{
for_each_goto_location(
goto_function.second, [&](goto_programt::targett it) {
if(it->is_function_call())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure it would help much, but you could use the for_each_goto_location_if and split the is_function_call /\ is_dereference as Predicate.

function_call.source_location(),
goto_function_symbol_mode,
goto_model.symbol_table);
auto assign_instruction = goto_programt::instructiont{ASSIGN};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is goto_programt::make_assignment.

assign_instruction.code =
code_assignt{new_function_pointer_symbol.symbol_expr(),
function_pointer_dereference.pointer()};
goto_function.second.body.instructions.insert(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe insert_before would be more explicit.

PRECONDITION(begin != end);
auto const pointer_to_function =
address_of_exprt{symbol_table.lookup_ref(*begin).symbol_expr()};
id_iteratort const next = begin + 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So is const first or is actual type first?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed this function entirely, it was there for an earlier version of this that I since discarded.

if(function_type.id() != ID_code)
{
throw invalid_command_line_argument_exceptiont{
"not a function pointer: " + id2string(restriction.first),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it gives the same message (as the throw above), couldn't it be merged?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe? But I think of them as two different failure conditions that just happen to have the same text in them.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, (what about "not a pointer to a function").

{
throw invalid_command_line_argument_exceptiont{
"type mismatch: `" + id2string(restriction.first) + "' points to `" +
type2c(function_type, ns) + "', but restriction `" +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is also util/format but I'm not sure which should be prefered.

for_each_goto_location_if(
goto_function,
[](goto_programt::targett target) { return target->is_function_call(); },
[&](goto_programt::targett target) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed thanks

{
typecheck_function_pointer_restrictions(goto_model, restrictions);

// we iterate over all restrictions we might have
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The loop below does not seem to iterate over restrictions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Vestigial comment, will fix

// we iterate over all restrictions we might have
for(auto &goto_function : goto_model.goto_functions.function_map)
{
// for each function call, we l
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Brain stopped mid sentence

auto const is_calling_pointer_symbol =
to_dereference_expr(original_function_call.function())
.pointer() == pointer_symbol;
const symbol_exprt &pointer_symbol =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First, can we rename this: there already is one pointer_symbol above? Second, can we move it below, closer to it's use?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two are actually the same thing, I'll remove this one

source_location.set_property_class(ID_assertion);
// XXX I want to hide these assertions unless they fail - apparently this is
/// not the correct way to do this?
source_location.set_hide();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably don't want this here at all.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll keep it in here for now. I'll remove this when this goes out of draft, I'm still hoping someone knows how to do this correctly.

string2id);
return target_names;
})();
for(auto const &target_name : target_names)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't you run this check on target_name_strings?

.second)
{
throw invalid_command_line_argument_exceptiont{
"function pointer restriction for `" + id2string(pointer_name) +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pointer_name is a string.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed thanks

}
```

This works well enough for simple cases. However, this is a "dumb" replacement
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While this perhaps isn't customer-facing text, I think it's high enough for us to call existing methods dumb. What about "inefficient" or "non-optimal". (Hedging level scientist.)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"very over-approximate" or "too much of an over-approximation for some users".


### Example

Take the motivating example. Let us assume that we know for a fact that `call`
Copy link
Contributor

@xbauch xbauch Oct 30, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe: the motivating example above?

Copy link
Contributor

@xbauch xbauch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Maybe squash a bit.

@hannes-steffenhagen-diffblue hannes-steffenhagen-diffblue force-pushed the feature/restrict-function-pointers branch 2 times, most recently from d9deab0 to 8534a11 Compare October 30, 2019 17:25
Copy link
Collaborator

@martin-cs martin-cs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not opposed to this from a user-space point of view, nor the code in general. However there are lots of open issues to do with function pointer removal:

#4536
#4650
#2620
#1385
#915

(to mention just a few) and it would be good to try and get some consistency / order / plan here.

As far as I can see, the common theme is to separate the code into two parts:

A. things that get candidate sets of targets for a function pointer : from a user, from the signature, from the signature plus some kind of 'is dirty' analysis for when functions have their address taken, from the expression and the symbol table, from the abstract interpreter, etc.

B. things that use these candidate sets : for transforming the program, for doing AI with the function pointers still in, etc.

Adding this interface / separation should not be too hard.

}
```

This works well enough for simple cases. However, this is a "dumb" replacement
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"very over-approximate" or "too much of an over-approximation for some users".

```

This works well enough for simple cases. However, this is a "dumb" replacement
only based on the signature of the function, so if there are many functions
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC there also used to be a global analysis of "which functions have their addresses taken".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@martin-cs I believe this is still in place actually, so maybe I should update this to be more accurate? On the other hand, I don't really want to replicate the documentation of remove-function-pointers (if it exists) here.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be accurate and it should be consolidated with any other documentation for this. I don't think there is any but do have a grep through the manual at least.

return x;
}

// this is just here to distinguish the behavior from FP removal, which'd include
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

??? I'm not sure what this sentence is saying.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's because I didn't finish the sentence - this comes back to how FP removal works now, which is to scan for

  1. matching functions
  2. which have their addresses taken

so I'm adding this so it looks to FP removal as if an int(*)(int) could also pint to g rather than just to f.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, that makes a little more sense.

}

template <typename GotoFunctionT, typename HandlerT>
void for_each_goto_location(GotoFunctionT &&goto_function, HandlerT handle)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not against these but please note the amount of hatred that forall_goto_program_instructions et al. used to get and try to get some kind of agreement on whether we should be doing more or less of this.

@hannes-steffenhagen-diffblue hannes-steffenhagen-diffblue force-pushed the feature/restrict-function-pointers branch from a3ebc2c to 5040bc6 Compare November 5, 2019 10:52
This adds a new option, --restrict-function-pointer, to cbmc. This lets a user
specify a list of possible pointer targets for specific function pointer call
sites, rather than have remove_function_pointers guess possible values. The
intended purpose behind this is to prevent excessive symex time wasted on
exploring paths the user knows the program can never actually take.
@hannes-steffenhagen-diffblue hannes-steffenhagen-diffblue force-pushed the feature/restrict-function-pointers branch from e23741f to 860117d Compare November 5, 2019 15:53
Copy link
Contributor

@allredj allredj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✔️
Passed Diffblue compatibility checks (cbmc commit: 860117d).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/135089826

@kroening
Copy link
Member

kroening commented Nov 8, 2019

A use-case question: This feature appears to be something that requires a specialist user, and I therefore don't expect wide usage. How about adding this to goto-instrument, to avoid adding complexity to CBMC's interface?
(Independently, CBMC's symbolic execution ought to learn how to do function pointers, but that is outside of the scope here).

@hannes-steffenhagen-diffblue
Copy link
Contributor Author

@kroening I don't particularly mind; However the goal was for these mappings to be generated by something like e.g. goto-analyzer most of the time, so they hopefully shouldn't require too much expertise to use (hand writing these may be useful in some circumstances as well, but that wasn't the main point).

@karkhaz karkhaz self-assigned this Feb 12, 2020
@karkhaz karkhaz added the aws Bugs or features of importance to AWS CBMC users label Feb 12, 2020
@karkhaz
Copy link
Collaborator

karkhaz commented Feb 12, 2020

We don't mind having this in goto-instrument at all. In general, we don't expect to be generating anything: often we have a good idea of the function that we want to restrict to.

Thank you very much for this work, it looks great, and especially thank you for the documentation and test cases. Here are a couple of cases I'm not sure about.

#include <assert.h>

typedef int (*fptr_t)(int);

void call(fptr_t fptr)
{
  assert(fptr(10) == 10);
}

int f(int x)
{
  return x;
}
int g(int x)
{
  return x + 1;
}
int h(int x)
{
  return x + 2;
}

int main(void)
{
  call(nondet_int() ? f : g);
}
> cbmc test.c --restrict-function-pointer call.function_pointer_call.1/f
** Results:
test.c function call
[call.assertion.2] line 7 assertion fptr(10) == 10: SUCCESS
[call.assertion.1] line 7 dereferenced function pointer at call.function_pointer_call.1 must be f: FAILURE

☝️is that expected? I'm restricting to f, and f has the right type and potentially could be the right function, so why is it failing?

Also, if I change the command line to restrict to h rather than f (which really is an impossible candidate, since we're only choosing between f and g in main), I get an almost identical error message:

[call.assertion.1] line 7 dereferenced function pointer at call.function_pointer_call.1 must be h: FAILURE

Would it be possible for this to be distinct, to make it clear that this is the case of a user error passing in an impossible function?

@karkhaz karkhaz removed their assignment Feb 12, 2020
@hannes-steffenhagen-diffblue
Copy link
Contributor Author

@karkhaz thanks for having a look at this. The assertion

dereferenced function pointer at call.function_pointer_call.1 must be f

is intended to mean “the function pointer we got passed in was, indeed, a pointer to f”. This is a safeguard to prevent accidental misuse, so for instance in the example you quoted the function could also be g. The option generates code like this:

if(fptr == &f) { f(args...); }
else if...
else { 
  assert(false); // we assume the set of restrictions is a superset of the set of actually possible targets
  assume(false); // if this assumption is wrong we can’t guarantee anything about the following code
}

The assume(false) is necessary because if the restriction of function pointers was incorrect, then we can’t say anything meaningful about that case anymore. The assertion is just there for documentation.

In that regard it is really no different than how the already existing remove_function_pointers works, this is intended to mirror that behaviour except that there is user control over the restriction.

@danpoe danpoe force-pushed the feature/restrict-function-pointers branch from 860117d to 031dd8e Compare March 4, 2020 17:49
@hannes-steffenhagen-diffblue hannes-steffenhagen-diffblue force-pushed the feature/restrict-function-pointers branch from 031dd8e to 860117d Compare March 4, 2020 18:28
@codecov-io
Copy link

codecov-io commented Mar 4, 2020

Codecov Report

Merging #5168 into develop will increase coverage by 0.02%.
The diff coverage is 80.29%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #5168      +/-   ##
===========================================
+ Coverage    67.26%   67.28%   +0.02%     
===========================================
  Files         1155     1157       +2     
  Lines        94476    94686     +210     
===========================================
+ Hits         63545    63713     +168     
- Misses       30931    30973      +42     
Flag Coverage Δ
#cproversmt2 42.76% <80.29%> (+0.11%) ⬆️
#regression 63.77% <80.29%> (+0.03%) ⬆️
#unit 31.91% <9.35%> (-0.06%) ⬇️
Impacted Files Coverage Δ
src/cbmc/cbmc_parse_options.h 100.00% <ø> (ø)
src/goto-programs/restrict_function_pointers.cpp 75.46% <75.46%> (ø)
src/cbmc/cbmc_parse_options.cpp 75.30% <100.00%> (+0.24%) ⬆️
src/goto-programs/goto_program.h 91.87% <100.00%> (+0.16%) ⬆️
...oto-programs/label_function_pointer_call_sites.cpp 100.00% <100.00%> (ø)
src/util/irep.h 95.52% <100.00%> (+0.13%) ⬆️
src/util/ieee_float.cpp 88.49% <0.00%> (-0.33%) ⬇️
...code/generic_parameter_specialization_map_keys.cpp 100.00% <0.00%> (ø)
...accelerate/disjunctive_polynomial_acceleration.cpp 0.00% <0.00%> (ø)
... and 5 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7a70b13...860117d. Read the comment docs.

@danpoe danpoe force-pushed the feature/restrict-function-pointers branch from 860117d to fc37b79 Compare March 12, 2020 13:19
@hannes-steffenhagen-diffblue hannes-steffenhagen-diffblue force-pushed the feature/restrict-function-pointers branch from fc37b79 to 860117d Compare March 12, 2020 13:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
aws Bugs or features of importance to AWS CBMC users
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants