Skip to content

Ensure locations are updated after inlining #6076

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
May 18, 2021

Conversation

TGWDB
Copy link
Contributor

@TGWDB TGWDB commented May 4, 2021

This commit ensures the locations in the goto program are
updated during the partial inlining process. This is due to
some paths (in goto-analyzer) checking for this part way through
various goto program transformations.

Fixes issue #6065

  • Each commit message has a non-empty body, explaining why the change was made.
  • Methods or procedures I have added are documented, following the guidelines provided in CODING_STANDARD.md.
  • The feature or user visible behaviour I have added or modified has been documented in the User Guide in doc/cprover-manual/
  • Regression or unit tests are included, or existing tests cover the modified code (in this case I have detailed which ones those are in the commit message).
  • My commit message includes data points confirming performance improvements (if claimed).
  • My PR is restricted to a single feature or bugfix.
  • White-space or formatting changes outside the feature-related changed lines are in commits of their own.

@TGWDB
Copy link
Contributor Author

TGWDB commented May 4, 2021

Note that regressions tests are NOT included (yet) as this is urgent/blocking for AWS.

Copy link
Contributor

@thomasspriggs thomasspriggs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs a regression test adding before we merge it.

@NlightNFotis
Copy link
Contributor

Yes, I also favour regression tests being in place before any merge.

As it stands, this looks like a random code change, with no way for me (as a reviewer) to verify that this fixes anything.

@codecov
Copy link

codecov bot commented May 4, 2021

Codecov Report

Merging #6076 (c065f64) into develop (7663a57) will increase coverage by 1.12%.
The diff coverage is 57.77%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #6076      +/-   ##
===========================================
+ Coverage    74.30%   75.42%   +1.12%     
===========================================
  Files         1444     1447       +3     
  Lines       157453   158094     +641     
===========================================
+ Hits        116995   119249    +2254     
+ Misses       40458    38845    -1613     
Impacted Files Coverage Δ
src/solvers/smt2/smt2_conv.h 100.00% <ø> (ø)
src/solvers/smt2/smt2_conv.cpp 60.63% <35.71%> (+0.25%) ⬆️
src/solvers/smt2/smt2_solver.cpp 83.11% <65.51%> (ø)
src/goto-programs/instrument_preconditions.cpp 91.66% <100.00%> (+0.11%) ⬆️
src/solvers/prop/prop.cpp 42.85% <0.00%> (-42.86%) ⬇️
src/util/pointer_expr.h 77.71% <0.00%> (-22.29%) ⬇️
src/util/pointer_expr.cpp 72.82% <0.00%> (-18.96%) ⬇️
src/goto-programs/read_goto_binary.cpp 30.84% <0.00%> (-16.83%) ⬇️
src/goto-programs/goto_program.cpp 53.05% <0.00%> (-13.56%) ⬇️
src/ansi-c/type2name.cpp 77.65% <0.00%> (-12.30%) ⬇️
... and 730 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4889328...c065f64. Read the comment docs.

Copy link
Collaborator

@martin-cs martin-cs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really wouldn't be surprised if this line is the correct fix, failing to update location numbers after modifying goto-programs is a classic bug that many of us have made. Particularly when duplicating code.

However I am less convinced that this is currently in the right place. It seems that it is before the inlining changes things. If this is the right place, please explain why in the commit message.

@thomasspriggs
Copy link
Contributor

I really wouldn't be surprised if this line is the correct fix, failing to update location numbers after modifying goto-programs is a classic bug that many of us have made. Particularly when duplicating code.

However I am less convinced that this is currently in the right place. It seems that it is before the inlining changes things. If this is the right place, please explain why in the commit message.

I worked with TGWDB to help find the cause of the issue. There were definitely location numbers which were left uninitialised / zero valued before the inlining. I'd need to check to confirm whether or not additional bad locations were introduced during the inling process as well.

@TGWDB
Copy link
Contributor Author

TGWDB commented May 5, 2021

I really wouldn't be surprised if this line is the correct fix, failing to update location numbers after modifying goto-programs is a classic bug that many of us have made. Particularly when duplicating code.
However I am less convinced that this is currently in the right place. It seems that it is before the inlining changes things. If this is the right place, please explain why in the commit message.

I worked with TGWDB to help find the cause of the issue. There were definitely location numbers which were left uninitialised / zero valued before the inlining. I'd need to check to confirm whether or not additional bad locations were introduced during the inling process as well.

I've been looking this morning (not updating each little bit). The problem with the locations in the example is occurring inside the instrument preconditions code (instrument_preconditions.cpp). Shifting the fix in the first commit (later commits to update this PR no doubt) to before instrument preconditions has the error, after the instrument preconditions prevents the error. I have not yet identified the exact conditions to minimise the example and also identify which part of the code is the culprit.

Note that this strongly implies the error is inside the instrument preconditions code and there is no indication of an error in the inlining code.

@TGWDB
Copy link
Contributor Author

TGWDB commented May 5, 2021

Further update. During the unifying of the parsing options and goto code paths between cbmc, goto-diff, and goto-analyzer the instrument_preconditions function was added to the goto-analyzer behaviour. Previously instrument_preconditions was never called from goto-analyzer and so this is a new additional behaviour. The manipulations inside instrument_preconditions can add in locations that are not in correct order (and these can appear in cbmc and goto-diff) as well as in goto-analyzer. However, as a safety check (precondition) after partial function inlining goto-analyzer checks the locations are in correct order. Since the instrument_preconditions can violate this, goto-analyser now has an error due to new behaviour combined with an old check.

To summarise:

  • goto-analyser has had instrument_preconditions added, which can create instructions with incorrect location order
  • goto-analyser has a check for location order in the partial inlining code.
  • Together these caused a new invariant violation.

With the above in mind, I recommend doing at least:

goto_model.goto_functions.compute_location_numbers();

in process_goto_program.cpp after instrument_preconditions(goto_model);. This will fix/update the locations (and put them in order), thus resolving this problem while leaving the checking in after the partial inlining.

TODO: also a small regression test.

@martin-cs
Copy link
Collaborator

martin-cs commented May 6, 2021 via email

Copy link
Collaborator

@martin-cs martin-cs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry to keep on about this but fixing this at the root cause is important.

// inlining can fail if locations are not in correct order.
// This refreshes the locations before inlining so any failures
// later are correctly attributed to errors in the inlining.
goto_model.goto_functions.compute_location_numbers();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this needs to be in instrument_preconditions because "update was called after modification" should be an invariant of goto_programt's and things that work on them.

In an ideal world we could add this to the validate checks and call them as post-condition of all of these transforms.

@TGWDB
Copy link
Contributor Author

TGWDB commented May 11, 2021

This is an update that goes with a commit that may come before or after this message. The commit moves the updating of the goto_model.gotofunctions inside the instrument_preconditions function. This should address the concerns about putting the validation/fixing of the code in the right place.

The main point for this message is to discuss the regression test (or lack of) and how to finish this PR. I've been refining the example that caused this originally but have not narrowed this down to anything suitable small and reasonably to turn into a regression test. Below are the (apparent) requirements to cause the invariant violation.

  1. There need to be two goto files that are combined into a single goto file. Attempts to recreate the problem with only a single goto file or source file have been unsuccessful.
  2. The --export-file-local-symbols flag must be used to create both goto files.

Unfortunately the example relies on openssl includes, and there is a LOT of complexity in sorting out the details of what is necessary and what isn't. So far shrinking the example has gone down to two files, one is 3 lines, the other is about 85 lines, but still includes openssl.

At this point there are five (all undesirable) options, which I'd like to have feedback on. (Presented in no particular order.)

  1. Have PR with no regression test(s). This is clearly undesirable, but would be easy to approve and merge immediately (which is important to AWS).
  2. Have a regression test that uses prebuilt goto files. This is undesirable for a couple of reasons. A: These goto binaries will inevitably go out of date in the future and the test will fail, hence losing the value of the test. B: The creation of these gotos may have the root cause of the error, and so the test will not catch the fix.
  3. Have a regression test that relies upon having openssl libraries available. Again this is undesirable since this forces dependencies and other installs for regression tests and all the complexities that follow.
  4. Continue to explore and try to find the exact interplay of options, files, dependencies, cbmc code, etc. that leads to the bug to either find the precise bug fix or be able to build a good regression test (which ever comes first). This is undesirable since it is complex and can be a (further) time sink of non-trivial proportions.
  5. Revert the behaviour to not have goto-analyser call the instrument_preconditions function. This is undesirable since it again does not really fix or address the bug, but simply reverts to the behaviour before the error. Also this continues the separate code paths for different programs that Refactor process_goto_program so that all tools use common processing code #5807 aimed to fix (although this kind of holds true for the partial inlining option since only goto-analyzer can invoke it).

Feedback welcome!

Copy link
Collaborator

@martin-cs martin-cs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am still not 100% convinced this is the right patch but I think it is close to fixing /a/ problem. I would not block this patch.

@@ -135,10 +135,15 @@ void instrument_preconditions(goto_modelt &goto_model)
instrument_preconditions(
goto_model,
f_it.second.body);

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if you move the addition and call to update to here?

Copy link
Contributor

@thomasspriggs thomasspriggs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this is still missing a regression test. Message me and we can work together on reducing the example down, if it is still causing problems.

@@ -203,7 +203,6 @@ void goto_partial_inline(
}
}
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⛏️ This empty line removal looks unrelated to the changes this commit is supposed to be making. Presumably this is due to adding and removing different changes in this PR, but this formatting change should be cleaned up. Same applied to the line marked with a 💨

@@ -135,10 +135,15 @@ void instrument_preconditions(goto_modelt &goto_model)
instrument_preconditions(
goto_model,
f_it.second.body);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💨 See previous comment.

@martin-cs
Copy link
Collaborator

@TGWDB thank you so much for chasing this one. I think the changes you have made are already a lot better but I am not quite sure we are at the end of the journey yet.

Thoughts on the options.

  1. Undesirable but possible. My concern is less that there isn't a regression test and more that we don't know for sure what causes this.
  2. We have contemplated this before but it is a big ask and "haven't understood the bug enough" is a bad reason to add one.
  3. Agreed; the a goal is to keep CPROVER dependency light. This is not a good enough reason to add a dependency like that.
  4. This is technically the correct and desirable option. I guess the question is how simple / fast / cheap we can make this.
  5. I'd really rather not.

How does the following sound for a (falsifiable) conjecture...?

  1. The problem does not exist before instrument_precondition is called. Check by commenting it out and seeing if that crashes.
  2. The problem doesn't show up in other tools (at the moment...) is that they call another instrumentation pass which does call update before the first pass that actually uses the location numbers (i.e. goto_partial_inline in the case of goto_analyzer). The bug exists but is masked (which is why common preprocessing is a good goal!). This should be possible to test by enabling goto_partial_inline in CBMC and disabling any other instrumentation passes that call update inbetween the two.
  3. // add before the call, with location of the call
    adds an instruction to a goto_programt without calling update, which is definitely a bug. This can be tested by modifying the code to output the location number or to look at the crash and check it is using one of the added instructions.
  4. Two files are needed because if the precondition and the call site are in the same compilation unit this processing is done earlier? This can be tested by ... asking @tautschnig ? Looking at the code of instrument_precondition should give some idea of what triggers the addition.
  5. OpenSSL is needed because we don't have any tests with preconditions that have preconditions and their usage in different files. Testing this should be possible by isolating a function with a precondition and it's call into two separate files (or whatever other conditions are needed from 4).

If this lot is correct or correct enough then I think that gives you a route to a regression test and to option 4.

@martin-cs
Copy link
Collaborator

Also @TGWDB have you come across https://github.com/csmith-project/creduce ?

@NlightNFotis
Copy link
Contributor

NlightNFotis commented May 12, 2021

@thomasspriggs The example is very hard to be reduced further - I checked it out with Thomas G. yesterday.

The reason for that is that the problematic behaviour is manifesting on a reduced file that includes an openssl header file that is kind of an all encompassing include - meaning it transitively includes a huge number of other headers. If we remove that header file, the problem disappears. It's unclear which one of the transitively included headers is making the error manifest, and detecting/isolating that looks like is a significant amount of effort.

@NlightNFotis
Copy link
Contributor

@martin-cs This is a great idea, I didn't think of c-reduce. I don't know if Thomas tried it before, but this looks like it might be an option worth trying to see if we can come up with a regression test.

@TGWDB
Copy link
Contributor Author

TGWDB commented May 13, 2021

Quick response to update, I'll add more later today/tomorrow.

@TGWDB thank you so much for chasing this one. I think the changes you have made are already a lot better but I am not quite sure we are at the end of the journey yet.

snip

How does the following sound for a (falsifiable) conjecture...?

  1. The problem does not exist before instrument_precondition is called. Check by commenting it out and seeing if that crashes.

This is true and tested before.

  1. The problem doesn't show up in other tools (at the moment...) is that they call another instrumentation pass which does call update before the first pass that actually uses the location numbers (i.e. goto_partial_inline in the case of goto_analyzer). The bug exists but is masked (which is why common preprocessing is a good goal!). This should be possible to test by enabling goto_partial_inline in CBMC and disabling any other instrumentation passes that call update inbetween the two.

This is also true and tested before.

  1. // add before the call, with location of the call

    adds an instruction to a goto_programt without calling update, which is definitely a bug. This can be tested by modifying the code to output the location number or to look at the crash and check it is using one of the added instructions.

I'll look into this further (it's part of what I was doing when I wrote above).

  1. Two files are needed because if the precondition and the call site are in the same compilation unit this processing is done earlier? This can be tested by ... asking @tautschnig ? Looking at the code of instrument_precondition should give some idea of what triggers the addition.
  2. OpenSSL is needed because we don't have any tests with preconditions that have preconditions and their usage in different files. Testing this should be possible by isolating a function with a precondition and it's call into two separate files (or whatever other conditions are needed from 4).

I strongly suspect we can do it without openssl, eventually.

If this lot is correct or correct enough then I think that gives you a route to a regression test and to option 4.

Agree, it's the path I'm on.

@TGWDB
Copy link
Contributor Author

TGWDB commented May 13, 2021

Also @TGWDB have you come across https://github.com/csmith-project/creduce ?

I tried cpp-merge but not this, I'll have a look. Thanks!

@TGWDB TGWDB force-pushed the inlining-bug-6065 branch from 4d4f416 to 35afe57 Compare May 14, 2021 08:07
…entation

This commit ensures the goto_model.goto_functions are
updated during the instrument_preconditions function.
This is due to some of these leaving locations uninitialized
and this causing later problems with partial inlining (that
can only occur in goto-analyzer).

Fixes issue diffblue#6065
@TGWDB TGWDB force-pushed the inlining-bug-6065 branch 3 times, most recently from d6e5e94 to 570de31 Compare May 14, 2021 12:43
@TGWDB TGWDB force-pushed the inlining-bug-6065 branch from 570de31 to 03ad7d9 Compare May 14, 2021 12:49
This changes the chain.sh (and adds a READFME.md to explain)
so that a test can be written that uses its own script to build
the goto binary.
@TGWDB TGWDB force-pushed the inlining-bug-6065 branch from 45dac8f to 3f76009 Compare May 14, 2021 15:53
Copy link
Contributor

@thomasspriggs thomasspriggs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@TGWDB
Copy link
Contributor Author

TGWDB commented May 17, 2021

@martin-cs Can you please re-review this?

Copy link
Contributor

@NlightNFotis NlightNFotis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Copy link
Collaborator

@martin-cs martin-cs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for chasing this down and sorry it has turned out to be more of an ordeal than it first looked.

Comments are mostly thoughts / suggestions but one last question -- what triggers the instrument_preconditions if there are no preconditions in the source code?

};
struct s2n_evp_digest
{
struct evp_md_ctx_st *ctx;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels like this might be simplified a bit more, can this be replaced with the const void * field directly?

Equally well; if you are done with reducing this, that is also fine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

test.c
--verify
Checking assertions
^EXIT=0$
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you check for at least one of the assertions?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested for the first verification result.

--
Invariant check failed
--
This test checks that after building the goto binary (see test.sh)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Brilliant!

// for later passes.
// Note that only the first loop is the one known to leave locations
// uninitialized.
goto_model.goto_functions.update();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

This adds a regression test that checks to ensure that goto-analyser
reaches the "Checking assertions" phase of it's execution and also
one of the checks was SUCCESFUL. This also fails on any invariant
violation.
@TGWDB TGWDB force-pushed the inlining-bug-6065 branch from 3f76009 to fa9170c Compare May 18, 2021 09:01
@TGWDB
Copy link
Contributor Author

TGWDB commented May 18, 2021

Thank you for chasing this down and sorry it has turned out to be more of an ordeal than it first looked.

Comments are mostly thoughts / suggestions but one last question -- what triggers the instrument_preconditions if there are no preconditions in the source code?

Line 52 of process_goto_program.cpp always invokes instrument_preconditions regardless of anything else. I cannot comment on why this decision was made originally (or why this was added to goto-analyzer.

@martin-cs
Copy link
Collaborator

Apologies; I wasn't clear. instrument_preconditions is always run in an attempt to make the preprocessing of goto programs common between all tools (this is a work-in-progress).

My confusion was that

goto_program.insert_before_swap(it);
should only modify the program ( and thus create the uninitialised locations ) if there are preconditions. And the test case doesn't have any. I feel like there is something I am not getting and I don't know if it is significant.

@TGWDB
Copy link
Contributor Author

TGWDB commented May 18, 2021

@peterschrammel @tautschnig Any chance of a codeowner review for this?

@@ -0,0 +1,44 @@
struct evp_md_ctx_st
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❓ What's the license of this file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For s2n_hash_inlined.c lines 1-4 are adapted from openssl which has an Apache like licence and lines 5-44 are modified from the AWS s2n library that is also under Apache 2.0 licence. All of simple.c is modified from the AWS s2n library.

Copy link
Member

@peterschrammel peterschrammel May 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a header and comments to explain this so that we know where this is coming from.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@martin-cs
Copy link
Collaborator

To answer my own question : it might well be free that is triggering this. In which case it should be possible to make the example a lot simpler.

This adds information on the origin of the regression test.
@martin-cs
Copy link
Collaborator

martin-cs commented May 18, 2021

I still felt like I didn't understand why this bug was occurring so I started having a dig.

s2n_hash_inlined.c can be reduced to:

void *p;
void *q;

void my_free(void)
{
  // Need both frees
  free(p);
  free(q);
}

typedef void (*free_function)(void);

free_function global = &my_free;

void point(free_function *f)
{
  *f = global;
}

Doing this removes the need for the --export-file-local-symbols flags.

This reduces the possible cause of the issue but I am still pretty hazy on the exact sequence of events that cause it. I /speculate/ that there is something to do with a pointer to my_free possibly escaping and so needing to be processed later? Is it that the preconditions of free are being inlined into my_free and then need inlining again?

@TGWDB TGWDB merged commit cdffe9a into diffblue:develop May 18, 2021
@TGWDB
Copy link
Contributor Author

TGWDB commented May 18, 2021

I still felt like I didn't understand why this bug was occurring so I started having a dig.

s2n_hash_inlined.c can be reduced to:

void *p;
void *q;

void my_free(void)
{
  // Need both frees
  free(p);
  free(q);
}

typedef void (*free_function)(void);

free_function global = &my_free;

void point(free_function *f)
{
  *f = global;
}

Doing this removes the need for the --export-file-local-symbols flags.

This reduces the possible cause of the issue but I am still pretty hazy on the exact sequence of events that cause it. I /speculate/ that there is something to do with a pointer to my_free possibly escaping and so needing to be processed later? Is it that the preconditions of free are being inlined into my_free and then need inlining again?

Thanks for digging deeper into this. I merged since the invariant violation was a blocking issue for AWS.

I'm not sure if the "inlining" is a distraction here since the problems appear before the cbmc inlining code itself (unless you meant inlining somewhere else?).

@martin-cs
Copy link
Collaborator

Inlining the precondition.

I think there is another bug which is a difference in how the preprocessing is done and possibly when / how the C library is linked.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants