Skip to content

Mini projects list update #4119

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
317 changes: 294 additions & 23 deletions MINI-PROJECTS.md
Original file line number Diff line number Diff line change
@@ -1,16 +1,11 @@
CBMC Mini Projects
==================
# CBMC Mini Project Ideas

The following projects are short, focussed features that give new CBMC
contributors an introduction to one part of the codebase. If you're
interested in contributing to CBMC, feel free to start with any of these
projects!


Task List
---------

### `__CPROVER_print`
## `__CPROVER_print`

**Task**: Implement a CPROVER intrinsic that assigns an expression to a
dummy value, to help with debugging and tracing.
Expand Down Expand Up @@ -45,8 +40,7 @@ manual ghost assignment in their code.
**Hints**: Figure out how `__CPROVER_assume` and `__CPROVER_assert` work.
Then figure out how to add new instructions into a goto-program.


### Annotation to constrain function pointers
## Annotation to constrain function pointers

**Task**: implement a CPROVER intrinsic that tells CBMC what function a
function pointer points to.
Expand All @@ -58,8 +52,7 @@ could be called. This results in a combinatorial explosion of paths when
CBMC is exploring the program. It would be useful to have an annotation
that tells CBMC exactly what function a pointer is expected to point to.


### Generalize loop-unwinding bounds to cover multiple loops
## Generalize loop-unwinding bounds to cover multiple loops

**Task**: allow users to specify a *combined* bound for the sum of
several loop counters.
Expand All @@ -76,42 +69,320 @@ specify that the *combined* loop bound for the three loops is `S`.

The current syntax for the `--unwindset` switch is

```
```sh
--unwindset LABEL:BOUND
```

You might like to generalize this so that it looks like

```
```sh
--unwindset LABEL<,LABEL2,LABEL3,...>:BOUND
```

which would have the effect that the loops with those labels would all
share a bound.

*Hints*:
- The current member that stores how many times a loop has been unwound

* The current member that stores how many times a loop has been unwound
is `goto_symex_statet::loop_iterations`. Have a look at where this
member is accessed, and what is done with that value (e.g. read
through `symex_bmct::should_stop_unwind()` and
`symex_bmct::loop_unwind_handlert`).

- You may wish to add a map from a *set* of loop names to a loop bound,
* You may wish to add a map from a *set* of loop names to a loop bound,
in addition to the current map from single loop names to loop bounds.

- If a loop has an individual bound, and is also part of a set of
* If a loop has an individual bound, and is also part of a set of
mutually-bound loops, then we should stop unwinding it if *either* of
those bounds is reached. Good, thoughtful test cases are essential
here!

Completed
---------
## Instrumentation to Check Pointer Alignment

Some architectures only allow load instructions / pointer dereferencing when the
address is a multiple of the word length (i.e. 4 bytes or 8 bytes). This project
would add a new instrumentation option for `cbmc` and `goto-instrument` that checks
each pointer before dereference and makes sure it's correctly aligned.

This is a good way to get into `goto_check` and how instrumentation of asserts
works.

## Command line options to SAT solvers

There are options for which SMT solver to run and there are ways of building with
multiple SAT solvers but no command line options to pick a SAT solver. It would be
nice if there were.

## Combine Uninitialized Variables Analysis with Alias Analysis

`goto-instrument` features an instrumentation to track cases of use-before-define,
i.e. reading of uninitialised variables. At present, the analysis does not take
into account aliasing information. As such, the following code will yield a spurious
counterexample:

``` C
int main()
{
int x;
int *ptr;
ptr=&x;
*ptr=42;
if(x < 42)
return 1;
return 0;
}
```

This should be pretty straight-forward to do with `goto-analyzer --vsd` and might even
be a good intro to that code.

## Make SAMATE Tests Work

Read [this paper](https://www.sciencedirect.com/science/article/abs/pii/S0950584913000384) and
make the tests mentioned in there work.

## Error Explanation

Re-implement the algorithm described in [this paper](http://www.kroening.com/papers/STTT-error-explanation-2005.pdf)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is partly implemented in the fault localizer.


## LTL

Build an instrumenter in `goto-instrument` for checking LTL (or even CTL) properties
on C code. E.g look at [this paper](https://www.umsec.umn.edu/publications/Partial-Translation-Verification-Untrusted-Code)

Please get in touch with Michael Tautschnig, who has done very similar work in FShell.

## Test SMT Back Ends

CBMC allows to choose among a selection of SMT solvers to use as solving back end.
As these tools evolve, CBMC's code for running them may need a fresh look from time
to time. After a first round of manual experiments, these tests should become part
of the routine build process on `dkr-build`. (Michael Tautschnig knows how to do this.)

## Improved Interpreter

CProver has an interpreter for goto programs but it could be improved, for example adding
support for dynamic analysis, calls to external functions and possibly even mixed concrete
and symbolic execution. It would also be good to be able to interpret traces found by BMC
to catch mismatches / bugs between the logical model and the bit blasting models. This
would also be useful for more advanced CEGAR like techniques (also see below).
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no idea what the actual state of the interpreter is today.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd propose a project to implement an interpreter using goto-symex.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using the abstract interpreter would also be possible.


## "KLEE-PROVER"

Alter the CBMC symbolic execution engine to create a dynamic symbolic execution tool
similar to [KLEE](https://klee.github.io/). Possible features include:

* Distributed and incremental support using the caching scheme here
* A feature to run BMC as you go to explore the traces 'around' the execution traces
* Use path exploration as the witness generation part of an ACDL process.
* Perform precondition inference, possibly using ACDL.

## Interact with GDB

GDB implements a client-server protocol with a GDB instance acting as a server and the
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is an open PR on that one (#3983).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that covers all the points below.

GUI as a client. (The documentation for that protocol may be found [here](http://www.sourceware.org/gdb/current/onlinedocs/gdb/GDB_002fMI.html).)
By implementing this protocol in various places within CProver, there are various
things that could be achieved.

To do some of these will require working out which functions are non-deterministic models
of real functions and accounting for this. This would also be necessary for some of the
preceeding ideas.

### Trace as a GDB server

Implement a GDB server that uses a trace to 'execute' the program. This allows any GUI
debugger that can talk to GDB to be used to graphically explore the trace. Particularly
valuable for concurrent programs in which just reading the trace could be very hard.

An implementation of that is in the trace debugger here: https://github.com/diffblue/eclipse-cbmc

Handling the SV-COMP counter-example format (ask Michael) would probably make this tool
a lot more widely usable.

### Trace as a GDB client

Conversely, implement a GDB client that uses the trace to control a debugger. If
connected to a 'real' GDB then this would allow the debugger to be set to the position
that shows the error. In the case of parallel programs this avoids the need to create
custom schedulers, etc. and over-all it is a much better interface – rather than
generate a trace, CProver simply gives you the debugger in the state that triggers
the problem.

One option would be to convert the trace to a GDB script but this misses some of
the value as it would be possible to query they GDB server as the program runs to
check that values really are assigned correctly. This will catch the first point
at which the model and the real system diverge and allows checked 'resimulation'
of the traces on a 'real' platform. If the goto-binary contains the ELF binary
then this could be used directly.

### Interpreter as a GDB server

An enabling tech, this would allow us to run the interpreter as a debugger back
end. Combine with the trace as a GDB client, this would allow resimulation of
traces, which would be useful for catching encoding bugs in CProver and ideally
should be run after the decode from the solver all of the time.

Michael has a partial implementation of this. (?)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really, @tautschnig ?


### Recording GDB server

Acts as a GDB server and passes commands back to another GDB server to actually
execute. Records the state changes and locations visited. This allows a test case
to be 'run' and the trace / conditions on the trace to be collected and (after
generalisation) be used for test case driven analysis / test case generalisation
/ something “concolic” like.

### Differential GDB server

Acts as a GDB server and then connects (as a client) to one or more GDB servers.
Commands are executed on all of these and then the state / position compared
(much of the infrastructure of the previous idea could be used). Divergences of
behaviour can then be found, particularly root causes can be identified by the
first divergence. This may not be CProver specific but can likely be used for
various projects and ideas above.

### "Concolic" GDB client

An interactive component that runs a program via a GDB server and query / interact
with it. This would be useful for building a “concolic” tool, especially for running
library functions that are not defined.

## Other ideas

Valgrind and the Linux kernel both have GDB servers - what could we do with these?

### Static Detection of Data Races

It would be good to have a tool that checks for data races in a similar fashion to
DRD / Helgrind. The current pointer analysis can over approximate shared variables,
so possibly some combination of this, trace generation, DRD / Helgrind and then
refinement of which traces to pick? Perhaps this is even something ACDL like?

### CORAL / Q
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd propose to implement something like stratified inlining, for example.


Build something similar to the Q tool in SLAM. (?)

### k-liveness

Implement [k-liveness](http://www.cs.utexas.edu/~hunt/FMCAD/FMCAD12/013.pdf)
for software-liveness checking.

### Known Bugs on the Web

Turn the list of `KNOWNBUG`S from the CBMC regressions into a list of open issues
in the CBMC repository issue tracker (https://github.com/diffblue/cbmc/issues).

If you do this, make sure you target either the`develop` branch or the latest
CBMC release.

### Coding Standards Checker

Build instrumenters for various coding standards, e.g. AutoSar, [MISRA-C](https://en.wikipedia.org/wiki/MISRA_C),
CERT-C, [HIC++](https://www.perforce.com/resources/qac/high-integrity-cpp-coding-standard),
etc.

### Front-End Testing

Run the regressions mentioned [here](https://github.com/kframework/c-semantics/tree/master/tests)
and the [GCC C Torture tests](https://gcc.gnu.org/onlinedocs/gccint/C-Tests.html)
with `goto-cc` and a `goto-program` interpreter.

### Unspecified C Behaviours
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a subset of the Coding Standards Checker.


Build instrumentation to check for unspecified behaviours in C, as per Section J.2
(and maybe also J.3) of the C standard. Some of them may be possible to handle
entirely in the front-end (for example, catching `x = x++` as modifying a variable
twice between sequence points could be done in side effect removal in ansi-c), while
others may be best to be instrumentation (for example handling the unspecified nature
of the order of evaluation of function arguments).

For more information, talk to Michael Tautschnig for this item.

### Reporting Software Errors with CWE Information

The [Common Weakness Enumeration project](http://cwe.mitre.org/) seeks to classify
software weaknesses and errors in a common dictionary. Reporting CWE information together
with failed assertions, such as [Free of Pointer not at Start of Buffer](http://cwe.mitre.org/data/definitions/761.html),
would help both users of the tools as well as industrial partners. In the first instance,
errors already caught by present instrumentation should be augmented with additional
information. In a second step, instrumentation for other problems listed in the CWE
database should be developed.

### Machine Learn Refinement settings

The refinement option enables a tactic of under and over approximation to try to
solve the generated formulae more quickly. There are a large number of possible
options – which operations to approximate, how much to under or over approximate
them, how much to add back if this doesn't work, etc. Rather than trying to manually
pick the best strategy from these, it would be good to machine learn them.

### Cluster CBMC

Most users of CBMC have access to multi-core machines; many have clusters available.
There are a number of different approaches and a number of different levels at which
the task could be parallelised; per assertion, portfolio approaches of different
back-ends, parallel solvers, etc. It would be good to be able to exploit these.
The caching approach suggested under “KLEE-PROVER” might be an easy route to
implementing this.

### x86 to Goto

Look at various efforts to model x86 assembler, e.g., Andrew Kennedy,
“Formalizing .EXEs, .DLLs, and all that”, Gang Tan, “Reusable tools for formal
modeling of machine code”, and check whether this can turn into a translator
from x86 to goto binary.

Also look at Google's Native Client, and talk to Matt, who has tried Valgrind.

Furthermore, talk to Michael, who has got a student who made some progress
on this.

And maybe have a read of Automated synthesis of symbolic instruction encodings
from I/O samples.

### Branch Prediction

Implement branch prediction algorithms in path-symex.

### GCC Torture Tests

GCC has a set of 'torture tests' which are intended to stress test and find bugs in their
parser and front-end. It would be good to run these through goto-cc / add them to the
nightly testing and fix any bugs that arise. (Possible overlap with “Front-End Testing”
above.). The same is probably also true for other compiler fuzzing tools.

### Compiler loop optimisations for BMC

Try out some of the compiler loop optimisations (hoisting invariant code,
fusing sequential independent loops, etc.) and see if they improve BMC,
K-Induction or loop acceleration. These should be possible to implement in
goto-instrument.

### Handling restrict in a smart way

C99 adds the `restrict` keyword to mark unaliased pointers. This can be discarded by
the compiler and thus has no semantics, it is only an annotation. However, there are
a number of thing we could do with:

1. Insert checks in calling functions to make sure that the pointers really are unaliased.
2. Add an instrumentation pass that adds assumptions that restricted pointers are not
aliased. This may imporve the quality of counter-examples (although at the cost of
potentially missing some bugs, although not if combined with the previous technique).

### C++ Exceptions

C++ has annotations for what exceptions a function can throw. It would be good to
be able to check / generate / minimise these and it should be possible to do this
with a reasonably simple abstract domain. The CVC4 project would be interested in
using this if it exists.

### Add a `--print-internal-representation` switch
An old prototype of something similar is here: https://github.com/peterschrammel/cbmc/commits/progres-exceptions

*Summary*: This switch should provide more detail than
`--show-goto-functions`, by printing out the `irep` of every program
instruction instead of a concise representation.
### Java Memory Model

*Completed in* PR #991
CBMC's Java front-end is currently in development and will support the full Java
bytecode instruction set soon. Beyond that, it would be great to support some more
advanced aspects of the Java programming language semantics, such as the Java Memory
Model. This will include adding models for a subset of the Java concurrency packages
in the Java Runtime Library.