This is the Pointer Ownership Model (POM) project. It consists of a temporal memory-safety model for C code, a p-model generator that specifies (in a YAML file) the application of POM to a particular program, and an automated verifier that checks whether the program (and if input, the p-model(s)) satisfies the POM constraints.
Pointer Ownership Model (POM) Source Code ReleaseCopyright 2025 Carnegie Mellon University.
NO WARRANTY. THIS CARNEGIE MELLON UNIVERSITY AND SOFTWARE ENGINEERING
INSTITUTE MATERIAL IS FURNISHED ON AN "AS-IS" BASIS. CARNEGIE MELLON
UNIVERSITY MAKES NO WARRANTIES OF ANY KIND, EITHER EXPRESSED OR
IMPLIED, AS TO ANY MATTER INCLUDING, BUT NOT LIMITED TO, WARRANTY OF
FITNESS FOR PURPOSE OR MERCHANTABILITY, EXCLUSIVITY, OR RESULTS
OBTAINED FROM USE OF THE MATERIAL. CARNEGIE MELLON UNIVERSITY DOES NOT
MAKE ANY WARRANTY OF ANY KIND WITH RESPECT TO FREEDOM FROM PATENT,
TRADEMARK, OR COPYRIGHT INFRINGEMENT.
Licensed under a MIT (SEI)-style license, please see license.txt or
contact [email protected] for full terms.
[DISTRIBUTION STATEMENT A] This material has been approved for public
release and unlimited distribution. Please see Copyright notice for
non-US Government use and distribution.
DM25-1262
The code is designed to run inside a Docker container, which means you will need Docker. In particular, once you can run containers, you will need to share filesystems between your host and the various containers. For that you want the -v switch in your docker run command; its documentation is available here: https://docs.docker.com/reference/cli/docker/container/run/#volume
To build the Docker container:
docker build -f Dockerfile -t docker.cc.cert.org/pom/pointer-ownership-model:latest .For SEI users, we provide a useful container image for working with POM. It is available via the SEI IT Artifactory: docker.cc.cert.org/pom/pointer-ownership-model:latest.
The image contains all the dependencies needed to run the repair tool, but it does not actually contain the repair tool itself. This container is useful if you wish to debug or extend the tool itself, as you can edit files on your host and immediately access them in the container.
To pull the image:
docker pull docker.cc.cert.org/pom/pointer-ownership-model:latestIn the examples below, we assume you have created a directory, in which you will put your codebases to analyze for temporal memory safety by building p-models for the codebase and then validating the p-model against the codebase.
Your filepath to that directory should be substituted for <FILEPATH_TO_CODEBASES_DIR>.
To start a Bash shell in the container:
docker run -it --rm -v ${PWD}:/host -v <FILEPATH_TO_CODEBASES_DIR>:/datasets -w /host docker.cc.cert.org/pom/pointer-ownership-model:latest bashOur p-model builder script interacts with OpenAI's API. Ubuntu and other Linux users may choose to use the secret-tool to store their OpenAI API Key. (e.g., secret-tool store --label="OPENAI_API_KEY" service openai key api_key and then provide the key at the secret-tool's prompt). To pass that environment variable at Docker run time, those users can start the container as follows:
docker run -it --rm -e OPENAI_API_KEY="$(secret-tool lookup service openai key api_key)" -v ${PWD}:/host -v <FILEPATH_TO_CODEBASES_DIR>:/datasets -w /host docker.cc.cert.org/pom/pointer-ownership-model:latest bashWe are using the dos2unix codebase, version 7.5.2. It is freely available from dos2unix.sourceforge.io/. For this demo, you should download dos2unix-7.5.2, unpack it, and place its contents in a dos2unix-7.5.2 sub-directory that is available to the POM container (e.g., mv dos2unix-7.5.2 <FILEPATH_TO_CODEBASES_DIR>). You could use this command to unpack, which creates the required dos2unix-7.5.2 sub-directory:
tar xzf dos2unix-7.5.2.tar.gzInput to the p-model builder includes the source code and output from the Clang compiler tool. The Clang output needed is an abstract syntax tree (AST) and an intermediate representation (IR) files (LLVM IR files are .ll files).
There are two methods to create a p-model file: LLM-based or SAT-solver-based. The SAT-solver method creates a p-model after verifying that the program satisfies POM constraints, and cannot create a p-model for the program if it doesn't. The LLM-based method can create a p-model for the program, regardless of a SAT-solver's determination. If you only use an LLM to generate a p-model, you don’t know if the code is compliant or if the LLM made a mistake.
The following command, when run in the POM container, creates compilation database file (named compile_commands.json) for dos2unix:
cd /datasets/dos2unix-7.5.2
bear -- make CC=clang dos2unix
make clean
mv compile_commands.json ..
cd ..(For this example, we are ignoring the unix2dos program which is also part of Dos2Unix.)
The next step is to run the make_run_clang.py script. This produces a shell script for running Clang on the source code. You must run this shellscript explicitly, as the next step.
This step requires the compile_commands.json file produced in the previous step.
cd dos2unix-7.5.2
mkdir /host/d2u # already exists, if you ran the Docker container command above, mounting the volume
python3 /host/src/make_run_clang.py -c /datasets/compile_commands.json --output-clang-script /host/d2u/run_clang.shNext, the Clang shellscript should be run. This script takes one argument: a temporary directory that contains the AST(s) and .ll files generated by Clang.
This step requires the run_clang.sh file from the previous step.
bash /host/d2u/run_clang.sh /host/d2uYou now have the AST and source code. With those as input, you can run the LLM-based Builder. To do this, you must have the OPENAI_API_KEY environment variable set up as per OpenAI's instructions for accessing GPT from the command line, and your machine must be able to access GPT. (For an example of how provide your OPENAI_API_KEY to the container, see docker run instruction including secret-tool lookup in the Build instructions) section.
If you have more than 3 files that match /host/d2u/*.ast.json.gz, delete all but the newest 3.
cd /host/src
python3 build.py --verbose /host/src/dos2unix.llm.pom.yml /datasets/dos2unix-7.5.2 /host/d2u/*.ast.json.gzThis will produce a /host/src/dos2unix.llm.pom.yml, the p-model constructed by GPT.
The script props_to_pom_yaml.py (in directory src/constraint_gen/) uses the SAT-solver output solution JSON-formatted file to generate a p-model.
The SAT-builder method can be used if the program has only one function or if the SAT-solver is used in whole-program mode.
In whole-program mode, the SAT solver still needs p-models for the standard library functions.
You can create and save p-models for standard library functions. You can do this using the SAT-builder method or using the LLM-based method provided in Section Undefined Function p-models. We provide p-models for some library functions in src/test/test.famous.llm.pom.yml (e.g., free, strlen, strcmp). What the SAT solver needs for library functions is their index numbers and their names. If you have a program that calls library functions, then you should provide a p-model for the library and also provide an argument index ID for each function argument. (See ./d2u/dos2unix.famous.sorted.pom.yml for examples with field arg_idx providing the argument index ID. Some examples in that file include the chmod and chown GNU library functions.)
An example using the SAT-builder method to create a p-model file myprogram.pom.yml, from directory src/constraint_gen/, after getting SAT output from the SAT-solver verifier:
First, perform the quick "Demo of basic use" here. That produces a solution.json file in directory src/constraint_gen/out.
Next, use the SAT-builder method to create the p-model file.
./props_to_pom_yaml.py out/solution.json --lib ../test/test.famous.llm.pom.yml > myprogram.pom.yml
The model it creates is slightly different but compatible with the manually-generated p-model for test/swap.c.
The following files can be used to verify that the builder's p-model is correct.
/host/d2u/dos2unix.manual.pom.yml
A model created by hand while studying the dos2unix source code
/host/d2u/dos2unix.sorted.pom.yml
This is the manual p-model file, but missing comments, and all the keys are sorted.
You can see any differences between the file produced by the LLM or SAT-solver and the sorted file using diff:
diff -u /host/src/dos2unix.llm.pom.yml /host/d2u/dos2unix.sorted.pom.ymlIf there are no differences from the LLM-produced p-model, then the LLM was 100% accurate in generating the model.
You should run the verifier to confirm that the code complies with POM and (if you are using an already-produced p-model for the program) that the p-model is complete.
The current verifier uses a SAT solver. There is also an older, incomplete verifier that checks if the AST complies with the POM. The following sections describe each.
This verifier examines the LLVM IR generated from the source code. It runs our constraint generator and a SAT solver.
The verifier output is designed to help developers to quickly understand if the code is POM-compliant or not.
- If the finding is
SAT(it is POM-compliant), developers are provided with the validation detail. - If the finding is
UNSAT(it is not POM-compliant), developers are directed to an UNSAT core (a subset of the original clauses that is sufficient to prove unsatisfiability) to help them understand the problem so that they can fix it. Related output files provide traces to both the.ccode (with line numbers and variable names) and.llcode (with line numbers and variable names).
If the constraints are satisfiable, these files are generated:
solution.json: Assignment oftrueorfalseto each of the named variables appearing inconstraints.txt.solution.txt: Raw output from SAT Solver, using numeric variable IDs.
If the constraints are unsatisfiable, these files are generated:
proof.drat: A proof of unsatisfiability.core.unsat: The subset of clauses fromconstraints.dimacsthat are used in the above proof.core.unsat.named: Same ascore.unsat, except using descriptive variable names instead of numeric variable IDs.
The dos2unix LLVM IR .ll files are in your container's /host/d2u directory.
Combine the LLVM IR files (use the filenames you get, which may have different strings before the raw.ll part of the filenames:
cd /host/d2u
llvm-link-15 common.5ba35ca31f1cf1a66823b734.raw.ll dos2unix.955979806ce1344d61d417cc.raw.ll querycp.8654988538a257e3708d2bad.raw.ll -S -o combined.raw.ll
We use two p-models for dos2unix (one with library functions used by dos2unix: dos2unix.famous.manual.pom.yml, the other is dos2unix.llm.pom.yml. We use whole-program mode for this example.
export LLVM_OPTS="-whole-prog"
cd /host/src/constraint_gen
POM_FILE="/host/src/dos2unix.llm.pom.yml /host/d2u/dos2unix.famous.manual.pom.yml" ./run.sh /host/d2u/combined.raw.ll out
The output should say UNSATISFIABLE, which is also the result if you use our manually-built file dos2unix.sorted.pom.yml:
POM_FILE="/host/d2u/dos2unix.sorted.pom.yml /host/d2u/dos2unix.famous.manual.pom.yml" ./run.sh /host/d2u/combined.raw.ll out
There are some warnings that say Warning: indirect function calls (i.e., calls via function pointers) are not supported. That's because dos2unix has function pointers. Currently, we treat calls via function pointers as no-ops.
/host/d2u/expected_results.txt says that the main function is expected to be BAD due to memory leaks.
More detail about our SAT-solver verifier, along with demonstration examples with detailed steps (one SAT, one UNSAT, one using an input p-model file, and a demo of summarizing a function), commands, and explanation of outputs (plus more) is provided here.
Our AST verifier examines an AST file generated from the source code. It also takes the p-model as input.
Execution of the following AST verifier example requires all of the .ast.json.gz and .ll files from the temp directory that were produced in the example step above Running the p-model Builder.
This command examines the AST of the dos2unix file common.c:
cd /host
python3 src/verify.py src/dos2unix.llm.pom.yml d2u/common.*.raw.ast.json.gz > verifier.outputThe output indicates the details the verifier analyzed. The verifier also warns about any differences between the p-model file's functions and the AST's functions. Since the p-model contains all functions in the dos2unix program, the verifier issues warnings about functions not in common.c, such as main and standard C functions like stoi.
The AST verifier is incomplete. It only confirms that the pointers listed in the functions' arguments, local variables, and return values are listed both in the code and the p-model.
Our design document includes a lot of code examples that the POM design considered. Some are common code constructs and others are corner cases that we determined should be within or outside of the scope of POM constraints. You can check how POM (and your selected LLM) handles these code examples, starting by automatically extracting code from the document as follows:
Note: this does not need to be run in the docker.cc.cert.org/pom/pointer-ownership-model if Python 3 is available on the host system
- Clone Markdown Code Extractor into repository root
git clone https://github.com/sei-svishnubhatla/markdown-code-extractor.git- Run MCE on design document and output to
design-extracts
python3 markdown-code-extractor/markdown-code-extractor.py design.md design-extractsIf you get this error inside your container:
!fatal: detected dubious ownership in repository at 'dos2unix'
To add an exception for this directory, call:
Then, run the git config command as recommended. (This error can occur because shared files inside one Docker container may have a different owner than the same files outside the container.)
git config --global --add safe.directory dos2unixThe bear command is useful for building compilation databases, aka the compile_commands.json file, which is used by POM to understand how software is built. For more info on bear, see: https://github.com/rizsotto/Bear.
If you cannot install or run Bear on the build platform, you must create the compilation database via other means. The format of the JSON compilation database contains a specification of the compilation database. It also contains instructions for producing this database using Clang. The format is fairly straightforward, and so you might be able to produce the file manually.
The Verifier may give you errors saying that several common functions, like strcpy have no associated p-models. You need to provide p-models for these functions.
For every function name input, our script build_famous_fn_pom.py interacts with GPT and builds a POM signature for that function, dealing with its arguments and return value. This will only work if the function is "famous" so that GPT can find out what its signature and documentation is.
cd /host/src
python3 build_famous_fn_pom.py --verbose <function1> <function2> ...You may call this with as many arguments as you want, or call it once for each argument.
Once you have valid results for each function, you can compile them all together into a p-model using the command:
python3 build_famous.yml.py --verbose dos2unix.famous.llm.pom.ymlThis project uses externally-developed open-source software. Most of it is specified in and installed by the Dockerfile, except for ghostq, located in src/constraint_gen/ghostq. Such software includes (among others):
| Software | Version | Host | License | | bear | 3.1.3 | Github | GPL 3 | Clang | 15.0.7 | Github | Apache 2 | | yamale | 6.0.0 | Github | MIT | | cryptominisat | 5.11.15 | Github | MIT + GPL for Bliss | | minisat | 2.2.1 | self | OSS (Niklas Sorensson) | | drat-trim | v05.22.2023 | Github | MIT | | ghostq | | self | GPL 3 |
Contributions to the Pointer Ownership Model codebase are welcome! If you have code repairs or other contributions, we welcome that - you could submit a pull request via GitHub, or contact us if you'd prefer a different way.