Commit 52bdeed (parent b91843f)
Author: Andrew Law
First draft integrity docs

docs/src/integrity/integrity.rst (60 additions, 0 deletions)

***********************
Computational Integrity
***********************

The integrity module of Opaque ensures that the untrusted job driver hosted on the cloud service schedules tasks exactly as computed by Spark's Catalyst query optimizer.
Opaque runs on Spark, which partitions data to speed up computation.
Specifically, Catalyst computes a physical query plan for a given dataframe query and delegates Spark SQL operations on data partitions to Spark workers, which run inside enclaves.
Each of these individual units is trusted, but the intermediate steps in which the units communicate are controlled by the job driver, which runs as untrusted code in the cloud.
The integrity module detects whether the job driver has deviated from the query plan computed by Catalyst.

Overview
--------
The main idea behind integrity support is to tag each step of computation with a MAC, attached by an enclave worker when it finishes its computation.
Each worker also logs the MACs it received from previous enclave workers. At the end of the query, these MACs are used to reconstruct a graph of the actual data flow.
This graph is compared to the one computed by Catalyst.
If the two graphs are isomorphic, no tampering has occurred.
Otherwise, the result of the query returned by the cloud is rejected.
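The accept/reject decision can be sketched as follows. This is a minimal Python illustration of the idea, not the actual Scala/C++ implementation; the key, log format, and helper names are all hypothetical:

```python
import hmac
import hashlib

KEY = b"shared-enclave-key"  # hypothetical: stands in for the enclaves' shared secret

def mac(data: bytes) -> bytes:
    """Tag one unit of computation's output, as each enclave worker does."""
    return hmac.new(KEY, data, hashlib.sha256).digest()

def reconstruct_edges(log_entries):
    """Each worker logs (its own output MAC, the input MACs it received).
    A parent's output MAC matching a child's input MAC defines an edge."""
    producer = {out: i for i, (out, _) in enumerate(log_entries)}
    edges = set()
    for child, (_, input_macs) in enumerate(log_entries):
        for m in input_macs:
            edges.add((producer[m], child))
    return edges

# Toy run: workers 0 and 1 feed worker 2 (an "all-to-one" step).
m0, m1 = mac(b"partition-0"), mac(b"partition-1")
m2 = mac(b"joined")
log = [(m0, []), (m1, []), (m2, [m0, m1])]
executed = reconstruct_edges(log)
expected = {(0, 2), (1, 2)}
print("accept" if executed == expected else "reject")  # -> accept
```

A real check compares labeled DAGs computed independently from the query plan; the point here is only that the MAC log alone determines the executed graph.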

Implementation
--------------
Two main extensions were made to support integrity: one in the enclave code, and one in the Scala client application.

Enclave Code
^^^^^^^^^^^^
In the enclave code (C++), modifications were made to the ``FlatbuffersWriters.cpp`` file.
A MAC over the output is attached to every ``EncryptedBlocks`` object a worker emits.
No further modifications to the application logic are needed, since this functionality hooks into how Opaque workers output their data.
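The per-block tagging can be sketched as follows; this is a Python illustration of the concept, not the enclave's actual C++ code, and the key and function names are hypothetical:

```python
import hmac
import hashlib

KEY = b"enclave-key"  # hypothetical shared secret

TAG_LEN = 32  # SHA-256 HMAC output size

def write_encrypted_blocks(ciphertext: bytes) -> bytes:
    """Append a MAC over the serialized output, mimicking what the
    FlatbuffersWriters.cpp change does for EncryptedBlocks."""
    tag = hmac.new(KEY, ciphertext, hashlib.sha256).digest()
    return ciphertext + tag

def verify_encrypted_blocks(buf: bytes) -> bytes:
    """Check the trailing MAC before trusting the block's contents."""
    body, tag = buf[:-TAG_LEN], buf[-TAG_LEN:]
    if not hmac.compare_digest(tag, hmac.new(KEY, body, hashlib.sha256).digest()):
        raise ValueError("MAC mismatch: block was tampered with")
    return body

blocks = write_encrypted_blocks(b"\x01\x02encrypted-partition")
assert verify_encrypted_blocks(blocks) == b"\x01\x02encrypted-partition"
```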

Scala/Application Code
^^^^^^^^^^^^^^^^^^^^^^
The main extension supporting integrity is the ``JobVerificationEngine``, a piece of Scala code that broadly carries out three tasks:

1. Reconstruct the flow of information between enclave workers.

2. Compute the corresponding DAG of ecalls for a given query.

3. Compare the two DAGs and output "accept" or "reject."

These steps happen in the ``verify`` method of the ``JobVerificationEngine`` class.

Reconstructing the executed DAG of ecalls involves iterating through the MACs attached by the enclave workers, which are provided to the Job Verification Engine in the ``LogEntryChain`` object.
This object is filled by Opaque when Spark's ``collect`` method is called on an executed query.

Output MACs of parent ecalls correspond to the input MACs of their children. Using this correspondence, the executed DAG is created.

The "expected" DAG is created from Spark's ``dataframe.queryExecution.executedPlan`` object, which is a recursive tree of Spark operator nodes.
The Job Verification Engine contains the logic to transform this tree of operators into a tree of ecalls.
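The operator-to-ecall transformation can be sketched like this. It is a Python illustration with made-up operator and ecall names; the real mapping lives in the Scala ``JobVerificationEngine``:

```python
# Hypothetical mapping from Spark SQL operators to the ecalls they issue.
OPERATOR_TO_ECALLS = {
    "EncryptedProject": ["project"],
    "EncryptedFilter": ["filter"],
    "EncryptedSort": ["sample", "find_range_bounds", "partition_for_sort", "sort"],
}

def expand(plan):
    """Turn a tree of operators into the sequence of ecalls it runs,
    visiting children (upstream operators) before their parent."""
    op, children = plan
    return [e for c in children for e in expand(c)] + OPERATOR_TO_ECALLS[op]

# A toy plan: sort(filter(project(...))).
plan = ("EncryptedSort", [("EncryptedFilter", [("EncryptedProject", [])])])
print(expand(plan))
# -> ['project', 'filter', 'sample', 'find_range_bounds', 'partition_for_sort', 'sort']
```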

Adding Integrity Support for New Operators
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
To add integrity support for a new operator, changes are needed in both the enclave code and the Job Verification Engine code.

In the enclave, make sure that the enclave context's ``finish_ecall`` method is called before returning in ``Enclave.cpp``.

In the Job Verification Engine, add the logic that transforms the operator into the list of ecalls it uses in ``generateJobNodes``.
This amounts to adding a case to the switch statement of that function.

Furthermore, add the logic to connect the ecalls together in ``linkEcalls``.
As above, this amounts to adding a case to the switch statement of that function, but it requires knowledge of how each ecall communicates the transfer of data partitions to its successor ecall
(broadcast, all-to-one, one-to-all, etc.).
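The linking step can be sketched per communication pattern. This is a hypothetical Python illustration, not the Scala ``linkEcalls`` code; the pattern and ecall names are invented for the example:

```python
# Nodes are (ecall_name, partition_index); edges model data-partition transfer.

def link(parents, children, pattern):
    """Connect one ecall layer to the next according to its transfer pattern."""
    if pattern == "one_to_one":   # each partition stays on its worker
        return [(p, c) for p, c in zip(parents, children)]
    if pattern == "all_to_one":   # e.g. collecting samples to one worker
        return [(p, children[0]) for p in parents]
    if pattern == "one_to_all":   # e.g. sending range bounds to every worker
        return [(parents[0], c) for c in children]
    if pattern == "broadcast":    # every parent feeds every child
        return [(p, c) for p in parents for c in children]
    raise ValueError(f"unknown pattern: {pattern}")

# Three sample ecalls feed a single find_range_bounds ecall.
sample = [("sample", i) for i in range(3)]
bounds = [("find_range_bounds", 0)]
edges = link(sample, bounds, "all_to_one")
assert all(child == ("find_range_bounds", 0) for _, child in edges)
```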
