***********************
Computational Integrity
***********************

The integrity module of Opaque ensures that the untrusted job driver hosted on the cloud service schedules tasks in the manner computed by Spark's Catalyst query optimizer.
Opaque runs on Spark, which uses data partitioning to speed up computation.
Specifically, Catalyst computes a physical query plan for a given dataframe query, and Spark workers (running in enclaves) execute the plan's Spark SQL operations on data partitions.
Each of these individual units is trusted, but the intermediate steps in which the units communicate are controlled by the job driver, which runs as untrusted code in the cloud.
The integrity module detects whether the job driver has deviated from the query plan computed by Catalyst.

Overview
--------
The main idea behind integrity support is to tag each step of computation with a MAC, attached by an enclave worker when it has completed its computation.
Each worker also logs the MACs it received from the workers that preceded it. At the end of the query, these MACs are collected and reconstructed into a graph of the computation that actually took place.
This graph is compared to the one derived from the query plan computed by Catalyst.
If the graphs are isomorphic, then no tampering has occurred.
Otherwise, the result of the query returned by the cloud is rejected.
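
As a rough illustration of the final check, suppose both DAGs are reduced to sets of labelled edges, one per transfer between ecalls. The sketch below uses illustrative ecall labels, and a simple equality of canonical edge sets stands in for the isomorphism comparison.

.. code-block:: scala

   // Sketch only: the ecall labels are illustrative, and equality of
   // canonicalised edge sets stands in for the isomorphism check.
   object DagComparisonSketch {
     type Edge = (String, String) // data flows from ecall _1 to ecall _2

     def accept(executed: Set[Edge], expected: Set[Edge]): Boolean =
       executed == expected

     def main(args: Array[String]): Unit = {
       val expected = Set("project" -> "sort", "sort" -> "aggregate")
       val executed = Set("project" -> "sort", "sort" -> "aggregate")
       println(if (accept(executed, expected)) "accept" else "reject") // prints "accept"
     }
   }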

Implementation
--------------
Two main extensions were made to support integrity: one in the enclave code and one in the Scala client application.

Enclave Code
^^^^^^^^^^^^
In the enclave code (C++), modifications were made to the ``FlatbuffersWriters.cpp`` file.
A MAC over the output is attached to every ``EncryptedBlocks`` object that a worker produces.
No further modifications to the application logic are needed, since this functionality hooks into how Opaque workers output their data.
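
The change itself lives in C++, but conceptually it is just a MAC computed over the serialized output before it leaves the enclave. Below is a minimal Scala sketch of that idea, assuming an HMAC-SHA256 construction and a key shared among the enclaves; the actual C++ implementation may differ in both respects.

.. code-block:: scala

   import javax.crypto.Mac
   import javax.crypto.spec.SecretKeySpec

   // Conceptual stand-in for the MAC the enclave writer attaches to each
   // serialized EncryptedBlocks output; not the actual C++ implementation.
   object OutputMacSketch {
     def macOverOutput(sharedKey: Array[Byte], serializedBlocks: Array[Byte]): Array[Byte] = {
       val hmac = Mac.getInstance("HmacSHA256")
       hmac.init(new SecretKeySpec(sharedKey, "HmacSHA256"))
       hmac.doFinal(serializedBlocks) // attached alongside the output block
     }
   }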

Scala/Application Code
^^^^^^^^^^^^^^^^^^^^^^
The main extension supporting integrity is the ``JobVerificationEngine``, a piece of Scala code that broadly carries out three tasks:

1. Reconstruct the flow of information between enclave workers.

2. Compute the corresponding DAG of ecalls for a given query.

3. Compare the two DAGs and output "accept" or "reject."

All three tasks happen in the ``verify`` method of the ``JobVerificationEngine`` class.
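
The sketch below shows the shape of that method. The types and helper names (``TaskLog``, ``OperatorNode``, ``reconstructExecutedDag``, ``expectedDagFor``) are placeholders rather than the engine's real API; the two helpers are sketched after the paragraphs that follow.

.. code-block:: scala

   // Shape of verify(); all names are placeholders, not the engine's real API.
   object VerifySketch {
     type Mac  = Seq[Byte]          // MAC bytes, compared by value
     type Edge = (String, String)   // data flows from ecall _1 to ecall _2

     // One entry per enclave task: which ecall ran, which MACs it received,
     // and the MAC it attached to its own output.
     case class TaskLog(ecall: String, inputMacs: Set[Mac], outputMac: Mac)

     // A node of the physical query plan; children are upstream operators.
     case class OperatorNode(name: String, children: Seq[OperatorNode])

     def verify(logEntryChain: Seq[TaskLog], plan: OperatorNode): String = {
       val executedDag = reconstructExecutedDag(logEntryChain) // task 1
       val expectedDag = expectedDagFor(plan)                  // task 2
       if (executedDag == expectedDag) "accept" else "reject"  // task 3
     }

     // Bodies sketched below, after the paragraphs describing each step.
     def reconstructExecutedDag(logs: Seq[TaskLog]): Set[Edge] = ???
     def expectedDagFor(plan: OperatorNode): Set[Edge] = ???
   }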

Reconstructing the executed DAG of ecalls involves iterating through the MACs attached by the enclave workers, which are provided to the Job Verification Engine in the ``LogEntryChain`` object.
This object is populated by Opaque when Spark's ``collect`` method is called to execute a query.

The output MACs of a parent ecall appear among the input MACs of its children; matching them reconstructs the executed DAG.
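
Continuing ``VerifySketch`` above, a sketch of that matching step:

.. code-block:: scala

   // Task 1, continuing VerifySketch: a parent feeds a child exactly when
   // the parent's output MAC appears among the child's logged input MACs.
   def reconstructExecutedDag(logs: Seq[TaskLog]): Set[Edge] =
     (for {
       parent <- logs
       child  <- logs
       if child.inputMacs.contains(parent.outputMac)
     } yield parent.ecall -> child.ecall).toSet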

The "expected" DAG is created from Spark's ``dataframe.queryExecution.executedPlan`` object, which is a recursive tree of Spark physical operator nodes.
The Job Verification Engine contains the logic to transform this tree of operators into a tree of ecalls.
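
Continuing the sketch, the expected DAG can be derived by walking the operator tree and expanding each operator into the ecalls it uses. The one-ecall-per-operator mapping below is a simplification; real operators may expand into several ecalls.

.. code-block:: scala

   // Task 2, continuing VerifySketch: flatten the operator tree into edges
   // that follow the data flow from child operators up to their parent,
   // using a simplified one-ecall-per-operator mapping.
   def expectedDagFor(plan: OperatorNode): Set[Edge] = {
     def ecallFor(operator: String): String = operator.toLowerCase // simplification
     plan.children.flatMap { child =>
       expectedDagFor(child) + (ecallFor(child.name) -> ecallFor(plan.name))
     }.toSet
   }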

Adding Integrity Support for New Operators
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
To support new operators as they are added, one should make changes to both the enclave code and the Job Verification Engine code.

In the enclave, make sure that the enclave context's ``finish_ecall`` method is called before returning in ``Enclave.cpp``.

In the Job Verification Engine, add the logic in ``generateJobNodes`` to transform the operator into the list of ecalls that the operator uses.
This amounts to adding a case to the match statement in that function.
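
For example, a new operator that maps to a fixed list of ecalls might be handled with a case along these lines; the operator and ecall names are hypothetical, and the real ``generateJobNodes`` signature may differ.

.. code-block:: scala

   // Hypothetical case added to a generateJobNodes-style match; the real
   // function's types, operator names, and ecall identifiers may differ.
   case class JobNode(ecall: String)

   def jobNodesFor(operatorName: String): Seq[JobNode] = operatorName match {
     case "EncryptedProjectExec" => Seq(JobNode("project"))
     case "EncryptedFilterExec"  => Seq(JobNode("filter"))
     // New operator: list every ecall it invokes, in order.
     case "EncryptedMyNewOpExec" => Seq(JobNode("myNewOpStep1"), JobNode("myNewOpStep2"))
     case other => throw new IllegalArgumentException(s"Unknown operator: $other")
   }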

Furthermore, add the logic to connect the ecalls together in ``linkEcalls``.
As above, this amounts to adding a case to the match statement in that function, but it requires knowing how each ecall transfers data partitions to its successor ecall
(broadcast, all to one, one to all, etc.).
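
The linking logic depends on that transfer pattern. For instance, an "all to one" transfer, where every partition's ecall sends its output to a single successor, might be wired as in the hypothetical sketch below; the real ``linkEcalls`` code may be shaped differently.

.. code-block:: scala

   // Hypothetical "all to one" link: every node of the predecessor ecall
   // feeds the single node of the successor ecall. Broadcast (one to all)
   // and one-to-one patterns would wire the nodes differently.
   case class EcallNode(ecall: String, partition: Int,
                        var outgoing: List[EcallNode] = Nil)

   def linkAllToOne(predecessors: Seq[EcallNode], successor: EcallNode): Unit =
     predecessors.foreach { pred =>
       pred.outgoing = successor :: pred.outgoing
     }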