Commit 2db9389

Add more detail
Author: Andrew Law
1 parent be57507, commit 2db9389

1 file changed, 58 additions and 11 deletions

docs/src/integrity/integrity.rst

Lines changed: 58 additions & 11 deletions
Original file line number | Diff line number | Diff line change
@@ -11,37 +11,84 @@ shuffling data in an unexpected manner across data partitions, spoofing extra da
1111

1212
Overview
1313
--------
14-
The main idea behind integrity support is to tag each step of computation with a MAC over individual enclave workers' encrypted output, attached by the enclave worker when it has completed its computation.
15-
All MACs received by all previous enclave workers are logged. In the end during post verification, these MACs, which each represent an ecall at a data partition, are compared and reconstructed into a graph.
14+
The main idea behind integrity support is to tag each step of computation with a log over individual enclave workers' encrypted output, attached by the enclave worker when it has completed its computation.
15+
During execution, each enclave worker checks its input, which contains logs of the previous ecall's output, to make sure that no rows were tampered with, dropped, or spoofed by the job driver.
16+
This is done using cryptographic MAC functions, whose output can only be computed by enclave workers that share a secret key with the client.
17+
The job driver or server cannot tamper with the data without being detected, since neither can forge a well-formed MAC without the secret key.
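The MAC property described above can be illustrated with a minimal sketch. It assumes HMAC-SHA256 and uses Python's standard ``hmac`` module; the names ``tag``, ``verify``, and ``shared_key`` are hypothetical and are not taken from the Opaque code base, where the enclave logic is written in C++.

```python
import hashlib
import hmac


def tag(key: bytes, data: bytes) -> bytes:
    """Compute a MAC over data (HMAC-SHA256 is an assumption of this sketch)."""
    return hmac.new(key, data, hashlib.sha256).digest()


def verify(key: bytes, data: bytes, mac: bytes) -> bool:
    """Recompute the MAC and compare in constant time."""
    return hmac.compare_digest(tag(key, data), mac)


# Placeholder key; in the setting above only the enclave workers and the client hold it.
shared_key = b"\x01" * 32
output = b"encrypted rows from an enclave worker"
mac = tag(shared_key, output)

# A party without the key (for example the job driver) cannot produce a valid tag,
# and any modification of the data invalidates the existing tag.
untampered_ok = verify(shared_key, output, mac)
tampered_ok = verify(shared_key, output + b"x", mac)
wrong_key_ok = verify(b"\x02" * 32, output, mac)
```

Here only ``untampered_ok`` is true: changing the data or using a different key makes verification fail.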
18+
Finally, during post verification, these log objects, called "Crumbs", each of which represents an ecall at a data partition, are compared and used to reconstruct a graph representing the flow of information during query execution.
19+
Specifically, each ecall's ``input_macs`` field is matched against the ``all_outputs_mac`` fields of other ecalls to create edges between ecalls and their predecessors.
1620
This graph is compared to the DAG of the query plan computed by Catalyst.
1721
If the graphs are isomorphic, then no tampering has occurred.
1822
Otherwise, the result of the query returned by the cloud is rejected.
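The accept/reject decision can be sketched as follows. This is a simplified illustration, not the comparison used by the Job Verification Engine: it assumes every node carries a known label, here a hypothetical ``(ecall id, partition index)`` pair, so that checking isomorphism reduces to comparing edge sets. The function name ``verify_dag`` is also hypothetical.

```python
from typing import Set, Tuple

Node = Tuple[int, int]   # (ecall id, partition index): a labeling assumed for this sketch
Edge = Tuple[Node, Node]  # (producer, consumer)


def verify_dag(executed: Set[Edge], expected: Set[Edge]) -> str:
    """Accept only if the executed flow has exactly the edges of the expected plan."""
    return "accept" if executed == expected else "reject"


# Expected plan: ecall 1 on partitions 0 and 1 feeds ecall 2 on partition 0.
expected = {((1, 0), (2, 0)), ((1, 1), (2, 0))}

# An honest execution reproduces the same edges.
honest = {((1, 0), (2, 0)), ((1, 1), (2, 0))}

# A run in which the output of partition 1 was dropped is missing an edge.
dropped = {((1, 0), (2, 0))}

honest_result = verify_dag(honest, expected)
dropped_result = verify_dag(dropped, expected)
```

With unlabeled nodes a general graph isomorphism check would be needed instead of set equality.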
1923

24+
Logging
25+
-------
26+
Below are the FlatBuffers schemas of the logging objects used for integrity, which can be found under ``src/flatbuffers/EncryptedBlock.fbs``.
27+
28+
::
29+
30+
table EncryptedBlocks {
31+
blocks:[EncryptedBlock];
32+
log:LogEntryChain;
33+
log_mac:[Mac];
34+
all_outputs_mac:[ubyte];
35+
}
36+
37+
table LogEntry {
38+
ecall:int; // ecall executed
39+
num_macs:int; // Number of EncryptedBlock's in this EncryptedBlocks - checked during runtime
40+
mac_lst:[ubyte]; // List of all MACs, one from each EncryptedBlocks - checked during runtime
41+
mac_lst_mac:[ubyte]; // MAC(mac_lst) - checked during runtime
42+
input_macs:[ubyte]; // List of input EncryptedBlocks' all_outputs_mac values
43+
num_input_macs:int; // Number of input_macs
44+
}
45+
46+
table LogEntryChain {
47+
curr_entries:[LogEntry];
48+
past_entries:[Crumb];
49+
}
50+
51+
// Contains information about an ecall, which will be pieced together during post verification to verify the DAG
52+
// A crumb is created at an ecall for each previous ecall that sent some data to this ecall
53+
table Crumb {
54+
input_macs:[ubyte]; // List of EncryptedBlocks' all_outputs_mac values, from LogEntry
55+
num_input_macs:int; // Number of input_macs
56+
all_outputs_mac:[ubyte]; // MAC over all outputs of the ecall from which this EncryptedBlocks came, of size OE_HMAC_SIZE
57+
ecall:int; // Ecall executed
58+
log_mac:[ubyte]; // MAC over the LogEntryChain from this EncryptedBlocks, of size OE_HMAC_SIZE
59+
}
60+
61+
The ``EncryptedBlocks`` object is what is produced from an enclave worker and passed to the next ecall.
62+
The ``LogEntry`` object contains information about the current ecall, including its unique integer identifier, the MACs over each ``EncryptedBlocks`` it produced, and the ``input_macs`` field, which is a list of the output MACs of its predecessor ecall.
63+
The ``LogEntryChain`` contains a list of log entries for a single data partition. There are as many ``LogEntryChain`` objects for a given query as there are data partitions.
64+
The ``JobVerificationEngine`` has access to a list of ``LogEntryChain``\s.
65+
The ``Crumb`` object contains the ``LogEntry`` information of previous ecalls, stored in the ``LogEntryChain``.
66+
2067
Implementation
2168
--------------
2269
Two main extensions were made to support integrity - one in enclave code, and one in the Scala client application.
2370

2471
Enclave Code
2572
^^^^^^^^^^^^
2673
In the enclave code (C++), modifications were made to the ``FlatbuffersWriters.cpp`` file and ``FlatbuffersReaders.cpp`` file.
27-
The "write" change attaches a MAC over the ``EncryptedBlocks`` object to the output.
28-
The "read" change checks whether all blocks that were output from the previous ecall were received by the subsequent ecall.
74+
The "write" change attaches a log to the ``EncryptedBlocks`` object, which contains the enclave worker's encrypted output.
75+
The "read" change is a runtime check verifying whether all blocks that were output from the previous ecall were received by the subsequent ecall.
2976
No further modifications need to be made to the application logic since this functionality hooks into how Opaque workers output their data.
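The runtime check on the "read" side can be sketched as below. This is an illustration under stated assumptions rather than the logic in ``FlatbuffersReaders.cpp``: it assumes HMAC-SHA256 with 32-byte tags, models ``mac_lst`` as the concatenation of per-block MACs, and uses hypothetical helpers ``make_log_entry`` and ``check_input``.

```python
import hashlib
import hmac
from dataclasses import dataclass
from typing import List

MAC_SIZE = 32  # size of an HMAC-SHA256 tag; stands in for OE_HMAC_SIZE in this sketch


@dataclass
class LogEntry:
    ecall: int
    num_macs: int
    mac_lst: bytes      # concatenated per-block MACs
    mac_lst_mac: bytes  # MAC(mac_lst)


def mac(key: bytes, data: bytes) -> bytes:
    return hmac.new(key, data, hashlib.sha256).digest()


def make_log_entry(key: bytes, ecall: int, blocks: List[bytes]) -> LogEntry:
    """'Write' side: log one MAC per output block, then MAC the list itself."""
    mac_lst = b"".join(mac(key, b) for b in blocks)
    return LogEntry(ecall, len(blocks), mac_lst, mac(key, mac_lst))


def check_input(key: bytes, entry: LogEntry, received: List[bytes]) -> bool:
    """'Read' side: confirm every logged block arrived and nothing was added."""
    # The log itself must be authentic.
    if not hmac.compare_digest(mac(key, entry.mac_lst), entry.mac_lst_mac):
        return False
    # The declared count must agree with the list and with what was received.
    if len(entry.mac_lst) != entry.num_macs * MAC_SIZE:
        return False
    if len(received) != entry.num_macs:
        return False
    logged = [entry.mac_lst[i:i + MAC_SIZE] for i in range(0, len(entry.mac_lst), MAC_SIZE)]
    # Compare as multisets so that block order does not matter.
    return sorted(logged) == sorted(mac(key, b) for b in received)


key = b"\x07" * 32  # placeholder shared key
blocks = [b"block-0", b"block-1", b"block-2"]
entry = make_log_entry(key, ecall=3, blocks=blocks)

all_received = check_input(key, entry, blocks)
one_dropped = check_input(key, entry, blocks[:2])
one_spoofed = check_input(key, entry, blocks[:2] + [b"forged"])
```

Only ``all_received`` is true; dropping a block or substituting a forged one makes the check fail.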
3077

3178
Scala/Application Code
3279
^^^^^^^^^^^^^^^^^^^^^^
33-
The main extension supporting Integrity is the ```JobVerificationEngine`` which is a piece of Scala code that broadly carries out three tasks:
80+
The main extension supporting integrity is the ``JobVerificationEngine``, a piece of Scala code that broadly carries out three tasks:
3481

3582
1. Reconstruct the flow of information between enclave workers.
3683

3784
2. Compute the corresponding DAG of ecalls for a given query.
3885

3986
3. Compare the two DAGs and output "accept" or "reject."
4087

41-
These happen in the "verify" function of the JobVerificationEngine class.
88+
These happen in the ``verify`` function of the ``JobVerificationEngine`` class.
4289

43-
Reconstructing the executed DAG of ecalls involves iterating through the MACs attached by enclave workers, provided in the "LogEntryChain" object in the Job Verification Engine.
44-
This object is filled by Opaque when Spark's ``collect`` method is called when a query is executed.
90+
Reconstructing the executed DAG of ecalls involves iterating through the MACs attached by enclave workers, which are fields in the ``Crumb`` and ``LogEntry`` objects stored in each ``LogEntryChain`` in the Job Verification Engine.
91+
The list of ``LogEntryChain``\s is populated by Opaque when Spark's ``collect`` method is called during query execution.
4592

4693
Output MACs of parents correspond to input MACs of their children. Using this information, the DAG is created.
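The matching of output MACs to input MACs can be sketched as follows. The ``Crumb`` class here is a simplified stand-in for the FlatBuffers table: it assumes ``input_macs`` has already been split into individual MACs, and the function name ``build_edges`` is hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class Crumb:
    ecall: int
    all_outputs_mac: bytes
    input_macs: Tuple[bytes, ...] = ()  # already split into individual MACs (assumption)


def build_edges(crumbs: List[Crumb]) -> Set[Tuple[int, int]]:
    """Return (parent index, child index) pairs linking each crumb to its producers."""
    # Index crumbs by the MAC over their outputs.
    producer: Dict[bytes, int] = {c.all_outputs_mac: i for i, c in enumerate(crumbs)}
    edges: Set[Tuple[int, int]] = set()
    for child, crumb in enumerate(crumbs):
        for m in crumb.input_macs:
            if m not in producer:
                # An input that no logged ecall produced indicates spoofed or missing data.
                raise ValueError("input MAC has no matching producer")
            edges.add((producer[m], child))
    return edges


# Two first-stage ecalls on different partitions feed one second-stage ecall.
crumbs = [
    Crumb(ecall=1, all_outputs_mac=b"A"),
    Crumb(ecall=1, all_outputs_mac=b"B"),
    Crumb(ecall=2, all_outputs_mac=b"C", input_macs=(b"A", b"B")),
]
edges = build_edges(crumbs)
```

In this example ``edges`` contains one edge from each first-stage crumb to the second-stage crumb.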
4794

@@ -52,13 +99,13 @@ Adding Integrity Support for New Operators
5299
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
53100
To support a new operator, make changes to both the enclave code and the Job Verification Engine code.
54101

55-
In the enclave, make sure that the enclave context's "finish_ecall" method is called before returning in ``Enclave.cpp```.
102+
In the enclave, make sure that the enclave context's ``finish_ecall`` method is called before returning in ``Enclave.cpp``.
56103

57104
In the Job Verification Engine, add logic in ``generateJobNodes`` that maps the operator to the list of ecalls it uses.
58-
This amounts to adding a case in the switch statement of this function.
105+
This amounts to adding a case in the cascading if/else statement of this function.
59106

60107
Furthermore, add the logic to connect the ecalls together in ``linkEcalls``.
61-
As above, this amounts to adding a case in the switch statement of this function, but requires knowledge of how each ecall communicates the transfer of data partitions to its successor ecall
108+
As above, this amounts to adding a case in the cascading if/else statement of this function, but requires knowledge of how each ecall communicates the transfer of data partitions to its successor ecall
62109
(broadcast, all to one, one to all, etc.).
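The communication patterns mentioned above can be sketched as a cascading if/else that yields edges between partitions of consecutive ecalls. This is a hypothetical illustration, not the body of ``linkEcalls`` (which is Scala): the function name ``link_partitions``, the pattern strings, and the ``one_to_one`` case are assumptions of this sketch.

```python
from typing import List, Tuple


def link_partitions(pattern: str, num_partitions: int) -> List[Tuple[int, int]]:
    """Return (source partition, destination partition) edges for one ecall-to-ecall step."""
    if pattern == "one_to_one":
        # Each partition passes its output to the same partition of the next ecall.
        return [(i, i) for i in range(num_partitions)]
    elif pattern == "all_to_one":
        # Every partition sends its output to partition 0.
        return [(i, 0) for i in range(num_partitions)]
    elif pattern == "one_to_all":
        # Partition 0 sends its output to every partition.
        return [(0, j) for j in range(num_partitions)]
    else:
        # A new operator with a different pattern would add another case here.
        raise ValueError(f"unknown communication pattern: {pattern}")


all_to_one = link_partitions("all_to_one", 3)
one_to_all = link_partitions("one_to_all", 3)
```

Supporting a new operator then amounts to deciding which pattern each pair of consecutive ecalls follows, or adding a case when none fits.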
63110

64111
Usage
