* finishing the IN expression; adding more tests and null support. Need confirmation on null behavior, and also wondering why an integer field is sufficient for strings
* adding additional test
* adding additional test
* saving concat implementation and it's passing basic functionality tests
* adding type aware comparison and better error message for IN operator
* adding null checking for the concat operator and adding one additional test
* cleaning up IN&Concat PR
* deleting concat and prepping the IN branch for the IN PR
* fixing null behavior
now the result is null only when there is no match and a null input is present (see the example after this commit list)
* Build failed
Co-authored-by: Ubuntu <[email protected]>
Co-authored-by: Wenting Zheng <[email protected]>
Co-authored-by: Wenting Zheng <[email protected]>
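As context for the null-behavior fix above, this is the standard SQL three-valued-logic result the commits converge on. A hedged illustration in plain Spark SQL, not taken from the project's test suite:

```scala
// Illustrative snippet showing the IN null semantics described above: the result is
// NULL only when there is no match and a NULL appears among the inputs.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("in-null-demo").getOrCreate()
spark.sql("SELECT 1 IN (1, NULL)").show()  // true  -- a match exists, NULLs are irrelevant
spark.sql("SELECT 2 IN (1, 3)").show()     // false -- no match and no NULLs
spark.sql("SELECT 2 IN (1, NULL)").show()  // NULL  -- no match, but a NULL input
```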
Separate Concat PR (#125)
Implementation of the CONCAT expression.
Co-authored-by: Ubuntu <[email protected]>
Co-authored-by: Wenting Zheng <[email protected]>
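A quick usage illustration: the syntax is ordinary Spark SQL, and Opaque replaces only the execution. The table and column names here are invented for the example.

```scala
// With an encrypted DataFrame registered as "customers", the concatenation itself is
// evaluated inside the enclave rather than on the driver.
spark.sql("SELECT concat(c_first, ' ', c_last) AS full_name FROM customers").show()
```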
Removed calls to toSet in TPC-H tests (#140)
* removed calls to toSet
* added calls to toSet back where queries are unordered
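A rough sketch of the pattern these two commits settle on, with illustrative variable names rather than the actual test code: compare ordered query output positionally, and fall back to set comparison only when the query imposes no ordering.

```scala
// Hedged sketch: given two result arrays computed elsewhere (reference plaintext run vs.
// Opaque encrypted run), toSet is only appropriate when the query is unordered; for an
// ORDER BY query the row order itself is part of what is being tested.
val expected = plaintextDF.collect()   // reference result on unencrypted data
val actual   = encryptedDF.collect()   // result from the encrypted plan

assert(actual.toSet == expected.toSet)   // unordered query: order-insensitive comparison
assert(actual.toSeq == expected.toSeq)   // ordered query: order must match as well
```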
Documentation update (#148)
Cluster Remote Attestation Fix (#146)
The existing code only had remote attestation (RA) working when run locally. This PR adds a 5-second sleep to make sure that all executors are spun up successfully before attestation begins.
Closes #147
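A minimal sketch of the workaround described above; the attestation entry point name is hypothetical, not the actual Opaque API.

```scala
// Sketch only: pause long enough for every executor to register with the driver,
// then kick off remote attestation so no executor is missed.
Thread.sleep(5000)          // the 5-second sleep added by this PR
startRemoteAttestation()    // hypothetical stand-in for Opaque's attestation call
```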
Upgrade to 3.0.1 (#144)
Update two TPC-H queries (#149)
Tests for TPC-H 12 and 19 pass.
TPC-H 20 Fix (#142)
* string to StringType error
* tpch 20 passes
* cleanup
* implemented changes
* decimal.tofloat
Co-authored-by: Wenting Zheng <[email protected]>
Join update (#145)
Migrate from Travis CI to Github Actions (#156)
matching in strategies.scala
set up class thing
cleanup
added test cases for non-equi left anti join
rename to serializeEquiJoinExpression
added isEncrypted condition
set up keys
JoinExpr now has condition
rename
serialization does not throw compile error for BNLJ
split up
added condition in ExpressionEvaluation.h
zipPartitions
cpp put in place
typo
added func to header
two loops in place
update tests
condition
fixed scala loop
interchange rows
added tags
ensure cached
== match working
comparison decoupling in ExpressionEvaluation
save
compiles and condition works
is printing
fix swap outer/inner
o_i_match
show() has the same result
tests pass
test cleanup
added test cases for different condition
BuildLeft works
optional keys in scala
started C++
passes the operator tests
comments, cleanup
attempting to do it the ~right~ way
comments to distinguish between primary/secondary, operator tests pass
cleanup comments, about to begin implementation for distinct agg ops
is_distinct
added test case
serializing with isDistinct
is_distinct in ExpressionEvaluation.h
removed unused code from join implementation
remove RowWriter/Reader in condition evaluation (join)
easier test
serialization done
correct checking in Scala
set is set up
spaghetti but it finally works
function for clearing values
condition_eval instead of condition
goto
comment
remove explain from test, need to fix distinct aggregation for >1 partitions
started impl of multiple partitions fix
added RangePartitionExec that runs
partitioning cleanup
serialization properly
comments, generalization for > 1 distinct function
comments
about to refactor into logical.Aggregation
the new case has distinct in result expressions
need to match on distinct
removed new case (doesn't make difference?)
works
remove traces of distinct
more cleanup
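The commit trail above covers two features: join conditions beyond simple equality (e.g., a non-equi left anti join evaluated via broadcast nested loop join) and distinct aggregation across multiple partitions. A hedged sketch of the query shapes involved, with invented table and column names:

```scala
// Illustrative only; the table/column names are made up for the example.
import org.apache.spark.sql.functions.countDistinct

// Non-equi LEFT ANTI join: keep left rows that have no match under the inequality condition.
val unmatched = orders.join(
  customers,
  orders("amount") > customers("creditLimit"),
  "left_anti")

// Distinct aggregation, which needs extra care when the data spans more than one partition.
val perCountry = customers.groupBy("country").agg(countDistinct("customerId"))
```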
Upgrade to OE 0.12 (#153)
Update README.md
Support for scalar subquery (#157)
This PR implements the scalar subquery expression, which is triggered whenever a subquery returns a scalar value. There were two main problems that needed to be solved.
First, support for matching the scalar subquery expression is necessary. Spark implements this by wrapping a SparkPlan within the expression and calling executeCollect; it then constructs a literal with that value. However, this is problematic for us because that value is an intermediate result: it should not be decrypted by the driver and serialized into an expression.
Therefore, the second issue addressed here is supporting an encrypted literal. This PR implements it by serializing an encrypted ciphertext into a base64-encoded string and wrapping a Decrypt expression on top of it. That expression is then evaluated in the enclave and returns a literal. Note that, in order to test our implementation, we also implement a Decrypt expression in Scala. It should never be evaluated on the driver side and serialized into a plaintext literal, however, because Decrypt is designated as a Nondeterministic expression and will therefore always be evaluated on the workers.
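A minimal sketch of what such a Scala-side Decrypt expression could look like. The exact class shape, trait mix-in, and the `decryptToLiteralValue` helper are assumptions for illustration, not the actual Opaque source; only the Nondeterministic designation is taken from the description above.

```scala
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions.{Expression, Nondeterministic, UnaryExpression}
import org.apache.spark.sql.catalyst.expressions.codegen.CodegenFallback
import org.apache.spark.sql.types.DataType

// Sketch only: an expression carrying a base64-encoded ciphertext, marked Nondeterministic
// so Catalyst never constant-folds it on the driver; evaluation is deferred to the workers
// (in Opaque's case, to the enclave via the serialized expression tree).
case class Decrypt(child: Expression, dataType: DataType)
    extends UnaryExpression with Nondeterministic with CodegenFallback {

  override def nullable: Boolean = true

  override protected def initializeInternal(partitionIndex: Int): Unit = {}

  override protected def evalInternal(input: InternalRow): Any = {
    // Decode the base64 ciphertext produced by the driver...
    val ciphertext = java.util.Base64.getDecoder.decode(child.eval(input).toString)
    // ...and turn it back into a literal value. The helper below is hypothetical; the real
    // decryption happens inside the enclave, never on the driver.
    decryptToLiteralValue(ciphertext, dataType)
  }

  private def decryptToLiteralValue(bytes: Array[Byte], dt: DataType): Any =
    throw new UnsupportedOperationException("sketch only: decryption runs in the enclave")
}
```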
Add TPC-H Benchmarks (#139)
* logic decoupling in TPCH.scala for easier benchmarking
* added TPCHBenchmark.scala
* Benchmark.scala rewrite
* done adding all supported TPC-H query benchmarks
* changed commandline arguments that benchmark takes
* TPCHBenchmark takes in parameters
* fixed issue with spark conf
* size error handling, --help flag
* add Utils.force, break cluster mode
* comment out logistic regression benchmark
* ensureCached right before temp view created/replaced
* upgrade to 3.0.1
* upgrade to 3.0.1
* 10 scale factor
* persistData
* almost done refactor
* more cleanup
* compiles
* 9 passes
* cleanup
* collect instead of force, sf_none
* remove sf_none
* defaultParallelism
* no removing trailing/leading whitespace
* add sf_med
* hdfs works in local case
* cleanup, added new CLI argument
* added newly supported tpch queries
* function for running all supported tests
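One pattern called out in the commits above (ensureCached right before the temp view is created or replaced) looks roughly like this. The loader and query helper names are invented, and `Utils.ensureCached` is assumed from the commit message rather than checked against the source:

```scala
// Hedged sketch of the caching pattern mentioned above. Caching right before registering
// the view keeps repeated benchmark queries from re-reading and re-encrypting the input.
import edu.berkeley.cs.rise.opaque.Utils

val lineitem = Utils.ensureCached(loadTable(spark, "lineitem", scaleFactor))
lineitem.createOrReplaceTempView("lineitem")

val result = spark.sql(tpchQuery(6)).collect()  // collect instead of force, per the commits
```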
address comments
added one test case
non-null case working
rename equi join
split Join.cpp into two files
outer and default joins split up
not handling nulls at all
first test case works
force_null to all appends
test, matching in scala
non-nulls working
it works for anti and outer
cleanup
test cases added
one row is not being added in the sort merge implementation
tpc-h 13 passes
comments
outer/inner swap, breaks a bunch of things
Update App.cpp
fixed swap issues
for loop instead of flatten
concatEncryptedBlocks
tpch 13 test passes
one more swap
stream/broadcast
concatEncryptedBlocks, remove import iostream
comment for for loop
added comments explaining constraints with broadcast side
comments
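The commits above split the equi-join path from the outer/anti cases and note constraints on which side may be broadcast; TPC-H Q13 is the motivating query. A hedged illustration of that query's join shape, using the standard TPC-H schema names; the broadcast hint is an assumption for the example, not necessarily the Opaque planner's choice.

```scala
// Illustrative shape of the TPC-H Q13 join: customers left-outer-joined to orders on the
// customer key, with a non-key predicate on the order comment.
import org.apache.spark.sql.functions.broadcast

val custOrders = customer.join(
  broadcast(orders),
  customer("c_custkey") === orders("o_custkey") &&
    !orders("o_comment").contains("special"),
  "left_outer")
```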
README.md: 42 additions & 8 deletions

@@ -8,11 +8,12 @@ Opaque is a package for Apache Spark SQL that enables encryption for DataFrames
 This project is based on the following NSDI 2017 paper [1]. The oblivious execution mode is not included in this release.
 
-This is an alpha preview of Opaque, which means the software is still in development (not production-ready!). It currently has the following limitations:
+This is an alpha preview of Opaque, but the software is still in active development. It currently has the following limitations:
 
 - Unlike the Spark cluster, the master must be run within a trusted environment (e.g., on the client).
 
-- Not all Spark SQL operations are supported. UDFs must be [implemented in C++](#user-defined-functions-udfs).
+- Not all Spark SQL operations are supported (see the [list of supported operations](#supported-functionalities)).
+UDFs must be [implemented in C++](#user-defined-functions-udfs).
 
 - Computation integrity verification (section 4.2 of the NSDI paper) is currently work in progress.

@@ -23,7 +24,7 @@ This is an alpha preview of Opaque, which means the software is still in develop
 After downloading the Opaque codebase, build and test it as follows.
 
-1. Install dependencies and the [OpenEnclave SDK](https://github.com/openenclave/openenclave/blob/v0.9.x/docs/GettingStartedDocs/install_oe_sdk-Ubuntu_18.04.md). We currently support OE version 0.9.0 (so please install with `open-enclave=0.9.0`) and Ubuntu 18.04.
+1. Install dependencies and the [OpenEnclave SDK](https://github.com/openenclave/openenclave/blob/v0.12.0/docs/GettingStartedDocs/install_oe_sdk-Ubuntu_18.04.md). We currently support OE version 0.12.0 (so please install with `open-enclave=0.12.0`) and Ubuntu 18.04.
 
    ```sh
    # For Ubuntu 18.04:

@@ -59,7 +60,9 @@ After downloading the Opaque codebase, build and test it as follows.
 ## Usage
 
-Next, run Apache Spark SQL queries with Opaque as follows, assuming [Spark 3.0](https://www.apache.org/dyn/closer.lua/spark/spark-3.0.1/spark-3.0.1-bin-hadoop2.7.tgz) (`wget http://apache.mirrors.pair.com/spark/spark-3.0.1/spark-3.0.1-bin-hadoop2.7.tgz`) is already installed:
+Next, run Apache Spark SQL queries with Opaque as follows, assuming [Spark 3.0.1](https://www.apache.org/dyn/closer.lua/spark/spark-3.0.1/spark-3.0.1-bin-hadoop2.7.tgz) (`wget http://apache.mirrors.pair.com/spark/spark-3.0.1/spark-3.0.1-bin-hadoop2.7.tgz`) is already installed:
+
+\* Opaque needs Spark's `'spark.executor.instances'` property to be set. This can be done in a custom config file, the default config file found at `/opt/spark/conf/spark-defaults.conf`, or as a `spark-submit` or `spark-shell` argument: `--conf 'spark.executor.instances=<value>'`.
 
 1. Package Opaque into a JAR:

@@ -136,6 +139,41 @@ Next, run Apache Spark SQL queries with Opaque as follows, assuming [Spark 3.0](
 // | baz| 5|
 // +----+-----+
 ```
+
+## Supported functionalities
+
+This section lists Opaque's supported functionalities, which is a subset of that of Spark SQL. Note that the syntax for these functionalities is the same as Spark SQL -- Opaque simply replaces the execution to work with encrypted data.
+
+### Data types
+Out of the existing [Spark SQL types](https://spark.apache.org/docs/latest/sql-ref-datatypes.html), Opaque supports
+
+- All numeric types except `DecimalType`, which is currently converted into `FloatType`
+- `StringType`
+- `BinaryType`
+- `BooleanType`
+- `TimestampType`, `DateType`
+- `ArrayType`, `MapType`
+
+### Functions
+We currently support a subset of the Spark SQL functions, including both scalar and aggregate-like functions.
+UDFs are not supported directly, but one can [extend Opaque with additional functions](#user-defined-functions-udfs) by writing it in C++.
+
+### Operators
+
+Opaque supports the core SQL operators:
+
+- Projection
+- Filter
+- Global aggregation and grouping aggregation
+- Order by, sort by
+- Inner join
+- Limit
 
 ## User-Defined Functions (UDFs)

@@ -168,7 +206,3 @@ Now we can port this UDF to Opaque as follows:
    ```
 
 3. Finally, implement the UDF in C++. In [`FlatbuffersExpressionEvaluator#eval_helper`](src/enclave/Enclave/ExpressionEvaluation.h), add a case for `tuix::ExprUnion_DotProduct`. Within that case, cast the expression to a `tuix::DotProduct`, recursively evaluate the left and right children, perform the dot product computation on them, and construct a `DoubleField` containing the result.
-
-## Contact
-
-If you want to know more about our project or have questions, please contact Wenting ([email protected]) and/or Ankur ([email protected]).