Left/Right Outer support for equi and non-equi joins #162
set up class thing cleanup added test cases for non-equi left anti join rename to serializeEquiJoinExpression added isEncrypted condition set up keys JoinExpr now has condition rename serialization does not throw compile error for BNLJ split up added condition in ExpressionEvaluation.h zipPartitions cpp put in place typo added func to header two loops in place update tests condition fixed scala loop interchange rows added tags ensure cached == match working comparison decoupling in ExpressionEvalulation save compiles and condition works is printing fix swap outer/inner o_i_match show() has the same result tests pass test cleanup added test cases for different condition BuildLeft works optional keys in scala started C++ passes the operator tests comments, cleanup attemping to do it the ~right~ way comments to distinguish between primary/secondary, operator tests pass cleanup comments, about to begin implementation for distinct agg ops is_distinct added test case serializing with isDistinct is_distinct in ExpressionEvaluation.h removed unused code from join implementation remove RowWriter/Reader in condition evaluation (join) easier test serialization done correct checking in Scala set is set up spaghetti but it finally works function for clearing values condition_eval isntead of condition goto comment started impl of multiple partitions fix added rangepartitionexec that runs partitioning cleanup serialization properly comments, generalization for > 1 distinct function comments about to refactor into logical.Aggregation the new case has distinct in result expressions need to match on distinct removed new case (doesn't make difference?) 
works remove traces of distinct more cleanup address comments rename equi join split Join.cpp into two files Update App.cpp fixed swap issues one more swap stream/broadcast concatEncryptedBlocks, remove import iostream comment for for loop added comments explaining constraints with broadcast side comments left semi done, existence serializes remove existence serialization fixed
* finishing the in expression. adding more tests and null support. need confirmation on null behavior and also I wonder why integer field is sufficient for string * adding additional test * adding additional test * saving concat implementation and it's passing basic functionality tests * adding type aware comparison and better error message for IN operator * adding null checking for the concat operator and adding one additional test * cleaning up IN&Concat PR * deleting concat and preping the in branch for in pr * fixing null bahavior now it's only null when there's no match and there's null input * Build failed Co-authored-by: Ubuntu <[email protected]> Co-authored-by: Wenting Zheng <[email protected]> Co-authored-by: Wenting Zheng <[email protected]> Separate Concat PR (mc2-project#125) Implementation of the CONCAT expression. Co-authored-by: Ubuntu <[email protected]> Co-authored-by: Wenting Zheng <[email protected]> Removed calls to toSet in TPC-H tests (mc2-project#140) * removed calls to toSet * added calls to toSet back where queries are unordered Documentation update (mc2-project#148) Cluster Remote Attestation Fix (mc2-project#146) The existing code only had RA working when run locally. This PR adds a sleep for 5 seconds to make sure that all executors are spun up successfully before attestation begins. Closes mc2-project#147 upgrade to 3.0.1 (mc2-project#144) Update two TPC-H queries (mc2-project#149) Tests for TPC-H 12 and 19 pass. 
TPC-H 20 Fix (mc2-project#142) * string to stringtype error * tpch 20 passes * cleanup * implemented changes * decimal.tofloat Co-authored-by: Wenting Zheng <[email protected]> Join update (mc2-project#145) Migrate from Travis CI to Github Actions (mc2-project#156) matching in strategies.scala set up class thing cleanup added test cases for non-equi left anti join rename to serializeEquiJoinExpression added isEncrypted condition set up keys JoinExpr now has condition rename serialization does not throw compile error for BNLJ split up added condition in ExpressionEvaluation.h zipPartitions cpp put in place typo added func to header two loops in place update tests condition fixed scala loop interchange rows added tags ensure cached == match working comparison decoupling in ExpressionEvalulation save compiles and condition works is printing fix swap outer/inner o_i_match show() has the same result tests pass test cleanup added test cases for different condition BuildLeft works optional keys in scala started C++ passes the operator tests comments, cleanup attemping to do it the ~right~ way comments to distinguish between primary/secondary, operator tests pass cleanup comments, about to begin implementation for distinct agg ops is_distinct added test case serializing with isDistinct is_distinct in ExpressionEvaluation.h removed unused code from join implementation remove RowWriter/Reader in condition evaluation (join) easier test serialization done correct checking in Scala set is set up spaghetti but it finally works function for clearing values condition_eval isntead of condition goto comment remove explain from test, need to fix distinct aggregation for >1 partitions started impl of multiple partitions fix added rangepartitionexec that runs partitioning cleanup serialization properly comments, generalization for > 1 distinct function comments about to refactor into logical.Aggregation the new case has distinct in result expressions need to match on distinct removed 
new case (doesn't make difference?) works remove traces of distinct more cleanup Upgrade to OE 0.12 (mc2-project#153) Update README.md Support for scalar subquery (mc2-project#157) This PR implements the scalar subquery expression, which is triggered whenever a subquery returns a scalar value. There were two main problems that needed to be solved. First, support for matching the scalar subquery expression is necessary. Spark implements this by wrapping a SparkPlan within the expression and calls executeCollect. Then it constructs a literal with that value. However, this is problematic for us because that value should not be decrypted by the driver and serialized into an expression, since it's an intermediate value. Therefore, the second issue to be addressed here is supporting an encrypted literal. This is implemented in this PR by serializing an encrypted ciphertext into a base64 encoded string, and wrapping a Decrypt expression on top of it. This expression is then evaluated in the enclave and returns a literal. Note that, in order to test our implementation, we also implement a Decrypt expression in Scala. However, this should never be evaluated on the driver side and serialized into a plaintext literal. This is because Decrypt is designated as a Nondeterministic expression, and therefore will always evaluate on the workers. 
Add TPC-H Benchmarks (mc2-project#139) * logic decoupling in TPCH.scala for easier benchmarking * added TPCHBenchmark.scala * Benchmark.scala rewrite * done adding all support TPC-H query benchmarks * changed commandline arguments that benchmark takes * TPCHBenchmark takes in parameters * fixed issue with spark conf * size error handling, --help flag * add Utils.force, break cluster mode * comment out logistic regression benchmark * ensureCached right before temp view created/replaced * upgrade to 3.0.1 * upgrade to 3.0.1 * 10 scale factor * persistData * almost done refactor * more cleanup * compiles * 9 passes * cleanup * collect instead of force, sf_none * remove sf_none * defaultParallelism * no removing trailing/leading whitespace * add sf_med * hdfs works in local case * cleanup, added new CLI argument * added newly supported tpch queries * function for running all supported tests address comments added one test case non-null case working rename equi join split Join.cpp into two files outer and default joins split up not handling nulls at all first test case works force_null to all appends test, matching in scala non-nulls working it works for anti and outer cleanup test cases added one row is not being added in the sort merge implementation tpc-h 13 passes comments outer/inner swap, breaks a bunch of things Update App.cpp fixed swap issues for loop instead of flatten concatEncryptedBlocks tpch 13 test passes one more swap stream/broadcast concatEncryptedBlocks, remove import iostream comment for for loop added comments explaining constraints with broadcast side comments
c60ad0a to 7b95d90
db57b73 to 83f239e
90702ee to c3ed7a5
not every partition is getting a foreign row
write_output_rows(primary_unmatched_rows, w, join_type);
write_output_rows(previous_primary_unmatched_rows, w, join_type);
If the input data is very big, won't previous_primary_unmatched_rows potentially have to buffer a lot of data before it is written out to w again? Can you write out to w directly earlier?
The issue with doing it earlier is that last_foreign_row.get() can be null. This happens when a new primary group is encountered before any foreign row has been encountered at all.
For outer joins, can you put the dummy row first instead of last?
that worked, thanks
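For readers following the thread, here is a minimal sketch of why putting the dummy row first can remove the need to buffer unmatched primary rows across groups. It is an illustration under assumptions, not the code in this PR: the `Row` type, the `Kind` tags, `join_cols`, and `left_outer_merge` are all hypothetical names, and the input is assumed to be sorted by key with primary rows ahead of foreign rows inside each key group.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical row type for illustration; not the project's row format.
struct Row {
  enum class Kind { Dummy, Primary, Foreign };
  Kind kind;
  int key;
  std::vector<std::string> cols;
};

// Concatenate two column lists into one comma-separated output line.
static std::string join_cols(const std::vector<std::string>& a,
                             const std::vector<std::string>& b) {
  std::string s;
  bool first = true;
  auto append = [&](const std::vector<std::string>& cols) {
    for (const auto& c : cols) {
      if (!first) s += ",";
      s += c;
      first = false;
    }
  };
  append(a);
  append(b);
  return s;
}

// Left outer merge over a stream that starts with a dummy foreign row.
// Because the dummy arrives first, the shape of the foreign side is known
// before any primary group ends, so unmatched primary rows can be written
// as soon as their group closes instead of being held until later.
std::vector<std::string> left_outer_merge(const std::vector<Row>& sorted) {
  std::vector<std::string> out;
  std::optional<Row> dummy;  // assumed to be the first row of the stream
  std::vector<Row> group;    // primary rows sharing the current key
  bool matched = false;

  auto flush = [&]() {
    if (!matched && !group.empty()) {
      assert(dummy.has_value());  // holds only because the dummy comes first
      std::vector<std::string> nulls(dummy->cols.size(), "NULL");
      for (const Row& p : group) out.push_back(join_cols(p.cols, nulls));
    }
    group.clear();
    matched = false;
  };

  for (const Row& row : sorted) {
    switch (row.kind) {
      case Row::Kind::Dummy:
        dummy = row;
        break;
      case Row::Kind::Primary:
        if (!group.empty() && group.front().key != row.key) flush();
        group.push_back(row);
        break;
      case Row::Kind::Foreign:
        if (!group.empty() && group.front().key == row.key) {
          for (const Row& p : group) out.push_back(join_cols(p.cols, row.cols));
          matched = true;
        }
        break;  // foreign rows without a primary match are dropped
    }
  }
  flush();  // close the last primary group
  return out;
}
```

In this sketch only the rows of the current key group are ever held in memory, which is the property the review comment was asking for.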
Thanks!
TPC-H 13 passes.
This PR implements left and right outer joins for both of the currently supported physical join operators: NonObliviousSortMergeJoin and BroadcastNestedLoopJoin. It adds tests for both, but TPC-H 13 only requires the equi-join left outer implementation.
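As a rough illustration of the non-equi case, the following sketch shows left outer semantics in a plain nested loop join with an arbitrary condition. It is not the enclave implementation from this PR; `nested_loop_left_outer` and its parameters are hypothetical names, and rows are reduced to simple values for brevity.

```cpp
#include <optional>
#include <utility>
#include <vector>

// Nested loop left outer join with an arbitrary (possibly non-equi) condition.
// Every outer row appears at least once in the result; an outer row with no
// matching inner row is paired with std::nullopt, standing in for NULL columns.
template <typename L, typename R, typename Pred>
std::vector<std::pair<L, std::optional<R>>> nested_loop_left_outer(
    const std::vector<L>& outer, const std::vector<R>& inner, Pred condition) {
  std::vector<std::pair<L, std::optional<R>>> out;
  for (const L& l : outer) {
    bool matched = false;
    for (const R& r : inner) {
      if (condition(l, r)) {
        out.emplace_back(l, r);
        matched = true;
      }
    }
    if (!matched) out.emplace_back(l, std::nullopt);
  }
  return out;
}
```

A right outer join can be expressed with the same routine by swapping which input plays the outer role and flipping the argument order of the condition.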