[SPARK-15441][SQL] support null object in Dataset outer-join #13425

cloud-fan · 2016-05-31T23:02:06Z

What changes were proposed in this pull request?

Currently we can't encode a top-level null object into an internal row, because Spark SQL doesn't allow a row to be null; only its columns can be null.

This was not a problem before, as we assumed the input object is never null. However, for outer join we do need the semantics of a null object.

This PR fixes the problem by making both join sides produce a single column, i.e. nesting the logical plan output (via CreateStruct), so that there is an extra level to represent a top-level null object.
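To make the nesting argument concrete, here is a toy model in plain Scala. It is not Spark's actual `InternalRow` or encoder code. A row is modeled as a non-null `Vector` of nullable column values, and `Person`, `encodeFlat`, `encodeNested`, `decodeFlat`, and `decodeNested` are hypothetical names introduced only for this sketch. The flat encoding maps a null `Person` and a `Person` whose fields are all null to the same row. The nested encoding, analogous to wrapping the output in `CreateStruct`, keeps them apart.

```scala
// Toy model only: NOT Spark's InternalRow/encoder implementation.

// Hypothetical user type; both fields are nullable references.
case class Person(name: String, age: java.lang.Integer)

// A "row" is a non-null sequence of nullable column values, mirroring the rule
// that rows themselves are never null, only their columns.
type Row = Vector[Any]

// Flat encoding: one column per field. A null object has nowhere to go,
// so the best we can do is fill every column with null.
def encodeFlat(p: Person): Row =
  if (p == null) Vector(null, null) else Vector(p.name, p.age)

def decodeFlat(r: Row): Person =
  Person(r(0).asInstanceOf[String], r(1).asInstanceOf[java.lang.Integer])

// Nested encoding: a single column holding a struct (another Vector here),
// analogous to wrapping the plan output in CreateStruct.
// A null object becomes a null column instead of a null row.
def encodeNested(p: Person): Row =
  if (p == null) Vector(null) else Vector(Vector(p.name, p.age))

def decodeNested(r: Row): Person = r(0) match {
  case null => null
  case inner: Vector[_] =>
    Person(inner(0).asInstanceOf[String], inner(1).asInstanceOf[java.lang.Integer])
}

// The unmatched side of an outer join vs. an object whose fields are all null.
val missing: Person = null
val anonymous = Person(null, null)
```

With the flat encoding, `decodeFlat(encodeFlat(missing))` comes back as `Person(null, null)`, so the null object is lost; with the nested encoding it round-trips to `null`.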

How was this patch tested?

New test in `DatasetSuite`.

cloud-fan · 2016-05-31T23:02:37Z

cc @marmbrus @yhuai @davies @liancheng @clockfly

SparkQA · 2016-06-01T00:37:16Z

Test build #59692 has finished for PR 13425 at commit 56cf840.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

liancheng · 2016-06-01T23:16:28Z

LGTM, merging to master and branch-2.0.

## What changes were proposed in this pull request?

Currently we can't encode a top-level null object into an internal row, because Spark SQL doesn't allow a row to be null; only its columns can be null.

This was not a problem before, as we assumed the input object is never null. However, for outer join we do need the semantics of a null object.

This PR fixes the problem by making both join sides produce a single column, i.e. nesting the logical plan output (via `CreateStruct`), so that there is an extra level to represent a top-level null object.

## How was this patch tested?

New test in `DatasetSuite`.

Author: Wenchen Fan
Closes #13425 from cloud-fan/outer-join2.
(cherry picked from commit 8640cdb)
Signed-off-by: Cheng Lian

## What changes were proposed in this pull request?

This is similar to the bug fixed in #13425: we should consider the null object and wrap the `CreateStruct` with `If` to do a null check.

This PR also improves the test framework to test the objects of `Dataset[T]` directly, instead of calling `toDF` and comparing the rows.

## How was this patch tested?

New test in `DatasetAggregatorSuite`.

Author: Wenchen Fan
Closes #13553 from cloud-fan/agg-null.
(cherry picked from commit cd47e23)
Signed-off-by: Herman van Hovell
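A similar toy model can illustrate the follow-up fix. A serializer built only from `CreateStruct` dereferences the object's fields and fails on a null result, while wrapping it in an `If(IsNull(...), null, ...)` guard yields a null struct instead. The classes below merely borrow Catalyst's expression names; they are a simplified, hypothetical stand-in, not Spark's implementation, and `Avg` is an invented result type.

```scala
// Toy expression tree loosely mirroring Catalyst's If / IsNull / CreateStruct.
// These are NOT Spark's classes, only a model of the idea.
sealed trait Expr { def eval(input: Any): Any }

// Refers to the object being serialized (e.g. an Aggregator's result).
case object Input extends Expr { def eval(input: Any): Any = input }

case class Literal(value: Any) extends Expr { def eval(input: Any): Any = value }

case class IsNull(child: Expr) extends Expr {
  def eval(input: Any): Any = child.eval(input) == null
}

// Evaluates only the selected branch, so the struct is never built for null input.
case class If(pred: Expr, whenTrue: Expr, whenFalse: Expr) extends Expr {
  def eval(input: Any): Any =
    if (pred.eval(input).asInstanceOf[Boolean]) whenTrue.eval(input)
    else whenFalse.eval(input)
}

// Extracts one field from the object via a getter; throws an NPE on null input.
case class GetField(child: Expr, getter: Any => Any) extends Expr {
  def eval(input: Any): Any = getter(child.eval(input))
}

// Builds a struct (modeled as a Vector) from the child expressions.
case class CreateStruct(children: Seq[Expr]) extends Expr {
  def eval(input: Any): Any = children.map(_.eval(input)).toVector
}

// Hypothetical Aggregator result type.
case class Avg(sum: Long, count: Long)

// Serializer without a null check: one field extractor per column.
val unguarded: Expr = CreateStruct(Seq(
  GetField(Input, a => a.asInstanceOf[Avg].sum),
  GetField(Input, a => a.asInstanceOf[Avg].count)))

// The same serializer wrapped in If, so a null object becomes a null struct.
val guarded: Expr = If(IsNull(Input), Literal(null), unguarded)
```

Evaluating `unguarded` on `null` throws a `NullPointerException`, while `guarded` returns `null` for a null result and a struct for a non-null one.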

support null object in Dataset outer-join

56cf840

cloud-fan mentioned this pull request May 31, 2016

[SPARK-15140][SPARK-15441][SQL][WIP] support null object in encoder #13322

Closed

asfgit closed this in 8640cdb Jun 1, 2016

cloud-fan mentioned this pull request Jun 8, 2016

[SPARK-15814][SQL] Aggregator can return null result #13553

Closed

JoshRosen mentioned this pull request May 24, 2019

[SPARK-27829][SQL] In Dataset.joinWith() inner joins, don't nest data before shuffling #24693

Closed

cdegroc mentioned this pull request Jan 11, 2022

[SPARK-37829][SQL] DataFrame.joinWith should return null rows for missing values #35139

Closed

3 participants