@cloud-fan cloud-fan commented May 31, 2016

What changes were proposed in this pull request?

Currently we can't encode a top-level null object into an internal row, because Spark SQL doesn't allow a row to be null; only its columns can be null.

This was not a problem before, as we assumed the input object is never null. However, for outer joins, we do need the semantics of a null object.

This PR fixes the problem by making both join sides produce a single column, i.e. nesting the logical plan output (via CreateStruct), so that we have an extra level to represent a top-level null object.
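
To illustrate why the extra nesting level helps, here is a minimal conceptual sketch in plain Python. It is not Spark's encoder code: tuples stand in for internal rows, `None` stands in for SQL null, and the names `Person`, `encode_flat`, `encode_nested`, `decode_nested` and `left_outer_join` are all hypothetical.

```python
from typing import Callable, List, NamedTuple, Optional, Tuple


class Person(NamedTuple):
    name: str
    age: int


# A "row" is modeled as a tuple of columns. The row itself is never None;
# only the columns inside it may be None.

def encode_flat(obj: Optional[Person]) -> Tuple:
    """Flat encoding: one column per field, so a null object cannot be expressed."""
    if obj is None:
        raise ValueError("cannot encode a top-level null object as a flat row")
    return (obj.name, obj.age)


def encode_nested(obj: Optional[Person]) -> Tuple:
    """Nested encoding: a single struct column that may itself be null."""
    struct = None if obj is None else (obj.name, obj.age)
    return (struct,)  # the row stays non-null; its only column can be null


def decode_nested(row: Tuple) -> Optional[Person]:
    """Inverse of encode_nested: a null struct column decodes to a null object."""
    (struct,) = row
    return None if struct is None else Person(*struct)


def left_outer_join(
    left: List[Person],
    right: List[Person],
    key: Callable[[Person], object],
) -> List[Tuple[Person, Optional[Person]]]:
    """Left outer join; an unmatched right side is carried as a null struct column."""
    right_rows = [encode_nested(r) for r in right]
    result = []
    for l in left:
        matched = [row for row in right_rows if key(decode_nested(row)) == key(l)]
        for row in matched or [encode_nested(None)]:
            result.append((l, decode_nested(row)))
    return result
```

With the flat encoding, the unmatched side of an outer join has nothing to decode to; with the nested encoding, it round-trips as a null object.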

How was this patch tested?

new test in DatasetSuite

SparkQA commented Jun 1, 2016

Test build #59692 has finished for PR 13425 at commit 56cf840.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@liancheng
Contributor

LGTM, merging to master and branch-2.0.

asfgit pushed a commit that referenced this pull request Jun 1, 2016
## What changes were proposed in this pull request?

Currently we can't encode a top-level null object into an internal row, because Spark SQL doesn't allow a row to be null; only its columns can be null.

This was not a problem before, as we assumed the input object is never null. However, for outer joins, we do need the semantics of a null object.

This PR fixes the problem by making both join sides produce a single column, i.e. nesting the logical plan output (via `CreateStruct`), so that we have an extra level to represent a top-level null object.

## How was this patch tested?

new test in `DatasetSuite`

Author: Wenchen Fan <[email protected]>

Closes #13425 from cloud-fan/outer-join2.

(cherry picked from commit 8640cdb)
Signed-off-by: Cheng Lian <[email protected]>
@asfgit asfgit closed this in 8640cdb Jun 1, 2016
asfgit pushed a commit that referenced this pull request Jun 13, 2016
## What changes were proposed in this pull request?

This is similar to the bug fixed in #13425: we should account for a null object and wrap the `CreateStruct` with `If` to do a null check.
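
A rough sketch of that null check in plain Python follows. The functions `create_struct` and `null_safe_struct` are hypothetical stand-ins for the roles played by Catalyst's `CreateStruct` and `If` expressions, not their real implementation, and `Avg` is an invented result type.

```python
from typing import NamedTuple, Optional, Sequence, Tuple


class Avg(NamedTuple):
    total: int
    count: int


def create_struct(obj: object, fields: Sequence[str]) -> Tuple:
    """Build a struct by reading each field; fails if obj is None."""
    return tuple(getattr(obj, f) for f in fields)


def null_safe_struct(obj: Optional[object], fields: Sequence[str]) -> Optional[Tuple]:
    """Conceptually If(IsNull(obj), null, CreateStruct(...)): check for null first."""
    return None if obj is None else create_struct(obj, fields)
```

Calling `create_struct(None, ["total", "count"])` raises `AttributeError`, whereas `null_safe_struct(None, ["total", "count"])` returns `None`, which is the behavior wanted when the object being serialized is null.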

This PR also improves the test framework to test the objects of `Dataset[T]` directly, instead of calling `toDF` and comparing the rows.

## How was this patch tested?

new test in `DatasetAggregatorSuite`

Author: Wenchen Fan <[email protected]>

Closes #13553 from cloud-fan/agg-null.

(cherry picked from commit cd47e23)
Signed-off-by: Herman van Hovell <[email protected]>