
Conversation

@hvanhovell
Contributor

What changes were proposed in this pull request?

This is the Scala version of #39925.

We introduce a `plan_id` that is attached both to each plan created by the Scala client and to the columns created by calling `Dataframe.col(..)` and `Dataframe.apply(..)`. This way we can later properly resolve the columns created for a specific Dataframe.
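As a rough illustration of the idea (the names below are illustrative stand-ins, not the actual client internals, which attach the `plan_id` to the Connect protocol messages), each plan gets a unique id and every column created from a DataFrame carries the id of the plan it came from:

```scala
import java.util.concurrent.atomic.AtomicLong

// Illustrative stand-ins only, not the real client classes.
object PlanIdGenerator {
  private val nextId = new AtomicLong(0L)
  def next(): Long = nextId.getAndIncrement()
}

final case class PlanStub(planId: Long, description: String)

final case class ColumnStub(name: String, originPlanId: Long)

final class DataFrameStub(val plan: PlanStub) {
  // Columns remember which plan they were created from, so the server can
  // later resolve them against that specific plan even when names clash.
  def col(name: String): ColumnStub = ColumnStub(name, plan.planId)
  def apply(name: String): ColumnStub = col(name)
}
```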

Why are the changes needed?

Joining on columns created using `Dataframe.apply(...)` does not work when the column names are ambiguous. We should be able to figure out which DataFrame a column comes from when it is created this way.
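For example, a self-join where both sides expose the same column name runs into exactly this ambiguity (a sketch assuming an existing `spark` session; not code from this PR):

```scala
import org.apache.spark.sql.functions.lit

// Both sides of the join expose a column named "value".
val left  = spark.range(10).withColumn("value", lit(1))
val right = left.withColumn("value", lit(2))

// Without knowing which DataFrame each reference was created from,
// left("value") and right("value") cannot be told apart reliably.
val joined = left.join(right, left("id") === right("id"))
  .select(left("value"), right("value"))
```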

Does this PR introduce any user-facing change?

No

How was this patch tested?

Updated the golden files. Added a test case to `ClientE2ETestSuite`.
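A hypothetical sketch of the kind of end-to-end check this describes (test name and assertions are illustrative, not the actual code added to `ClientE2ETestSuite`; it assumes a ScalaTest suite with a `spark` session available):

```scala
test("ambiguous self-join columns resolve to the correct side") {
  import org.apache.spark.sql.functions.lit

  val left  = spark.range(5).withColumn("v", lit("l"))
  val right = left.withColumn("v", lit("r"))

  val rows = left.join(right, left("id") === right("id"))
    .select(left("v"), right("v"))
    .collect()

  // Each reference should resolve against the DataFrame it was created from.
  assert(rows.forall(r => r.getString(0) == "l" && r.getString(1) == "r"))
}
```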

@hvanhovell
Contributor Author

cc @grundprinzip

@hvanhovell
Contributor Author

This is a large change because all the golden files were updated.

# Conflicts:
#	connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/SparkConnectClientSuite.scala
@hvanhovell
Contributor Author

Merging.

hvanhovell added a commit that referenced this pull request Feb 24, 2023
Closes #40156 from hvanhovell/SPARK-41823.

Authored-by: Herman van Hovell <[email protected]>
Signed-off-by: Herman van Hovell <[email protected]>
(cherry picked from commit 6a24330)
Signed-off-by: Herman van Hovell <[email protected]>
snmvaughan pushed a commit to snmvaughan/spark that referenced this pull request Jun 20, 2023