
Conversation

@hvanhovell
Contributor

What changes were proposed in this pull request?

This is the Scala version of #39925.

We introduce a `plan_id` that is attached both to each plan created by the Scala client and to the columns created by calling `Dataframe.col(..)` and `Dataframe.apply(..)`. This way we can later properly resolve the columns created for a specific Dataframe.
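As a rough illustration of the idea (the names below are illustrative stand-ins, not the actual client internals, which attach the `plan_id` to the Connect protocol messages), each plan gets a unique id and every column created from a DataFrame carries the id of the plan it came from:

```scala
import java.util.concurrent.atomic.AtomicLong

// Illustrative stand-ins only, not the real client classes.
object PlanIdGenerator {
  private val nextId = new AtomicLong(0L)
  def next(): Long = nextId.getAndIncrement()
}

final case class PlanStub(planId: Long, description: String)

final case class ColumnStub(name: String, originPlanId: Long)

final class DataFrameStub(val plan: PlanStub) {
  // Columns remember which plan they were created from, so the server can
  // later resolve them against that specific plan even when names clash.
  def col(name: String): ColumnStub = ColumnStub(name, plan.planId)
  def apply(name: String): ColumnStub = col(name)
}
```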

Why are the changes needed?

Joining on columns created using `Dataframe.apply(...)` does not work when the column names are ambiguous. We should be able to figure out which DataFrame a column comes from when it is created this way.
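For example, a self-join where both sides expose the same column name runs into exactly this ambiguity (a sketch assuming an existing `spark` session; not code from this PR):

```scala
import org.apache.spark.sql.functions.lit

// Both sides of the join expose a column named "value".
val left  = spark.range(10).withColumn("value", lit(1))
val right = left.withColumn("value", lit(2))

// Without knowing which DataFrame each reference was created from,
// left("value") and right("value") cannot be told apart reliably.
val joined = left.join(right, left("id") === right("id"))
  .select(left("value"), right("value"))
```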

Does this PR introduce any user-facing change?

No

How was this patch tested?

Updated the golden files. Added a test case to `ClientE2ETestSuite`.
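A hypothetical sketch of the kind of end-to-end check this describes (test name and assertions are illustrative, not the actual code added to `ClientE2ETestSuite`; it assumes a ScalaTest suite with a `spark` session available):

```scala
test("ambiguous self-join columns resolve to the correct side") {
  import org.apache.spark.sql.functions.lit

  val left  = spark.range(5).withColumn("v", lit("l"))
  val right = left.withColumn("v", lit("r"))

  val rows = left.join(right, left("id") === right("id"))
    .select(left("v"), right("v"))
    .collect()

  // Each reference should resolve against the DataFrame it was created from.
  assert(rows.forall(r => r.getString(0) == "l" && r.getString(1) == "r"))
}
```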

@hvanhovell
Contributor Author

cc @grundprinzip

@hvanhovell
Contributor Author

This is a large change because all the golden files were updated.

# Conflicts:
#	connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/SparkConnectClientSuite.scala
@hvanhovell
Contributor Author

Merging.

hvanhovell added a commit that referenced this pull request Feb 24, 2023
Closes #40156 from hvanhovell/SPARK-41823.

Authored-by: Herman van Hovell <[email protected]>
Signed-off-by: Herman van Hovell <[email protected]>
(cherry picked from commit 6a24330)
Signed-off-by: Herman van Hovell <[email protected]>
snmvaughan pushed a commit to snmvaughan/spark that referenced this pull request Jun 20, 2023