-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-28228][SQL] Fix substitution order of nested WITH clauses #25029
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
This is WIP as it contains changes of #24831 |
|
Also, you need to update the title. |
Thanks, I changed it to "Fix substitution order of nested WITH clauses". I will update the migration guide too a bit later. |
|
Test build #107104 has finished for PR 25029 at commit
|
|
Retest this please. |
|
Test build #107126 has finished for PR 25029 at commit
|
|
retest this please |
|
(If this pr gets ready for reviews, I think it'd be better to drop WIP in the title...) |
|
Test build #107135 has finished for PR 25029 at commit
|
Sure, I will remove WIP if #24831 gets accepted and I can rebase this PR on that. |
|
Test build #107171 has finished for PR 25029 at commit
|
|
Could you rebase this, @peter-toth ? |
|
@maropu, @dongjoon-hyun, @mgaido91 I removed the WIP tag and this PR is ready for review now. |
|
Test build #107239 has finished for PR 25029 at commit
|
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CTESubstitution.scala
Outdated
Show resolved
Hide resolved
|
Test build #107292 has finished for PR 25029 at commit
|
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CTESubstitution.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CTESubstitution.scala
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CTESubstitution.scala
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CTESubstitution.scala
Show resolved
Hide resolved
|
Test build #107359 has finished for PR 25029 at commit
|
|
Retest this please. |
|
Hi, @gatorsmile . |
|
Test build #107477 has finished for PR 25029 at commit
|
|
Test build #107512 has finished for PR 25029 at commit
|
|
Test build #107518 has finished for PR 25029 at commit
|
|
Let's use For the other review comments, it seems that all are addressed. @maropu . Could you review this once more? Or, do you want a more test case? |
mgaido91
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
only some minor comments, LGTM otherwise.
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CTESubstitution.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CTESubstitution.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CTESubstitution.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CTESubstitution.scala
Outdated
Show resolved
Hide resolved
peter-toth
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's use
spark.sql.legacy.ctePrecedence.enabled.
Thanks. I've changed it.
|
Test build #107590 has finished for PR 25029 at commit
|
dongjoon-hyun
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, LGTM. Merged to master.
Thank you, @peter-toth , @maropu , @mgaido91 !
|
Thanks so much for the review @dongjoon-hyun, @maropu, @mgaido91! |
## What changes were proposed in this pull request?
This PR adds compatibility of handling a `WITH` clause within another `WITH` cause. Before this PR these queries retuned `1` while after this PR they return `2` as PostgreSQL does:
```
WITH
t AS (SELECT 1),
t2 AS (
WITH t AS (SELECT 2)
SELECT * FROM t
)
SELECT * FROM t2
```
```
WITH t AS (SELECT 1)
SELECT (
WITH t AS (SELECT 2)
SELECT * FROM t
)
```
As this is an incompatible change, the PR introduces the `spark.sql.legacy.cte.substitution.enabled` flag as an option to restore old behaviour.
## How was this patch tested?
Added new UTs.
Closes apache#25029 from peter-toth/SPARK-28228.
Authored-by: Peter Toth <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
…ested WITH clause ### What changes were proposed in this pull request? This is a follow-up for #25029, in this PR we throw an AnalysisException when name conflict is detected in nested WITH clause. In this way, the config `spark.sql.legacy.ctePrecedence.enabled` should be set explicitly for the expected behavior. ### Why are the changes needed? The original change might risky to end-users, it changes behavior silently. ### Does this PR introduce any user-facing change? Yes, change the config `spark.sql.legacy.ctePrecedence.enabled` as optional. ### How was this patch tested? New UT. Closes #27454 from xuanyuanking/SPARK-28228-follow. Authored-by: Yuanjian Li <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]>
…ested WITH clause ### What changes were proposed in this pull request? This is a follow-up for #25029, in this PR we throw an AnalysisException when name conflict is detected in nested WITH clause. In this way, the config `spark.sql.legacy.ctePrecedence.enabled` should be set explicitly for the expected behavior. ### Why are the changes needed? The original change might risky to end-users, it changes behavior silently. ### Does this PR introduce any user-facing change? Yes, change the config `spark.sql.legacy.ctePrecedence.enabled` as optional. ### How was this patch tested? New UT. Closes #27454 from xuanyuanking/SPARK-28228-follow. Authored-by: Yuanjian Li <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]> (cherry picked from commit 3db3e39) Signed-off-by: Dongjoon Hyun <[email protected]>
…ested WITH clause ### What changes were proposed in this pull request? This is a follow-up for apache#25029, in this PR we throw an AnalysisException when name conflict is detected in nested WITH clause. In this way, the config `spark.sql.legacy.ctePrecedence.enabled` should be set explicitly for the expected behavior. ### Why are the changes needed? The original change might risky to end-users, it changes behavior silently. ### Does this PR introduce any user-facing change? Yes, change the config `spark.sql.legacy.ctePrecedence.enabled` as optional. ### How was this patch tested? New UT. Closes apache#27454 from xuanyuanking/SPARK-28228-follow. Authored-by: Yuanjian Li <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]>
| // child might contain an inner CTE that has priority so traverse and substitute inner CTEs | ||
| // in child first | ||
| val traversedChild: LogicalPlan = child transformExpressions { | ||
| case e: SubqueryExpression => e.withNewPlan(traverseAndSubstituteCTE(e.plan, true)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The subquery expression seems not correctly handled.
with t1 as (select 1 i) select * from t1 where i in (with t1 as (select 2 i) select * from t1) returns 1 in Spark, but empty row in pgsql.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it probably should be transformAllExpressions. Will look into it soon...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh. Thanks, @cloud-fan .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've opened #28318 to fix it.
What changes were proposed in this pull request?
This PR adds compatibility of handling a
WITHclause within anotherWITHcause. Before this PR these queries retuned1while after this PR they return2as PostgreSQL does:As this is an incompatible change, the PR introduces the
spark.sql.legacy.cte.substitution.enabledflag as an option to restore old behaviour.How was this patch tested?
Added new UTs.