@peter-toth
Contributor

What changes were proposed in this pull request?

This PR fixes the handling of a WITH clause nested inside another WITH clause. Before this PR the following queries returned 1; after this PR they return 2, as they do in PostgreSQL:

WITH
  t AS (SELECT 1),
  t2 AS (
    WITH t AS (SELECT 2)
    SELECT * FROM t
  )
SELECT * FROM t2

WITH t AS (SELECT 1)
SELECT (
  WITH t AS (SELECT 2)
  SELECT * FROM t
)

As this is an incompatible change, the PR introduces the spark.sql.legacy.cte.substitution.enabled flag as an option to restore old behaviour.
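
The precedence change can be illustrated with a small toy model. This is only a sketch in Python, not Spark's implementation: the `Literal`, `TableRef` and `With` classes and the `evaluate` function are hypothetical names made up for this example. Each WITH opens a new scope, and a table reference is looked up either innermost scope first (the behaviour after this PR) or outermost scope first (an approximation of the old behaviour).

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Literal:
    """Toy stand-in for `SELECT <value>`."""
    value: int


@dataclass
class TableRef:
    """Toy stand-in for `SELECT * FROM <name>`."""
    name: str


@dataclass
class With:
    """Toy stand-in for `WITH <name> AS (<query>), ... <body>`."""
    ctes: Dict[str, "Query"]
    body: "Query"


Query = Union[Literal, TableRef, With]


def evaluate(query, scopes=(), inner_first=True):
    """Evaluate a toy query; `scopes` is ordered from outermost to innermost."""
    if isinstance(query, Literal):
        return query.value
    if isinstance(query, TableRef):
        # The only difference between the two modes is the lookup order.
        order = reversed(scopes) if inner_first else scopes
        for scope in order:
            if query.name in scope:
                return scope[query.name]
        raise KeyError(f"table or CTE not found: {query.name}")
    # A WITH node: open a new scope, define its CTEs in order, then run the body.
    scope = {}
    visible = tuple(scopes) + (scope,)
    for name, definition in query.ctes.items():
        scope[name] = evaluate(definition, visible, inner_first)
    return evaluate(query.body, visible, inner_first)


# WITH t AS (SELECT 1), t2 AS (WITH t AS (SELECT 2) SELECT * FROM t) SELECT * FROM t2
nested = With(
    {"t": Literal(1), "t2": With({"t": Literal(2)}, TableRef("t"))},
    TableRef("t2"),
)

# WITH t AS (SELECT 1) SELECT (WITH t AS (SELECT 2) SELECT * FROM t)
scalar = With({"t": Literal(1)}, With({"t": Literal(2)}, TableRef("t")))

print(evaluate(nested), evaluate(nested, inner_first=False))
print(evaluate(scalar), evaluate(scalar, inner_first=False))
```

In this model the inner-first lookup yields 2 for both queries and the outer-first lookup yields 1, matching the before/after results described above.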

How was this patch tested?

Added new UTs.

@peter-toth
Contributor Author

This is WIP as it contains the changes from #24831.

cc @maropu @dongjoon-hyun @gatorsmile @mgaido91

@maropu
Member

maropu commented Jul 2, 2019

Also, you need to update the title.

@peter-toth changed the title from "[WIP][SPARK-28228][SQL] Better support for WITH clause" to "[WIP][SPARK-28228][SQL] Fix substitution order of nested WITH clauses" on Jul 2, 2019
@peter-toth
Contributor Author

> Also, you need to update the title.

Thanks, I changed it to "Fix substitution order of nested WITH clauses". I will update the migration guide too a bit later.

@SparkQA

SparkQA commented Jul 2, 2019

Test build #107104 has finished for PR 25029 at commit 2ef85e5.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Member

Retest this please.

@SparkQA

SparkQA commented Jul 2, 2019

Test build #107126 has finished for PR 25029 at commit 2ef85e5.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Member

maropu commented Jul 2, 2019

retest this please

@maropu
Member

maropu commented Jul 3, 2019

(If this pr gets ready for reviews, I think it'd be better to drop WIP in the title...)

@SparkQA

SparkQA commented Jul 3, 2019

Test build #107135 has finished for PR 25029 at commit 2ef85e5.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@peter-toth
Contributor Author

> (If this pr gets ready for reviews, I think it'd be better to drop WIP in the title...)

Sure, I will remove WIP if #24831 gets accepted and I can rebase this PR on that.

@SparkQA

SparkQA commented Jul 3, 2019

Test build #107171 has finished for PR 25029 at commit 26b351e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Member

Could you rebase this, @peter-toth ?

@peter-toth changed the title from "[WIP][SPARK-28228][SQL] Fix substitution order of nested WITH clauses" to "[SPARK-28228][SQL] Fix substitution order of nested WITH clauses" on Jul 4, 2019
@peter-toth
Contributor Author

@maropu, @dongjoon-hyun, @mgaido91 I removed the WIP tag and this PR is ready for review now.

@SparkQA

SparkQA commented Jul 4, 2019

Test build #107239 has finished for PR 25029 at commit ca1c7f0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 5, 2019

Test build #107292 has finished for PR 25029 at commit 4a913e8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 8, 2019

Test build #107359 has finished for PR 25029 at commit 73824bf.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Member

Retest this please.

@dongjoon-hyun
Member

Hi, @gatorsmile .
Could you review this PR?

@SparkQA

SparkQA commented Jul 10, 2019

Test build #107477 has finished for PR 25029 at commit 73824bf.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 11, 2019

Test build #107512 has finished for PR 25029 at commit 45f0642.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 11, 2019

Test build #107518 has finished for PR 25029 at commit 55a01ea.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Member

Let's use spark.sql.legacy.ctePrecedence.enabled.

The other review comments all seem to be addressed. @maropu, could you review this once more? Or do you want more test cases?

Contributor

@mgaido91 left a comment

only some minor comments, LGTM otherwise.

Contributor Author

@peter-toth left a comment

> Let's use spark.sql.legacy.ctePrecedence.enabled.

Thanks. I've changed it.

@SparkQA

SparkQA commented Jul 12, 2019

Test build #107590 has finished for PR 25029 at commit 7d9d96f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

@dongjoon-hyun left a comment

+1, LGTM. Merged to master.
Thank you, @peter-toth , @maropu , @mgaido91 !

@peter-toth
Contributor Author

Thanks so much for the review @dongjoon-hyun, @maropu, @mgaido91!

vinodkc pushed a commit to vinodkc/spark that referenced this pull request Jul 18, 2019
## What changes were proposed in this pull request?

This PR fixes the handling of a `WITH` clause nested inside another `WITH` clause. Before this PR the following queries returned `1`; after this PR they return `2`, as they do in PostgreSQL:
```
WITH
  t AS (SELECT 1),
  t2 AS (
    WITH t AS (SELECT 2)
    SELECT * FROM t
  )
SELECT * FROM t2
```
```
WITH t AS (SELECT 1)
SELECT (
  WITH t AS (SELECT 2)
  SELECT * FROM t
)
```
As this is an incompatible change, the PR introduces the `spark.sql.legacy.cte.substitution.enabled` flag as an option to restore old behaviour.

## How was this patch tested?

Added new UTs.

Closes apache#25029 from peter-toth/SPARK-28228.

Authored-by: Peter Toth <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
dongjoon-hyun pushed a commit that referenced this pull request Feb 8, 2020
…ested WITH clause

### What changes were proposed in this pull request?
This is a follow-up to #25029. In this PR we throw an AnalysisException when a name conflict is detected in a nested WITH clause. This way, the config `spark.sql.legacy.ctePrecedence.enabled` must be set explicitly to get the expected behavior.
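
One way to read the behaviour described above is as a three-state switch: flag unset and a conflict present gives an error, flag set to true gives the legacy (outer-first) precedence, flag set to false gives the new (inner-first) precedence. The following Python sketch models that reading only. It is not Spark code, the mapping of true/false to the two modes is an assumption, and `AnalysisError`, `conflicting_names` and `choose_precedence` are hypothetical names.

```python
from typing import Optional


class AnalysisError(Exception):
    """Toy stand-in for Spark's AnalysisException (hypothetical class)."""


# A toy query is either a plain string (a leaf SELECT) or a tuple
# (ctes, body), where ctes maps a CTE name to its defining toy query.
def conflicting_names(query, outer_names=frozenset()):
    """Return the inner CTE names that redefine an already visible outer CTE."""
    if isinstance(query, str):
        return set()
    ctes, body = query
    seen = set(outer_names)
    conflicts = set()
    for name, definition in ctes.items():
        conflicts |= conflicting_names(definition, frozenset(seen))
        if name in outer_names:
            conflicts.add(name)
        seen.add(name)
    conflicts |= conflicting_names(body, frozenset(seen))
    return conflicts


def choose_precedence(query, legacy_flag: Optional[bool]) -> str:
    """Pick a precedence mode, or fail if a conflict exists and the flag is unset."""
    conflicts = conflicting_names(query)
    if conflicts and legacy_flag is None:
        raise AnalysisError(
            f"ambiguous CTE name(s) {sorted(conflicts)}; set the flag explicitly"
        )
    return "outer-first" if legacy_flag else "inner-first"


# WITH t AS (SELECT 1), t2 AS (WITH t AS (SELECT 2) SELECT * FROM t) SELECT * FROM t2
nested = (
    {"t": "SELECT 1", "t2": ({"t": "SELECT 2"}, "SELECT * FROM t")},
    "SELECT * FROM t2",
)
# No inner CTE shadows an outer one here.
plain = ({"t": "SELECT 1"}, "SELECT * FROM t")
```

With the flag unset, `choose_precedence(nested, None)` raises, while `choose_precedence(plain, None)` succeeds because nothing is shadowed.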

### Why are the changes needed?
The original change might be risky for end users because it changes behavior silently.

### Does this PR introduce any user-facing change?
Yes, this changes the config `spark.sql.legacy.ctePrecedence.enabled` to be optional.

### How was this patch tested?
New UT.

Closes #27454 from xuanyuanking/SPARK-28228-follow.

Authored-by: Yuanjian Li <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
dongjoon-hyun pushed a commit that referenced this pull request Feb 8, 2020
(cherry picked from commit 3db3e39)
Signed-off-by: Dongjoon Hyun <[email protected]>
sjincho pushed a commit to sjincho/spark that referenced this pull request Apr 15, 2020
// child might contain an inner CTE that has priority so traverse and substitute inner CTEs
// in child first
val traversedChild: LogicalPlan = child transformExpressions {
  case e: SubqueryExpression => e.withNewPlan(traverseAndSubstituteCTE(e.plan, true))
Contributor

The subquery expression does not seem to be handled correctly.

`with t1 as (select 1 i) select * from t1 where i in (with t1 as (select 2 i) select * from t1)` returns 1 in Spark, but no rows in PostgreSQL.

Contributor Author

Yes, it probably should be transformAllExpressions. Will look into it soon...
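
A rough illustration of why the two methods differ, as a Python toy model rather than Catalyst code: in Catalyst, `transformExpressions` rewrites the expressions of the current plan node only, while `transformAllExpressions` also descends into child nodes. The `Node` class, the plan shape and the string-based rule below are simplifying assumptions made for this sketch; the plan is meant to resemble a `Project` over a `Filter` that holds the `IN` subquery.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    """Toy plan node: a name, its own expressions, and child nodes."""
    name: str
    expressions: List[str] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)


def transform_expressions(node: Node, rule: Callable[[str], str]) -> Node:
    """Rewrite the expressions of this node only; children are left untouched."""
    return Node(node.name, [rule(e) for e in node.expressions], node.children)


def transform_all_expressions(node: Node, rule: Callable[[str], str]) -> Node:
    """Rewrite the expressions of this node and of every descendant."""
    return Node(
        node.name,
        [rule(e) for e in node.expressions],
        [transform_all_expressions(c, rule) for c in node.children],
    )


def substitute_inner_cte(expr: str) -> str:
    """Toy rule: mark subquery expressions as having been processed."""
    return expr + " [inner CTE substituted]" if "subquery" in expr else expr


# Roughly: select * from t1 where i in (<subquery>)
plan = Node(
    "Project",
    ["*"],
    [Node("Filter", ["i IN (subquery)"], [Node("Relation t1")])],
)

shallow = transform_expressions(plan, substitute_inner_cte)
deep = transform_all_expressions(plan, substitute_inner_cte)

print(shallow.children[0].expressions)  # the Filter's subquery was not visited
print(deep.children[0].expressions)     # the Filter's subquery was visited
```

In this model the shallow variant never reaches the subquery inside the `Filter`, which would leave the inner `t1` to be resolved against the outer definition.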

Member

Oh. Thanks, @cloud-fan .

Contributor Author

I've opened #28318 to fix it.
