[SPARK-19799][SQL] Support WITH clause in subqueries #24831
Conversation
ok to test
Test build #106360 has finished for PR 24831 at commit
cc @gatorsmile since this is part of PostgreSQL feature parity.
741c727 to 0b516fc
Test build #106461 has finished for PR 24831 at commit
0b516fc to ca27852
Test build #106541 has finished for PR 24831 at commit
ca27852 to d76a265
Test build #106793 has finished for PR 24831 at commit
sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
why do we need this?
Thanks @mgaido91! It is not needed indeed.
I remember now why I used lazy here. A CTE definition can be used multiple times in WITH, but the by-name parameter (ctePlan = traverseAndSubstituteCTE(...)) should be executed only once.
But now I believe it is better to use lazy outside of substituteCTE than inside; please review my commit 7d69105.
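The by-name versus lazy distinction discussed above can be sketched in plain Scala (a minimal illustration, not the actual Analyzer code — names like `useTwiceByName` are invented): a by-name argument is re-evaluated on every access, while capturing it in a `lazy val` forces it at most once.

```scala
// Minimal sketch (not Spark code): a by-name parameter is re-evaluated
// on each access, while a `lazy val` caches the first evaluation.
object ByNameVsLazy extends App {
  var evaluations = 0
  def plan(): String = { evaluations += 1; "ctePlan" }

  // Accessing the by-name argument twice evaluates it twice.
  def useTwiceByName(ctePlan: => String): Unit = { ctePlan; ctePlan }
  useTwiceByName(plan())
  println(evaluations) // 2

  evaluations = 0
  // Capturing the by-name argument in a lazy val forces it at most once,
  // even if the CTE definition is referenced multiple times.
  def useTwiceLazily(ctePlan: => String): Unit = {
    lazy val cached = ctePlan
    cached; cached
  }
  useTwiceLazily(plan())
  println(evaluations) // 1
}
```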
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CTESubstitution.scala
Test build #107069 has finished for PR 24831 at commit
Test build #107073 has finished for PR 24831 at commit
Test build #107079 has finished for PR 24831 at commit
Test build #107084 has finished for PR 24831 at commit
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CTESubstitution.scala
Why do we need this flag? Would some tests fail without it?
No, they would not, but I wanted to do CTE substitution in the current plan only (not in the subqueries) when it is safe. (CTE substitution will run for the subqueries later anyway.)
@peter-toth Can we split this PR into two parts cleanly: 1. the subquery support and 2. the behaviour change? Also, I think we need to update the migration guide for part 2. cc: @gatorsmile
@maropu, all right, I dropped the changes that relate to the order of substitution and will do that in another ticket. What remains here is just the
Yea, thanks!
Very welcome.
Test build #107101 has finished for PR 24831 at commit
Test build #107106 has finished for PR 24831 at commit
/**
 * Analyze WITH nodes and substitute child plan with CTE definitions.
 */
object CTESubstitution extends Rule[LogicalPlan] {
Can we avoid moving the class, to keep the diff smaller?
The idea of moving the rule to a separate file came from here: #24831 (comment), but I think you are right, @mgaido91, because we have cut the scope and split the PR since then. Maybe the other part (#25029) could extract the rule to a separate file, as that one makes the rule a bit more complicated. Does that work for you, @maropu?
Yes, that is what I meant: we can move the rule in the other PR, which refactors it more thoroughly.
sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
plan resolveOperatorsUp {
  case UnresolvedRelation(Seq(table)) if resolver(cteName, table) =>
    ctePlan
  case u: UnresolvedRelation =>
why did you remove this?
I don't think this line does anything, nor can an UnresolvedRelation contain an expression, so I thought it was safe and a good idea to remove the line. Please correct me if I'm wrong.
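As a side note, the reason an identity case like this is redundant can be illustrated with plain Scala partial-function semantics (a sketch, not Catalyst code — `transform` here is a hypothetical stand-in for how tree-transform rules such as resolveOperatorsUp behave): nodes the partial function is not defined at simply pass through unchanged.

```scala
object PartialRuleSketch extends App {
  // A rewrite rule as a partial function, analogous to the pattern-match
  // block passed to a Catalyst tree transform.
  val rule: PartialFunction[String, String] = {
    case "cteRef" => "ctePlan"
  }

  // Apply the rule where defined; otherwise keep the node as-is.
  def transform(node: String): String =
    rule.applyOrElse(node, identity[String])

  println(transform("cteRef"))     // ctePlan
  println(transform("otherTable")) // otherTable: unmatched nodes pass through
}
```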
Yes, I think you're right; I was just curious about the reason for this change.
mgaido91 left a comment
only a style comment, otherwise LGTM, thanks!
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
-- !query 20 schema
struct<>
-- !query 20 output
org.apache.spark.sql.AnalysisException
Just a question: is this going to be addressed in the PR that allows recursive queries, or is this an invalid query?
I have a WIP PR open (#23531) that would add support for recursive queries (and recursive subqueries and subquery expressions too). But these queries lack the RECURSIVE keyword, and using an outer recursive reference in a subquery (as in the next query) is not allowed according to the SQL standard, so these will never become valid.
But this PR should be accepted first; then #25029 and #23531 could come.
Actually, I think I'm removing the test WITH r AS (SELECT * FROM r) SELECT * FROM r; because there is already a similar one in cte.sql, and moving WITH r AS (SELECT (SELECT * FROM r)) SELECT * FROM r; next to the existing one.
Test build #107168 has finished for PR 24831 at commit
Test build #107175 has finished for PR 24831 at commit
Test build #107178 has finished for PR 24831 at commit
struct<1:int>
-- !query 12 output
1
This result is different from the PostgreSQL one:
postgres=# WITH
postgres-# t AS (SELECT 1),
postgres-# t2 AS (
postgres(# WITH t AS (SELECT 2)
postgres(# SELECT * FROM t
postgres(# )
postgres-# SELECT * FROM t2;
?column?
----------
2
(1 row)
Will this be addressed in the follow-up #25029?
I also agree that this is unavoidable in this PR. (cc @gatorsmile)
Yes, after #25029 it will return 2 (https://github.com/apache/spark/pull/25029/files#diff-fc515a5db268d29b08b80f5eb8202026R145)
dongjoon-hyun left a comment
+1, LGTM. Thank you, @peter-toth , @maropu , @mgaido91 .
The original PR is split into two (this and #25029) according to the review comment.
This is a new feature at Spark 3.0.0 and will be consistent with PostgreSQL soon.
Merged to master to move forward.
Thanks @dongjoon-hyun, @maropu, @mgaido91 for the review! I will prepare #25029 for review soon.
What changes were proposed in this pull request?
This PR adds support for a WITH clause within a subquery, so such queries become valid.
How was this patch tested?
Added new UTs.
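The example query in the PR description did not survive extraction. The following is a hypothetical illustration of the shape this change makes valid — a WITH clause nested inside a subquery — with invented names (t, col, s) for illustration only:

```sql
-- Hypothetical illustration (names invented): a CTE defined inside a
-- subquery, previously rejected by Spark's parser and now supported.
SELECT *
FROM (
  WITH t AS (SELECT 1 AS col)
  SELECT * FROM t
) s;
```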