[SPARK-28228][SQL] Change the default behavior for name conflict in nested WITH clause #27454

xuanyuanking · 2020-02-04T15:27:52Z

What changes were proposed in this pull request?

This is a follow-up to #25029. In this PR, we throw an AnalysisException when a name conflict is detected in a nested WITH clause. As a result, the config spark.sql.legacy.ctePrecedence.enabled must be set explicitly to get the expected behavior.

Why are the changes needed?

The original change might be risky for end users because it changes behavior silently.

Does this PR introduce any user-facing change?

Yes, this changes the config spark.sql.legacy.ctePrecedence.enabled to be optional.

How was this patch tested?

New UT.

xuanyuanking · 2020-02-04T15:29:38Z

cc @cloud-fan

xuanyuanking · 2020-02-04T15:33:44Z

retest this please.

HyukjinKwon · 2020-02-05T00:52:55Z

retest this please

SparkQA · 2020-02-05T02:23:47Z

Test build #117869 has finished for PR 27454 at commit 53699d1.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

docs/sql-migration-guide.md

sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CTESubstitution.scala

cloud-fan · 2020-02-05T03:41:56Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CTESubstitution.scala

  def apply(plan: LogicalPlan): LogicalPlan = {
-    if (SQLConf.get.getConf(LEGACY_CTE_PRECEDENCE_ENABLED)) {
+    if (SQLConf.get.legacyCTEPrecedenceEnabled.isEmpty) {
+      if (hasNestedCTE(plan, inTraverse = false)) {


We should only fail if there are name conflicts. It's bad UX to blindly forbid nested CTEs by default.

Thanks for the advice. Done in a55e82d, with the tests changed correspondingly.

SparkQA · 2020-02-05T08:05:02Z

Test build #117875 has finished for PR 27454 at commit f75b26a.

This patch fails due to an unknown error code, -9.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2020-02-05T23:17:46Z

Test build #117944 has finished for PR 27454 at commit a55e82d.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

docs/sql-migration-guide.md

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CTESubstitution.scala

cloud-fan · 2020-02-06T06:59:57Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CTESubstitution.scala

+        val newNames = relations.map {
+          case (cteName, _) =>
+            if (cteNames.contains(cteName)) {
+              return true


We can throw an exception here so that we know which name is conflicting.

Thanks, done in c03920a.
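To illustrate the suggestion above, here is a minimal, self-contained sketch of a check that reports which name conflicts. It uses toy `Plan`, `Relation`, and `WithNode` types and a hypothetical `NameConflictException` instead of Catalyst's `LogicalPlan`, `With`, and `AnalysisException`, and it assumes that only CTE definitions are searched for nested WITH clauses. It is not the code in this PR.

```scala
// Toy stand-ins for Catalyst plan nodes (hypothetical, not Spark's classes).
sealed trait Plan
case class Relation(name: String) extends Plan
case class WithNode(child: Plan, relations: Seq[(String, Plan)]) extends Plan

// Hypothetical exception that carries the conflicting CTE name.
final class NameConflictException(val cteName: String)
  extends RuntimeException(s"Name $cteName is ambiguous in nested CTE.")

object CteConflictCheck {
  // Throws on the first CTE name that an enclosing WITH has already defined.
  // Assumption: only CTE definitions are descended into; the query body is not checked.
  def assertNoNameConflicts(plan: Plan, outerNames: Set[String] = Set.empty): Unit =
    plan match {
      case WithNode(_, relations) =>
        val newNames = relations.map { case (cteName, _) =>
          if (outerNames.contains(cteName)) throw new NameConflictException(cteName)
          cteName
        }.toSet
        // Each definition sees the outer names plus the names of this WITH.
        relations.map(_._2).foreach(assertNoNameConflicts(_, outerNames ++ newNames))
      case Relation(_) => ()
    }
}
```

Throwing from inside the `map`, rather than returning a boolean, keeps the offending name available for the error message, which is the point of the comment.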

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CTESubstitution.scala

cloud-fan · 2020-02-06T07:05:16Z

sql/core/src/test/resources/sql-tests/inputs/cte-nonlegacy.sql

+
+-- CTE non-legacy substitution
+SET spark.sql.legacy.ctePrecedence.enabled=false;
+


we can use --IMPORT cte.sql to include the existing test

cloud-fan · 2020-02-06T07:07:24Z

sql/core/src/test/resources/sql-tests/inputs/cte-nonlegacy.sql

+create temporary view t2 as select * from values 0, 1 as t(id);
+
+-- CTE non-legacy substitution
+SET spark.sql.legacy.ctePrecedence.enabled=false;


Use --SET spark.sql.legacy.ctePrecedence.enabled=false so that this isn't treated as a query and doesn't appear in the results.

cloud-fan · 2020-02-06T12:06:27Z

sql/core/src/test/resources/sql-tests/inputs/cte-nonlegacy.sql

@@ -0,0 +1,2 @@
+--SET spark.sql.legacy.ctePrecedence.enabled = false
+--IMPORT cte.sql


we should do the same thing for cte-legacy.sql. This can be done in a followup.

Copy that, will do it later.

SparkQA · 2020-02-06T14:54:16Z

Test build #117988 has finished for PR 27454 at commit c03920a.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

dongjoon-hyun · 2020-02-07T00:55:58Z

cc @peter-toth

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CTESubstitution.scala

dongjoon-hyun · 2020-02-07T01:01:12Z

sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala


  val LEGACY_CTE_PRECEDENCE_ENABLED = buildConf("spark.sql.legacy.ctePrecedence.enabled")
    .internal()
    .doc("When true, outer CTE definitions takes precedence over inner definitions.")


Now it has become a three-state conf. Shall we mention more about false and empty?

Sounds like we should.

Sure, done in 7249404.
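As a rough illustration of the three states, the sketch below models the conf as an `Option[Boolean]`, in line with the `legacyCTEPrecedenceEnabled.isEmpty` check in the diff above. The `CtePrecedence` object and the `Behavior` names are hypothetical and only spell out the unset, true, and false cases. The reading of false as "inner definitions take precedence" is an assumption drawn from the non-legacy behavior discussed in this thread.

```scala
object CtePrecedence {
  // Hypothetical names for the three possible behaviors.
  sealed trait Behavior
  case object FailOnConflict extends Behavior // conf not set: conflicts raise an error
  case object OuterWins extends Behavior      // true: outer CTE definitions take precedence
  case object InnerWins extends Behavior      // false: inner CTE definitions take precedence (assumed)

  // Maps the optional conf value to the behavior on a name conflict.
  def behaviorFor(legacyCtePrecedenceEnabled: Option[Boolean]): Behavior =
    legacyCtePrecedenceEnabled match {
      case None        => FailOnConflict
      case Some(true)  => OuterWins
      case Some(false) => InnerWins
    }
}
```

Documenting all three cases in the conf's `.doc(...)` text would let users see that leaving the conf unset is itself a distinct state.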

docs/sql-migration-guide.md

viirya

Overall looks good. Only minor comment.

peter-toth · 2020-02-07T09:02:14Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CTESubstitution.scala

+              cteName
+            }
+        }.toSet
+        (w.innerChildren :+ child).foreach { p =>


This could be:

child.transformExpressions {
  case e: SubqueryExpression =>
    assertNoNameConflictsInCTE(e.plan, inTraverse = true, cteNames ++ newNames)
    e
}
w.innerChildren.foreach { p =>
  assertNoNameConflictsInCTE(p, inTraverse = true, cteNames ++ newNames)
}

If you check the "CTE in subquery shadows outer" test cases, you will see that the legacy (https://github.com/apache/spark/blob/master/sql/core/src/test/resources/sql-tests/results/cte-legacy.sql.out#L113-L151) and new results (https://github.com/apache/spark/blob/master/sql/core/src/test/resources/sql-tests/results/cte.sql.out#L248-L286) are the same. We shouldn't throw an AnalysisException in those cases.

Thanks for the detailed check! Yes, that makes sense: we should only throw an exception when legacy and non-legacy give different results. Done in 7249404.

SparkQA · 2020-02-07T18:46:57Z

Test build #118039 has finished for PR 27454 at commit 7249404.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala

SparkQA · 2020-02-08T05:48:58Z

Test build #118053 has finished for PR 27454 at commit ebd337b.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

peter-toth

LGTM, minor comment on legacy flag docs.

dongjoon-hyun

+1, LGTM. Thank you, all!
Merged to master/3.0.

…ested WITH clause

### What changes were proposed in this pull request?

This is a follow-up to #25029. In this PR, we throw an AnalysisException when a name conflict is detected in a nested WITH clause. As a result, the config `spark.sql.legacy.ctePrecedence.enabled` must be set explicitly to get the expected behavior.

### Why are the changes needed?

The original change might be risky for end users because it changes behavior silently.

### Does this PR introduce any user-facing change?

Yes, this changes the config `spark.sql.legacy.ctePrecedence.enabled` to be optional.

### How was this patch tested?

New UT.

Closes #27454 from xuanyuanking/SPARK-28228-follow.

Authored-by: Yuanjian Li <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit 3db3e39)
Signed-off-by: Dongjoon Hyun <[email protected]>

xuanyuanking · 2020-02-10T01:27:38Z

Thanks all for the review.


change the default behavior

53699d1

fix influence test cases

f75b26a

cloud-fan reviewed Feb 5, 2020

docs/sql-migration-guide.md

cloud-fan reviewed Feb 5, 2020

sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala

cloud-fan reviewed Feb 5, 2020

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CTESubstitution.scala

cloud-fan reviewed Feb 5, 2020

dongjoon-hyun added the SQL label Feb 5, 2020

address comment

a55e82d

xuanyuanking changed the title ~~[SPARK-28228][SQL] Change the default behavior for nested WITH clause~~ [SPARK-28228][SQL] Change the default behavior for name conflict in nested WITH clause Feb 5, 2020