@xuanyuanking
Member

@xuanyuanking xuanyuanking commented Feb 4, 2020

What changes were proposed in this pull request?

This is a follow-up to #25029. In this PR we throw an AnalysisException when a name conflict is detected in a nested WITH clause. This way, the config spark.sql.legacy.ctePrecedence.enabled must be set explicitly to get the expected behavior.

Why are the changes needed?

The original change might be risky for end users because it changes behavior silently.

Does this PR introduce any user-facing change?

Yes, the config spark.sql.legacy.ctePrecedence.enabled becomes optional.

How was this patch tested?

New UT.
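
To make the name conflict concrete, here is a minimal sketch of how the two precedence modes can disagree. It is a toy model, not Spark's implementation: `CtePrecedenceSketch`, `Scope`, and `resolve` are hypothetical names. It models the reference to `t` inside `t2` in a query such as `WITH t AS (SELECT 1), t2 AS (WITH t AS (SELECT 2) SELECT * FROM t) SELECT * FROM t2`, where both the outer and the inner WITH clause define `t`.

```scala
object CtePrecedenceSketch {
  // Hypothetical toy type: a WITH scope maps a CTE name to its definition text.
  type Scope = Map[String, String]

  // `scopes` runs from the outermost WITH clause to the innermost one.
  // legacy = true: the outermost definition wins; otherwise the innermost wins.
  def resolve(name: String, scopes: List[Scope], legacy: Boolean): Option[String] = {
    val searchOrder = if (legacy) scopes else scopes.reverse
    searchOrder.collectFirst { case scope if scope.contains(name) => scope(name) }
  }

  // The two scopes that both define `t` in the example query.
  val outer: Scope = Map("t" -> "SELECT 1")
  val inner: Scope = Map("t" -> "SELECT 2")
}
```

The two modes only disagree when a name is defined in more than one enclosing scope, which is the case where this PR raises an error unless the config is set explicitly.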

@xuanyuanking
Member Author

cc @cloud-fan

@xuanyuanking
Member Author

retest this please.

@HyukjinKwon
Member

retest this please

@SparkQA

SparkQA commented Feb 5, 2020

Test build #117869 has finished for PR 27454 at commit 53699d1.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

def apply(plan: LogicalPlan): LogicalPlan = {
if (SQLConf.get.getConf(LEGACY_CTE_PRECEDENCE_ENABLED)) {
if (SQLConf.get.legacyCTEPrecedenceEnabled.isEmpty) {
if (hasNestedCTE(plan, inTraverse = false)) {
Contributor

we should only fail if there are name conflicts. It's a bad UX if we blindly forbid nested CTE by default.

Member Author

Thanks for the advice, done in a55e82d, and changed the tests correspondingly.

@SparkQA

SparkQA commented Feb 5, 2020

Test build #117875 has finished for PR 27454 at commit f75b26a.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@xuanyuanking xuanyuanking changed the title from "[SPARK-28228][SQL] Change the default behavior for nested WITH clause" to "[SPARK-28228][SQL] Change the default behavior for name conflict in nested WITH clause" Feb 5, 2020
@SparkQA

SparkQA commented Feb 5, 2020

Test build #117944 has finished for PR 27454 at commit a55e82d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

val newNames = relations.map {
  case (cteName, _) =>
    if (cteNames.contains(cteName)) {
      return true
Contributor

we can throw exception here so that we know which name is conflicting.

Member Author

Thanks, done in c03920a.
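
The two review points above (fail only on name conflicts, and report which name conflicts) can be sketched on a toy plan. This is an illustration under stated assumptions, not the code in this PR: `Plan`, `Select`, `With`, `CteNameConflict`, and `assertNoNameConflicts` are hypothetical, and `CteNameConflict` merely stands in for Spark's AnalysisException.

```scala
object CteConflictSketch {
  // Hypothetical toy plan: a WITH node holds named CTE definitions and reads one relation.
  sealed trait Plan
  final case class Select(from: String) extends Plan
  final case class With(ctes: List[(String, Plan)], from: String) extends Plan

  // Stand-in for AnalysisException; it carries the conflicting name.
  final class CteNameConflict(val cteName: String)
    extends RuntimeException(s"CTE name '$cteName' conflicts with an outer definition")

  // Throws as soon as a nested definition reuses a name from an enclosing WITH.
  // A nested WITH that introduces only fresh names passes.
  def assertNoNameConflicts(plan: Plan, outerNames: Set[String] = Set.empty): Unit =
    plan match {
      case With(ctes, _) =>
        ctes.foreach { case (name, _) =>
          if (outerNames.contains(name)) throw new CteNameConflict(name)
        }
        val visible = outerNames ++ ctes.map(_._1)
        ctes.foreach { case (_, definition) => assertNoNameConflicts(definition, visible) }
      case Select(_) => ()
    }
}
```

Throwing from inside the traversal, rather than returning a boolean, is what lets the error message name the offending CTE.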


-- CTE non-legacy substitution
SET spark.sql.legacy.ctePrecedence.enabled=false;

Contributor

we can use --IMPORT cte.sql to include the existing test

create temporary view t2 as select * from values 0, 1 as t(id);

-- CTE non-legacy substitution
SET spark.sql.legacy.ctePrecedence.enabled=false;
Contributor

use --SET spark.sql.legacy.ctePrecedence.enabled=false, so that this won't be treated as a query and won't appear in the results.

@@ -0,0 +1,2 @@
--SET spark.sql.legacy.ctePrecedence.enabled = false
--IMPORT cte.sql
Contributor

we should do the same thing for cte-legacy.sql. This can be done in a followup.

Member Author

Copy that, will do it later.

@SparkQA

SparkQA commented Feb 6, 2020

Test build #117988 has finished for PR 27454 at commit c03920a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Member

cc @peter-toth


val LEGACY_CTE_PRECEDENCE_ENABLED = buildConf("spark.sql.legacy.ctePrecedence.enabled")
  .internal()
  .doc("When true, outer CTE definitions takes precedence over inner definitions.")
Member

Now it has become a three-state conf. Shall we mention more about false and empty?

Member

Sounds like we should.

Member Author

Sure, done in 7249404.
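
The three states discussed in this thread can be sketched as a mapping from an optional boolean to a behavior. This is only an illustration: `CtePrecedenceConfSketch`, `Behavior`, and `behaviorFor` are hypothetical names, and the sketch assumes the conf is read as an `Option[Boolean]`, as the `legacyCTEPrecedenceEnabled.isEmpty` check in the diff suggests.

```scala
object CtePrecedenceConfSketch {
  // Hypothetical names for the three behaviors of the conf.
  sealed trait Behavior
  case object FailOnConflict extends Behavior       // conf not set
  case object OuterTakesPrecedence extends Behavior // conf = true (legacy)
  case object InnerTakesPrecedence extends Behavior // conf = false

  // Maps the optional boolean conf to one of the three behaviors.
  def behaviorFor(legacyCtePrecedence: Option[Boolean]): Behavior =
    legacyCtePrecedence match {
      case None        => FailOnConflict
      case Some(true)  => OuterTakesPrecedence
      case Some(false) => InnerTakesPrecedence
    }
}
```

Spelling out the `None` case separately is the point of the doc request: an unset conf is no longer equivalent to `false`.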

Member

@viirya viirya left a comment

Overall looks good. Only minor comment.

cteName
}
}.toSet
(w.innerChildren :+ child).foreach { p =>
Contributor

@peter-toth peter-toth Feb 7, 2020

This could be:

        child.transformExpressions {
          case e: SubqueryExpression =>
            assertNoNameConflictsInCTE(e.plan, inTraverse = true, cteNames ++ newNames)
            e
        }
        w.innerChildren.foreach { p =>
          assertNoNameConflictsInCTE(p, inTraverse = true, cteNames ++ newNames)
        }

If you check the "CTE in subquery shadows outer" test cases, you will see that the legacy results (https://github.com/apache/spark/blob/master/sql/core/src/test/resources/sql-tests/results/cte-legacy.sql.out#L113-L151) and the new results (https://github.com/apache/spark/blob/master/sql/core/src/test/resources/sql-tests/results/cte.sql.out#L248-L286) are the same. We shouldn't throw an AnalysisException in those cases.

Member Author

Thanks for the detailed checking! Yes, that makes sense: we should only throw an exception when legacy and non-legacy give different results. Done in 7249404.

@SparkQA

SparkQA commented Feb 7, 2020

Test build #118039 has finished for PR 27454 at commit 7249404.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 8, 2020

Test build #118053 has finished for PR 27454 at commit ebd337b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

@peter-toth peter-toth left a comment

LGTM, minor comment on legacy flag docs.

Member

@dongjoon-hyun dongjoon-hyun left a comment

+1, LGTM. Thank you, all!
Merged to master/3.0.

dongjoon-hyun pushed a commit that referenced this pull request Feb 8, 2020
…ested WITH clause

Closes #27454 from xuanyuanking/SPARK-28228-follow.

Authored-by: Yuanjian Li <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit 3db3e39)
Signed-off-by: Dongjoon Hyun <[email protected]>
@xuanyuanking
Member Author

Thanks all for the review.

@xuanyuanking xuanyuanking deleted the SPARK-28228-follow branch February 10, 2020 01:27
sjincho pushed a commit to sjincho/spark that referenced this pull request Apr 15, 2020