-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-28228][SQL] Change the default behavior for name conflict in nested WITH clause #27454
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
cc @cloud-fan |
|
retest this please. |
|
retest this please |
|
Test build #117869 has finished for PR 27454 at commit
|
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CTESubstitution.scala
Outdated
Show resolved
Hide resolved
| def apply(plan: LogicalPlan): LogicalPlan = { | ||
| if (SQLConf.get.getConf(LEGACY_CTE_PRECEDENCE_ENABLED)) { | ||
| if (SQLConf.get.legacyCTEPrecedenceEnabled.isEmpty) { | ||
| if (hasNestedCTE(plan, inTraverse = false)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should only fail if there are name conflicts. It's a bad UX if we blindly forbid nested CTE by default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the advice, done in a55e82d and change tests correspondingly.
|
Test build #117875 has finished for PR 27454 at commit
|
|
Test build #117944 has finished for PR 27454 at commit
|
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CTESubstitution.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CTESubstitution.scala
Outdated
Show resolved
Hide resolved
| val newNames = relations.map { | ||
| case (cteName, _) => | ||
| if (cteNames.contains(cteName)) { | ||
| return true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can throw exception here so that we know which name is conflicting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, done in c03920a.
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CTESubstitution.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CTESubstitution.scala
Show resolved
Hide resolved
|
|
||
| -- CTE non-legacy substitution | ||
| SET spark.sql.legacy.ctePrecedence.enabled=false; | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can use --IMPORT cte.sql to include the existing test
| create temporary view t2 as select * from values 0, 1 as t(id); | ||
|
|
||
| -- CTE non-legacy substitution | ||
| SET spark.sql.legacy.ctePrecedence.enabled=false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use --SET spark.sql.legacy.ctePrecedence.enabled=false, so that this won't be treated as a query and appear in the result query.
| @@ -0,0 +1,2 @@ | |||
| --SET spark.sql.legacy.ctePrecedence.enabled = false | |||
| --IMPORT cte.sql | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should do the same thing for cte-legacy.sql. This can be done in a followup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copy that, will do it later.
|
Test build #117988 has finished for PR 27454 at commit
|
|
cc @peter-toth |
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CTESubstitution.scala
Outdated
Show resolved
Hide resolved
|
|
||
| val LEGACY_CTE_PRECEDENCE_ENABLED = buildConf("spark.sql.legacy.ctePrecedence.enabled") | ||
| .internal() | ||
| .doc("When true, outer CTE definitions takes precedence over inner definitions.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now, it became three-state conf. Shall we mentioned more about false and empty?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds we should.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, done in 7249404.
viirya
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good. Only minor comment.
| cteName | ||
| } | ||
| }.toSet | ||
| (w.innerChildren :+ child).foreach { p => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be:
child.transformExpressions {
case e: SubqueryExpression =>
assertNoNameConflictsInCTE(e.plan, inTraverse = true, cteNames ++ newNames)
e
}
w.innerChildren.foreach { p =>
assertNoNameConflictsInCTE(p, inTraverse = true, cteNames ++ newNames)
}
If you check CTE in subquery shadows outer test cases you will see that legacy (https://github.com/apache/spark/blob/master/sql/core/src/test/resources/sql-tests/results/cte-legacy.sql.out#L113-L151) and new results (https://github.com/apache/spark/blob/master/sql/core/src/test/resources/sql-tests/results/cte.sql.out#L248-L286) are the same. We shouldn't give AnalysisException in those cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the detailed checking! Yes, that makes sense, we should only give exception while legacy and non-legacy give different results. Done in 7249404
|
Test build #118039 has finished for PR 27454 at commit
|
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
Outdated
Show resolved
Hide resolved
|
Test build #118053 has finished for PR 27454 at commit
|
peter-toth
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, minor comment on legacy flag docs.
dongjoon-hyun
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, LGTM. Thank you, all!
Merged to master/3.0.
…ested WITH clause ### What changes were proposed in this pull request? This is a follow-up for #25029, in this PR we throw an AnalysisException when name conflict is detected in nested WITH clause. In this way, the config `spark.sql.legacy.ctePrecedence.enabled` should be set explicitly for the expected behavior. ### Why are the changes needed? The original change might risky to end-users, it changes behavior silently. ### Does this PR introduce any user-facing change? Yes, change the config `spark.sql.legacy.ctePrecedence.enabled` as optional. ### How was this patch tested? New UT. Closes #27454 from xuanyuanking/SPARK-28228-follow. Authored-by: Yuanjian Li <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]> (cherry picked from commit 3db3e39) Signed-off-by: Dongjoon Hyun <[email protected]>
|
Thanks all for the review. |
…ested WITH clause ### What changes were proposed in this pull request? This is a follow-up for apache#25029, in this PR we throw an AnalysisException when name conflict is detected in nested WITH clause. In this way, the config `spark.sql.legacy.ctePrecedence.enabled` should be set explicitly for the expected behavior. ### Why are the changes needed? The original change might risky to end-users, it changes behavior silently. ### Does this PR introduce any user-facing change? Yes, change the config `spark.sql.legacy.ctePrecedence.enabled` as optional. ### How was this patch tested? New UT. Closes apache#27454 from xuanyuanking/SPARK-28228-follow. Authored-by: Yuanjian Li <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]>
What changes were proposed in this pull request?
This is a follow-up for #25029, in this PR we throw an AnalysisException when name conflict is detected in nested WITH clause. In this way, the config
spark.sql.legacy.ctePrecedence.enabledshould be set explicitly for the expected behavior.Why are the changes needed?
The original change might risky to end-users, it changes behavior silently.
Does this PR introduce any user-facing change?
Yes, change the config
spark.sql.legacy.ctePrecedence.enabledas optional.How was this patch tested?
New UT.