-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-32237][SQL] Resolve hint in CTE #29062
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala
Outdated
Show resolved
Hide resolved
sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala
Outdated
Show resolved
Hide resolved
| lazy val batches: Seq[Batch] = Seq( | ||
| Batch("Substitution", fixedPoint, | ||
| CTESubstitution, | ||
| WindowsSubstitution, | ||
| EliminateUnions, | ||
| new SubstituteUnresolvedOrdinals(conf)), | ||
| Batch("Disable Hints", Once, | ||
| new ResolveHints.DisableHints(conf)), | ||
| Batch("Hints", fixedPoint, | ||
| new ResolveHints.ResolveJoinStrategyHints(conf), | ||
| new ResolveHints.ResolveCoalesceHints(conf)), | ||
| Batch("Simple Sanity Check", Once, | ||
| LookupFunctions), | ||
| Batch("Substitution", fixedPoint, | ||
| CTESubstitution, | ||
| WindowsSubstitution, | ||
| EliminateUnions, | ||
| new SubstituteUnresolvedOrdinals(conf)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems cool : )
|
Test build #125553 has finished for PR 29062 at commit
|
|
Test build #125579 has started for PR 29062 at commit |
| Seq("Window function row_number() requires an OVER clause.")) | ||
| } | ||
|
|
||
| test("SPARK-32237: Hint in CTE") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add end-2-end tests, too, e.g., in SQLQueryTestSuite or SQLQuerySuite?
|
cc: @cloud-fan @viirya |
|
Test build #125662 has finished for PR 29062 at commit
|
|
retest this please |
|
Test build #125674 has finished for PR 29062 at commit
|
|
Can you keep the original PR template? Like "What changes were proposed in this pull request?" was removed. Please describe why the issue happened and how this fixes it in the PR description. |
Done |
| CTESubstitution, | ||
| WindowsSubstitution, | ||
| EliminateUnions, | ||
| new SubstituteUnresolvedOrdinals(conf)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked the order of analysis rules in branch-2.4. Substitution batch is after Hints too, like branch-3.0. Is there any change causing this issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's hard to find which change causing it for me since 2.4 and 3.0 have many differences that I don't know.
|
the patch itself looks fine.
Do you know why? I expect to see we just ignore the hint, but it reports "can't resolve column", which is pretty weird. |
|
PS: val plan = With(
Project(
Seq(UnresolvedAttribute("cte.a")),
UnresolvedRelation(TableIdentifier("cte"))
),
Seq(
(
"cte",
SubqueryAlias(
AliasIdentifier("cte"),
Project(testRelation.output, testRelation)
)
)
)
)
assertAnalysisSuccess(plan)above test case without hint only works in v3.0.0. So the test case I given both fail in v3.0.0 and v2.4.6 spark/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisSuite.scala Line 899 in e361401
|
throws |
|
The patch LGTM. It will be great if you can take a look at 2.4 and see if there is an easy fix to make your tests pass. We did a big refactor about CTE substitution in 3.0 and I don't know how hard it is to fix the bug in 2.4. |
|
cc @peter-toth , too. |
|
merged to master, thanks! @LantaoJin can you send a backport to 3.0? |
|
Thanks. I will file a backport PR. |
### What changes were proposed in this pull request? The backport of #29062 This PR is to move `Substitution` rule before `Hints` rule in `Analyzer` to avoid hint in CTE not working. ### Why are the changes needed? Below SQL in Spark3.0 will throw AnalysisException, but it works in Spark2.x ```sql WITH cte AS (SELECT /*+ REPARTITION(3) */ T.id, T.data FROM $t1 T) SELECT cte.id, cte.data FROM cte ``` ``` Failed to analyze query: org.apache.spark.sql.AnalysisException: cannot resolve '`cte.id`' given input columns: [cte.data, cte.id]; line 3 pos 7; 'Project ['cte.id, 'cte.data] +- SubqueryAlias cte +- Project [id#21L, data#22] +- SubqueryAlias T +- SubqueryAlias testcat.ns1.ns2.tbl +- RelationV2[id#21L, data#22] testcat.ns1.ns2.tbl 'Project ['cte.id, 'cte.data] +- SubqueryAlias cte +- Project [id#21L, data#22] +- SubqueryAlias T +- SubqueryAlias testcat.ns1.ns2.tbl +- RelationV2[id#21L, data#22] testcat.ns1.ns2.tbl ``` ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Add a unit test Closes #29201 from LantaoJin/SPARK-32237_branch-3.0. Lead-authored-by: LantaoJin <[email protected]> Co-authored-by: Alan Jin <[email protected]> Signed-off-by: Wenchen Fan <[email protected]>
| val postHocResolutionRules: Seq[Rule[LogicalPlan]] = Nil | ||
|
|
||
| lazy val batches: Seq[Batch] = Seq( | ||
| Batch("Substitution", fixedPoint, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why we have to execute the batch Substitution before the batch ResolveHints? What is the root cause? I am unable to tell it from either PR description or the code comments. Could any of you explain it?
cc @maryannxue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know which PR causes this regression problem since there is a big gap between 2.4 and 3.0 about CTE. I am not familiar with this part.
I fixed this by debuging. In 3.0, I found only the child of CTE could enter into the rule of ResolveCoalesceHints. For the above test case:
CTE [cte]
: +- 'SubqueryAlias `cte`
: +- 'UnresolvedHint REPARTITION, [3]
: +- 'Project [*]
: +- 'UnresolvedRelation `t`
+- 'Project [*]
+- 'UnresolvedRelation `cte`
Only this child Project branch could enter into the rule of ResolveCoalesceHints
+- 'Project [*]
+- 'UnresolvedRelation `cte`
| sql("CREATE TABLE t USING PARQUET AS SELECT 1 AS id") | ||
| checkAnswer( | ||
| sql(s""" | ||
| |WITH cte AS (SELECT /*+ REPARTITION(3) */ * FROM t) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as long as we have a hint in CTE, it will fail in 3.0? can you add more test cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. The same issue reported by multiple times in Jira with different use cases, like SPARK-32237, SPARK-32347, SPARK-32535, I will file a new PR to add more unit tests.
| } | ||
| } | ||
|
|
||
| test("SPARK-32237: Hint in CTE") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like JoinHintSuite, we should have our own hint-specific suite. It can help us understand the test coverage of SQL HINT.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. I will add a new suite.
### What changes were proposed in this pull request? Add a new test suite `CTEHintSuite` ### Why are the changes needed? This ticket is to address the below comments to help us understand the test coverage of SQL HINT for CTE. #29062 (comment) #29062 (comment) ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Add a test suite. Closes #29359 from LantaoJin/SPARK-32537. Authored-by: LantaoJin <[email protected]> Signed-off-by: Wenchen Fan <[email protected]>
What changes were proposed in this pull request?
This PR is to move
Substitutionrule beforeHintsrule inAnalyzerto avoid hint in CTE not working.Why are the changes needed?
Below SQL in Spark3.0 will throw AnalysisException, but it works in Spark2.x
Does this PR introduce any user-facing change?
No
How was this patch tested?
Add a unit test