[SPARK-32347][SQL] Hint in CTE should be resolved in Hints batch rule #29156
Conversation
Can one of the admins verify this patch?
Same issue, but a different approach to the fix: https://github.com/apache/spark/pull/29062
Hi, @TJX2014 . Please run |
Please revisit the PR title and test case name.
- `cte hint regression` doesn't make any sense because it's too general. All bugs are usually a regression, aren't they?
- Apache Jira issue has `Affected Versions`. Please focus on the PR content, not something like `2.4` to `3.0` in the PR title.
test("SPARK-32347: cte hint should be resolved in Hints batch rule") {
  withTempView("t") {
    sql("create temporary view t as select 1 as id")
    sql("with cte as (select /*+ BROADCAST(id) */ id from t) select id from cte")
Just in case, could you check that the hint is correctly applied?
Thank you. I need to check the hint; something seems wrong with the patch.
hintErrorHandler.hintRelationsNotFound(h.name, h.parameters, unmatchedIdents)
applied
}
case With(child, relations) => resolveCTEHint(child,
Compared to #29062, this patch looks like it is specifically designed to fix `With`. I am open to more comments.
Do you know how this happens? The |
Because hint resolution ignores the hint in the CTE, and the |
case u: UnresolvedRelation =>
  cteRelations.find(x => resolver(x._1, u.tableName)).map(_._2).getOrElse(u)
This branch causes a stack overflow when the CTE name is the same as a table referenced inside the CTE, as follows:
sql("create temporary view t as select 1 as id")
sql("with t as (select /*+ BROADCAST(id) */ id from t) select id from t")
@cloud-fan Could you please help me find a way around this?
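To make the failure mode concrete, here is a minimal sketch of name substitution over a toy plan tree. The types `Plan`, `Relation`, `Project` and the object `CteSketch` are hypothetical and are not Spark's classes; the sketch only assumes that expanding a CTE body while its own name is still in scope is what makes the recursion unbounded. Dropping the name from scope before expanding the body lets the inner `t` fall through to the outer table.

```scala
// Toy plan model; these types are hypothetical and are not Spark's classes.
sealed trait Plan
case class Relation(name: String) extends Plan
case class Project(child: Plan) extends Plan

object CteSketch {
  // Replace references to CTE names with their definitions. A definition is
  // expanded with its own name removed from scope, so a CTE `t` whose body
  // reads a table `t` refers to the outer table instead of recursing forever.
  def substitute(plan: Plan, ctes: Map[String, Plan]): Plan = plan match {
    case Relation(name) if ctes.contains(name) =>
      substitute(ctes(name), ctes - name)
    case Project(child) => Project(substitute(child, ctes))
    case other => other
  }
}
```

With `ctes = Map("t" -> Project(Relation("t")))`, calling `CteSketch.substitute(Project(Relation("t")), ctes)` terminates; without the `ctes - name` step the same call would keep re-expanding `t`.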
Have you looked at #29062? It seems easier to just run CTE substitution at the very beginning.
Yes, running CTE substitution at the very beginning is an easier way.
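The ordering argument can be sketched with a miniature analyzer. Everything here (`Node`, `WithCte`, `OrderingSketch`, and both rule functions) is a hypothetical stand-in, not Spark's API; the sketch assumes only that the hint rule does not look inside CTE definitions, as described in this thread. Under that assumption, substituting CTEs first exposes the hint to the hint rule, while the reverse order leaves it unresolved.

```scala
// Hypothetical miniature of an analyzer; none of these types are Spark's.
sealed trait Node
case class Table(name: String) extends Node
case class UnresolvedHint(child: Node) extends Node
case class ResolvedHint(child: Node) extends Node
case class Select(child: Node) extends Node
case class WithCte(child: Node, ctes: Map[String, Node]) extends Node

object OrderingSketch {
  // Inline CTE definitions into the main query and drop the WithCte wrapper.
  def substituteCte(plan: Node): Node = plan match {
    case WithCte(child, ctes) => inline(child, ctes)
    case other => other
  }

  private def inline(plan: Node, ctes: Map[String, Node]): Node = plan match {
    case Table(n) if ctes.contains(n) => inline(ctes(n), ctes - n)
    case Select(c) => Select(inline(c, ctes))
    case UnresolvedHint(c) => UnresolvedHint(inline(c, ctes))
    case other => other
  }

  // Resolve hints in the main tree only; by assumption this rule does not
  // look inside the CTE definitions held by a WithCte node.
  def resolveHints(plan: Node): Node = plan match {
    case UnresolvedHint(c) => ResolvedHint(resolveHints(c))
    case Select(c) => Select(resolveHints(c))
    case WithCte(c, ctes) => WithCte(resolveHints(c), ctes)
    case other => other
  }
}
```

For a query modeled as `WithCte(Select(Table("cte")), Map("cte" -> UnresolvedHint(Select(Table("t")))))`, applying `substituteCte` and then `resolveHints` yields a tree with a `ResolvedHint`, whereas applying them in the opposite order leaves an `UnresolvedHint` behind.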
What changes were proposed in this pull request?
Add CTE hint resolution in `org.apache.spark.sql.catalyst.analysis.ResolveHints.ResolveJoinStrategyHints#apply`.
Add a UT in `org.apache.spark.sql.test.SQLTestUtils#test`.
Why are the changes needed?
In branch 2.4, when resolving CTEs in `org.apache.spark.sql.catalyst.analysis.Analyzer.CTESubstitution`, `executeSameContext` gives us a chance to apply all rules to the CTE, including hint resolution.
In branch 3.0, because `CTESubstitution` was moved to a separate class, we lose this behavior, as follows:
`
scala> sql("create temporary view t as select 1 as id")
res0: org.apache.spark.sql.DataFrame = []
scala> sql("with cte as (select /*+ BROADCAST(id) */ id from t) select id from cte")
org.apache.spark.sql.AnalysisException: cannot resolve '`id`' given input columns: [cte.id]; line 1 pos 59;
'Project ['id]
+- SubqueryAlias cte
   +- Project [id#0]
      +- SubqueryAlias t
         +- Project [1 AS id#0]
            +- OneRowRelation
`
Does this PR introduce any user-facing change?
No
How was this patch tested?
Unit test.