Skip to content

Conversation

@TJX2014
Copy link
Contributor

@TJX2014 TJX2014 commented Jul 19, 2020

What changes were proposed in this pull request?

Add CTE hint resolve in org.apache.spark.sql.catalyst.analysis.ResolveHints.ResolveJoinStrategyHints#apply
Add a UT in org.apache.spark.sql.test.SQLTestUtils#test

Why are the changes needed?

Branch 2.4, when resolve CTE in org.apache.spark.sql.catalyst.analysis.Analyzer.CTESubstitution, we have a chance executeSameContext to apply all rules to CTE include hint resolve.
Branch 3.0, because CTESubstitution is moved to a separated class, we miss the feature as follow:
`
scala> sql("create temporary view t as select 1 as id")
res0: org.apache.spark.sql.DataFrame = []

scala> sql("with cte as (select /*+ BROADCAST(id) */ id from t) select id from cte")
org.apache.spark.sql.AnalysisException: cannot resolve 'id' given input columns: [cte.id]; line 1 pos 59;
'Project ['id]
+- SubqueryAlias cte
+- Project [id#0]
+- SubqueryAlias t
+- Project [1 AS id#0]
+- OneRowRelation
`

Does this PR introduce any user-facing change?

No

How was this patch tested?

Unit test.

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@TJX2014
Copy link
Contributor Author

TJX2014 commented Jul 19, 2020

Same issue but different patch way: https://github.com/apache/spark/pull/29062

@TJX2014
Copy link
Contributor Author

TJX2014 commented Jul 19, 2020

cc @viirya @maropu @LantaoJin

@dongjoon-hyun
Copy link
Member

Hi, @TJX2014 . Please run dev/scalastyle and fix the Scala style violation please. As you see on GitHub Action result, everything is failing.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please revisit the PR title and test case name.

  • cte hint regression doesn't make any sense because it's too general. All bugs are usually a regression, aren't they?
  • Apache Jira issue has Affected Versions. Please focus on the PR content not something like 2.4 to 3.0 in the PR title.

@TJX2014 TJX2014 changed the title [SPARK-32347][SQL] Fix CTE hint regression issue from 2.4 to 3.0 [SPARK-32347][SQL] Hint in CTE should be resolved in Hints batch rule Jul 19, 2020
test("SPARK-32347: cte hint should be resolved in Hints batch rule") {
withTempView("t") {
sql("create temporary view t as select 1 as id")
sql("with cte as (select /*+ BROADCAST(id) */ id from t) select id from cte")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just in case, could you check that the hist is correctly applied?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, I need to check the hist, seems something wrong with the patch.

hintErrorHandler.hintRelationsNotFound(h.name, h.parameters, unmatchedIdents)
applied
}
case With(child, relations) => resolveCTEHint(child,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This patch looks like it is specifically designed to fix With comparing to #29062. I am open to more comments.

@cloud-fan
Copy link
Contributor

cannot resolve 'id' given input columns: [cte.id];

Do you know how this happens? The id column is there.

@TJX2014
Copy link
Contributor Author

TJX2014 commented Jul 22, 2020

cannot resolve 'id' given input columns: [cte.id];

Do you know how this happens? The id column is there.

Because the hint resolve ignore hint in CTE, and the id in cte is remained unresolved. While apply CTESubstitution, in branch 2.4, we have a chance to executeSameContext of Analyzer to apply all rules to CTE include hint resolve.
But in Branch 3.0, because CTESubstitution is moved to a separated class, so we miss the feature.

Comment on lines +177 to +178
case u: UnresolvedRelation =>
cteRelations.find(x => resolver(x._1, u.tableName)).map(_._2).getOrElse(u)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This branch will occur stackoverflow when cte table name is same as table in cte as follows:
sql("create temporary view t as select 1 as id")
sql("with t as (select /*+ BROADCAST(id) */ id from t) select id from t")
@cloud-fan Could you please help me find a way to pass this ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you looked at #29062 ? Seems easier to just run CTE substitution in the very beginning.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, put the CTE substitution in the very beginning is an easier way.

@TJX2014 TJX2014 closed this Jul 22, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants