Skip to content

Conversation

@LantaoJin
Copy link
Contributor

@LantaoJin LantaoJin commented Jul 10, 2020

What changes were proposed in this pull request?

This PR is to move Substitution rule before Hints rule in Analyzer to avoid hint in CTE not working.

Why are the changes needed?

Below SQL in Spark3.0 will throw AnalysisException, but it works in Spark2.x

WITH cte AS (SELECT /*+ REPARTITION(3) */ T.id, T.data FROM $t1 T)
SELECT cte.id, cte.data FROM cte
Failed to analyze query: org.apache.spark.sql.AnalysisException: cannot resolve '`cte.id`' given input columns: [cte.data, cte.id]; line 3 pos 7;
'Project ['cte.id, 'cte.data]
+- SubqueryAlias cte
   +- Project [id#21L, data#22]
      +- SubqueryAlias T
         +- SubqueryAlias testcat.ns1.ns2.tbl
            +- RelationV2[id#21L, data#22] testcat.ns1.ns2.tbl

'Project ['cte.id, 'cte.data]
+- SubqueryAlias cte
   +- Project [id#21L, data#22]
      +- SubqueryAlias T
         +- SubqueryAlias testcat.ns1.ns2.tbl
            +- RelationV2[id#21L, data#22] testcat.ns1.ns2.tbl

Does this PR introduce any user-facing change?

No

How was this patch tested?

Add a unit test

Comment on lines 202 to -214
lazy val batches: Seq[Batch] = Seq(
Batch("Substitution", fixedPoint,
CTESubstitution,
WindowsSubstitution,
EliminateUnions,
new SubstituteUnresolvedOrdinals(conf)),
Batch("Disable Hints", Once,
new ResolveHints.DisableHints(conf)),
Batch("Hints", fixedPoint,
new ResolveHints.ResolveJoinStrategyHints(conf),
new ResolveHints.ResolveCoalesceHints(conf)),
Batch("Simple Sanity Check", Once,
LookupFunctions),
Batch("Substitution", fixedPoint,
CTESubstitution,
WindowsSubstitution,
EliminateUnions,
new SubstituteUnresolvedOrdinals(conf)),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems cool : )

@SparkQA
Copy link

SparkQA commented Jul 10, 2020

Test build #125553 has finished for PR 29062 at commit b183cba.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jul 10, 2020

Test build #125579 has started for PR 29062 at commit adc2b59.

Seq("Window function row_number() requires an OVER clause."))
}

test("SPARK-32237: Hint in CTE") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add end-2-end tests, too, e.g., in SQLQueryTestSuite or SQLQuerySuite?

@maropu
Copy link
Member

maropu commented Jul 10, 2020

cc: @cloud-fan @viirya

@SparkQA
Copy link

SparkQA commented Jul 11, 2020

Test build #125662 has finished for PR 29062 at commit e361401.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@LantaoJin
Copy link
Contributor Author

retest this please

@SparkQA
Copy link

SparkQA commented Jul 11, 2020

Test build #125674 has finished for PR 29062 at commit e361401.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Copy link
Member

viirya commented Jul 11, 2020

Can you keep the original PR template? Like "What changes were proposed in this pull request?" was removed. Please describe why the issue happened and how this fixes it in the PR description.

@LantaoJin
Copy link
Contributor Author

Can you keep the original PR template? Like "What changes were proposed in this pull request?" was removed. Please describe why the issue happened and how this fixes it in the PR description.

Done

CTESubstitution,
WindowsSubstitution,
EliminateUnions,
new SubstituteUnresolvedOrdinals(conf)),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked the order of analysis rules in branch-2.4. Substitution batch is after Hints too, like branch-3.0. Is there any change causing this issue?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's hard to find which change causing it for me since 2.4 and 3.0 have many differences that I don't know.

@cloud-fan
Copy link
Contributor

the patch itself looks fine.

AnalysisException: cannot resolve 'cte.id' given input columns: [cte.data, cte.id];

Do you know why? I expect to see we just ignore the hint, but it reports "can't resolve column", which is pretty weird.

@LantaoJin
Copy link
Contributor Author

PS:

    val plan = With(
      Project(
        Seq(UnresolvedAttribute("cte.a")),
        UnresolvedRelation(TableIdentifier("cte"))
      ),
      Seq(
        (
          "cte",
          SubqueryAlias(
            AliasIdentifier("cte"),
            Project(testRelation.output, testRelation)
          )
        )
      )
    )
    assertAnalysisSuccess(plan)

above test case without hint only works in v3.0.0. So the test case I given both fail in v3.0.0 and v2.4.6

@LantaoJin
Copy link
Contributor Author

LantaoJin commented Jul 22, 2020

WITH cte AS (SELECT /*+ REPARTITION(3) */ * FROM t) SELECT * FROM cte

throws java.lang.IllegalStateException: Internal error: logical hint operator should have been removed during analysis at #25029, but throws AnalysisException: cannot resolve at #27454 (These two PRs are to fix one issue SPARK-28228). But between them, there are so many PRs. I can't tell which one lead to this change. @cloud-fan

@cloud-fan
Copy link
Contributor

The patch LGTM. It will be great if you can take a look at 2.4 and see if there is an easy fix to make your tests pass. We did a big refactor about CTE substitution in 3.0 and I don't know how hard it is to fix the bug in 2.4.

@dongjoon-hyun
Copy link
Member

cc @peter-toth , too.

@cloud-fan cloud-fan closed this in 182566b Jul 23, 2020
@cloud-fan
Copy link
Contributor

merged to master, thanks!

@LantaoJin can you send a backport to 3.0?

@LantaoJin
Copy link
Contributor Author

Thanks. I will file a backport PR.

cloud-fan pushed a commit that referenced this pull request Jul 24, 2020
### What changes were proposed in this pull request?
The backport of #29062

This PR is to move `Substitution` rule before `Hints` rule in `Analyzer` to avoid hint in CTE not working.

### Why are the changes needed?
Below SQL in Spark3.0 will throw AnalysisException, but it works in Spark2.x
```sql
WITH cte AS (SELECT /*+ REPARTITION(3) */ T.id, T.data FROM $t1 T)
SELECT cte.id, cte.data FROM cte
```
```
Failed to analyze query: org.apache.spark.sql.AnalysisException: cannot resolve '`cte.id`' given input columns: [cte.data, cte.id]; line 3 pos 7;
'Project ['cte.id, 'cte.data]
+- SubqueryAlias cte
   +- Project [id#21L, data#22]
      +- SubqueryAlias T
         +- SubqueryAlias testcat.ns1.ns2.tbl
            +- RelationV2[id#21L, data#22] testcat.ns1.ns2.tbl

'Project ['cte.id, 'cte.data]
+- SubqueryAlias cte
   +- Project [id#21L, data#22]
      +- SubqueryAlias T
         +- SubqueryAlias testcat.ns1.ns2.tbl
            +- RelationV2[id#21L, data#22] testcat.ns1.ns2.tbl
```

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Add a unit test

Closes #29201 from LantaoJin/SPARK-32237_branch-3.0.

Lead-authored-by: LantaoJin <[email protected]>
Co-authored-by: Alan Jin <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
val postHocResolutionRules: Seq[Rule[LogicalPlan]] = Nil

lazy val batches: Seq[Batch] = Seq(
Batch("Substitution", fixedPoint,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why we have to execute the batch Substitution before the batch ResolveHints? What is the root cause? I am unable to tell it from either PR description or the code comments. Could any of you explain it?

cc @maryannxue

Copy link
Contributor Author

@LantaoJin LantaoJin Aug 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know which PR causes this regression problem since there is a big gap between 2.4 and 3.0 about CTE. I am not familiar with this part.
I fixed this by debuging. In 3.0, I found only the child of CTE could enter into the rule of ResolveCoalesceHints. For the above test case:

CTE [cte]
:  +- 'SubqueryAlias `cte`
:     +- 'UnresolvedHint REPARTITION, [3]
:        +- 'Project [*]
:           +- 'UnresolvedRelation `t`
+- 'Project [*]
   +- 'UnresolvedRelation `cte`

Only this child Project branch could enter into the rule of ResolveCoalesceHints

+- 'Project [*]
   +- 'UnresolvedRelation `cte`

sql("CREATE TABLE t USING PARQUET AS SELECT 1 AS id")
checkAnswer(
sql(s"""
|WITH cte AS (SELECT /*+ REPARTITION(3) */ * FROM t)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as long as we have a hint in CTE, it will fail in 3.0? can you add more test cases?

Copy link
Contributor Author

@LantaoJin LantaoJin Aug 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. The same issue reported by multiple times in Jira with different use cases, like SPARK-32237, SPARK-32347, SPARK-32535, I will file a new PR to add more unit tests.

}
}

test("SPARK-32237: Hint in CTE") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like JoinHintSuite, we should have our own hint-specific suite. It can help us understand the test coverage of SQL HINT.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. I will add a new suite.

cloud-fan pushed a commit that referenced this pull request Aug 10, 2020
### What changes were proposed in this pull request?
Add a new test suite `CTEHintSuite`

### Why are the changes needed?
This ticket is to address the below comments to help us understand the test coverage of SQL HINT for CTE.
#29062 (comment)
#29062 (comment)

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Add a test suite.

Closes #29359 from LantaoJin/SPARK-32537.

Authored-by: LantaoJin <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants