[SPARK-12719][SQL] SQL generation support for Generate #11696
Test build #53071 has finished for PR 11696 at commit
test this please
Test build #53075 has finished for PR 11696 at commit
@cloud-fan Just for my understanding, are we missing some case now that we want to improve upon?
According to the SQL standard, the operator order should be `SELECT ... FROM ... WHERE ... GROUP BY ... HAVING ... ORDER BY ... LIMIT ...`. We should re-order operators into this standard order in the future.
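To illustrate the ordering described in this comment, here is a minimal sketch. `ClauseOrder` and its `Map`-based input are hypothetical and much simpler than the actual `SQLBuilder` logic; the sketch only shows clauses being emitted in the listed order regardless of how they were collected.

```scala
// Hypothetical, simplified model of SQL clauses; not Spark's SQLBuilder.
object ClauseOrder {
  // Clause order as listed in the review comment.
  val standardOrder: Seq[String] =
    Seq("SELECT", "FROM", "WHERE", "GROUP BY", "HAVING", "ORDER BY", "LIMIT")

  // Render (keyword -> body) pairs in the listed order,
  // whatever order they were collected in. Missing clauses are skipped.
  def render(clauses: Map[String, String]): String =
    standardOrder
      .flatMap(keyword => clauses.get(keyword).map(body => s"$keyword $body"))
      .mkString(" ")
}
```

For example, `ClauseOrder.render(Map("LIMIT" -> "10", "FROM" -> "t", "SELECT" -> "a"))` puts `SELECT` first and `LIMIT` last.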
@cloud-fan Thank you !!
let's put your comment actually in the code so we know why we need to re-order the operators
@cloud-fan Thank you. LGTM
LGTM. Thank you for your work! @cloud-fan
maybe just inline this since it is used only once
@cloud-fan the logic looks good to me. I left some comments. Try to make this understandable (by adding more comments or renaming some functions) for people who haven't spent a lot of time understanding this code.
OK. I have merged https://github.com/apache/spark/pull/11658/commits. Let me rebase this one to just keep the top commit.
#11768 is the commit for generator. To avoid accidentally destroying @cloud-fan's branch, I created that new PR.
Test build #53402 has finished for PR 11696 at commit
Test build #53404 has finished for PR 11696 at commit
I'm going to merge it to unblock follow-up work. I will address comments from @yhuai and @liancheng if you have any.
  }

  private def generateToSQL(g: Generate): String = {
    val columnAliases = g.generatorOutput.map(_.sql).mkString(",")
Nit: Please add a space after ,
Sorry for the late review. LGTM except for two minor comments.
…while generate SQL ## What changes were proposed in this pull request? We only need to make sub-query names unique every time we generate a SQL string, but not all the time. This PR moves the `newSubqueryName` method to `class SQLBuilder` and remove `object SQLBuilder`. also addressed 2 minor comments in #11696 ## How was this patch tested? existing tests. Author: Wenchen Fan <[email protected]> Closes #11783 from cloud-fan/tmp.
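A rough sketch of the idea in that follow-up: keep the name counter on the builder instance rather than on a shared companion object, so numbering restarts for every generated SQL string. `NameGenerator` is a hypothetical stand-in that models only the counter, and the `gen_subquery_` prefix is an illustrative choice.

```scala
import java.util.concurrent.atomic.AtomicLong

// Hypothetical stand-in for the builder class; only the naming counter is modeled.
class NameGenerator {
  // Per-instance counter: each new generator starts again from 0.
  private val nextId = new AtomicLong(0)

  // Returns a name that is unique within this instance only.
  def newSubqueryName(): String = s"gen_subquery_${nextId.getAndIncrement()}"
}
```

With this shape, two separate generators both hand out `gen_subquery_0` first, while names within a single generator never repeat.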
PR #11696 introduced a complex pattern match that broke Scala 2.10 match unreachability check and caused build failure. This PR fixes this issue by expanding this pattern match into several simpler ones. Note that tuning or turning off `-Dscalac.patmat.analysisBudget` doesn't work for this case. Compilation against Scala 2.10 Author: tedyu <[email protected]> Closes #11798 from yy2016/master.
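For readers unfamiliar with this kind of refactoring, the sketch below shows the general shape of expanding one nested pattern match into several simpler ones. The tiny `Plan` hierarchy is hypothetical and is not Spark's `LogicalPlan`; the sketch does not reproduce the Scala 2.10 failure, it only shows that the split version behaves the same as the nested one.

```scala
object MatchSplitting {
  // Hypothetical miniature plan tree, for illustration only.
  sealed trait Plan
  case class Project(child: Plan) extends Plan
  case class Filter(child: Plan) extends Plan
  case class Generate(join: Boolean, child: Plan) extends Plan
  case object Table extends Plan

  // One match with deeply nested patterns.
  def describeNested(p: Plan): String = p match {
    case Project(Generate(true, Filter(_))) => "project-generate-filter"
    case Project(Generate(true, _))         => "project-generate"
    case Project(_)                         => "project"
    case _                                  => "other"
  }

  // The same logic expanded into several shallow matches, one level each.
  def describeSplit(p: Plan): String = p match {
    case Project(child) => describeProjectChild(child)
    case _              => "other"
  }

  private def describeProjectChild(child: Plan): String = child match {
    case Generate(true, grandChild) => describeGenerateChild(grandChild)
    case _                          => "project"
  }

  private def describeGenerateChild(child: Plan): String = child match {
    case Filter(_) => "project-generate-filter"
    case _         => "project-generate"
  }
}
```

Each helper inspects a single level of the tree, so every individual match stays small.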
## What changes were proposed in this pull request? This PR adds SQL generation support for `Generate` operator. It always converts `Generate` operator into `LATERAL VIEW` format as there are many limitations to put UDTF in project list. This PR is based on apache#11658, please see the last commit to review the real changes. Thanks dilipbiswal for his initial work! Takes over apache#11596 ## How was this patch tested? new tests in `LogicalPlanToSQLSuite` Author: Wenchen Fan <[email protected]> Closes apache#11696 from cloud-fan/generate.
## What changes were proposed in this pull request?
This PR adds SQL generation support for `Generate` operator. It always converts `Generate` operator into `LATERAL VIEW` format as there are many limitations to put UDTF in project list.
This PR is based on #11658, please see the last commit to review the real changes.
Thanks @dilipbiswal for his initial work! Takes over #11596
## How was this patch tested?
new tests in `LogicalPlanToSQLSuite`
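As a rough illustration of the `LATERAL VIEW` output format described above, here is a minimal sketch. `GenerateInfo` and `toSQL` are hypothetical simplifications, not the `generateToSQL` method from this PR; they assume the generator call and the child query are already rendered as SQL strings. Column aliases are joined with `", "`, following the review nit about adding a space after the comma.

```scala
object LateralViewSketch {
  // Hypothetical, simplified description of a Generate node.
  case class GenerateInfo(
      generatorSql: String,       // already-rendered generator call, e.g. "explode(`arr`)"
      outer: Boolean,             // emit LATERAL VIEW OUTER when true
      qualifier: String,          // table alias for the generated rows
      columnAliases: Seq[String]) // names of the generated columns

  // Append a LATERAL VIEW clause to the SQL of the child query.
  def toSQL(childSql: String, g: GenerateInfo): String = {
    val outerKeyword = if (g.outer) " OUTER" else ""
    val aliases = g.columnAliases.mkString(", ")
    s"$childSql LATERAL VIEW$outerKeyword ${g.generatorSql} ${g.qualifier} AS $aliases"
  }
}
```

For instance, a child query `SELECT * FROM t` combined with an `explode` generator would render as `SELECT * FROM t LATERAL VIEW explode(...) gen AS ...` under these assumptions.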