-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-34952][SQL][FOLLOWUP] Simplify JDBC aggregate pushdown #33579
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Kubernetes integration test starting |
|
Kubernetes integration test status success |
| case agg: AggregateExpression => agg | ||
| // Do not push down duplicated aggregate expressions. For example, | ||
| // `SELECT max(a) + 1, max(a) + 2 FROM ...`, we should only push down one | ||
| // `sum(a)` to the data source. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: max(a)?
|
java linter failed because of Thanks a lot for improving the code! Really appreciate your help! |
| - JDBCOptions.JDBC_UPPER_BOUND + | ||
| (JDBCOptions.JDBC_QUERY_STRING -> aggQuery)) | ||
| try { | ||
| finalSchema = JDBCRDD.resolveTable(jdbcOptionsWithAggQuery) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, this is a good change.
|
Test build #141848 has finished for PR 33579 at commit
|
|
CI pending. |
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
|
Thanks. Merging master/3.2. |
### What changes were proposed in this pull request? This is a followup of #33352 , to simplify the JDBC aggregate pushdown: 1. We should get the schema of the aggregate query by asking the JDBC server, instead of calculating it by ourselves. This can simplify the code a lot, and is also more robust: the data type of SUM may vary in different databases, it's fragile to assume they are always the same as Spark. 2. because of 1, now we can remove the `dataType` property from the public `Sum` expression. This PR also contains some small improvements: 1. Spark should deduplicate the aggregate expressions before pushing them down. 2. Improve the `toString` of public aggregate expressions to make them more SQL. ### Why are the changes needed? code and API simplification ### Does this PR introduce _any_ user-facing change? this API is not released yet. ### How was this patch tested? existing tests Closes #33579 from cloud-fan/dsv2. Authored-by: Wenchen Fan <[email protected]> Signed-off-by: Liang-Chi Hsieh <[email protected]> (cherry picked from commit 387a251) Signed-off-by: Liang-Chi Hsieh <[email protected]>
|
Test build #141871 has finished for PR 33579 at commit
|
What changes were proposed in this pull request?
This is a followup of #33352 , to simplify the JDBC aggregate pushdown:
dataTypeproperty from the publicSumexpression.This PR also contains some small improvements:
toStringof public aggregate expressions to make them more SQL.Why are the changes needed?
code and API simplification
Does this PR introduce any user-facing change?
this API is not released yet.
How was this patch tested?
existing tests