[SPARK-32761][SQL][3.0] Allow aggregating multiple foldable distinct expressions #30052
…ssions
Closes apache#29607 from linhongliu-db/SPARK-32761.
(cherry picked from commit a410658)
Kubernetes integration test starting
Kubernetes integration test status failure
Test build #129824 has finished for PR 30052 at commit
@cloud-fan could you please take a look?
thanks, merging to 3.0!
Closes #30052 from linhongliu-db/SPARK-32761-3.0.
What changes were proposed in this pull request?
For queries with multiple foldable distinct columns, it is not mandatory for
`RewriteDistinctAggregates` to handle the case, because the foldable columns will be
eliminated during execution. In the current code, `RewriteDistinctAggregates` does miss
some "aggregating with multiple foldable distinct expressions" cases.
For example, `select count(distinct 2), count(distinct 2, 3)` is missed by the rule,
and the planner then raises an error that "multiple distinct expressions" are not allowed.
Because the foldable distinct columns are eventually eliminated, we can allow this case in
the aggregation planner check.
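The idea behind the relaxed check can be sketched with a small toy model. This is not Spark's code: `Expr`, `DistinctAgg`, `distinct_groups`, and `planner_check` are hypothetical names invented for illustration, and the sketch assumes the check simply compares the sets of distinct columns across aggregate functions, optionally dropping the foldable (constant) ones first.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Expr:
    """Hypothetical stand-in for an expression; foldable means it is a constant."""
    name: str
    foldable: bool = False


@dataclass(frozen=True)
class DistinctAgg:
    """Hypothetical stand-in for an aggregate such as count(distinct ...)."""
    func: str
    children: tuple


def distinct_groups(aggs, ignore_foldable):
    """Collect the distinct column sets, optionally dropping foldable columns."""
    groups = set()
    for agg in aggs:
        cols = frozenset(
            c for c in agg.children if not (ignore_foldable and c.foldable)
        )
        groups.add(cols)
    return groups


def planner_check(aggs, ignore_foldable=True):
    """Raise if more than one distinct column set remains (toy planner check)."""
    if len(distinct_groups(aggs, ignore_foldable)) > 1:
        raise ValueError("multiple distinct expressions are not allowed")


# Toy model of: select count(distinct 2), count(distinct 2, 3)
lit2 = Expr("2", foldable=True)
lit3 = Expr("3", foldable=True)
query = [DistinctAgg("count", (lit2,)), DistinctAgg("count", (lit2, lit3))]

# With foldable columns ignored, both aggregates share the same (empty) column set.
planner_check(query, ignore_foldable=True)
```

In this model, calling `planner_check(query, ignore_foldable=False)` raises, which mirrors the error described above, while two aggregates over different non-foldable columns are still rejected in either mode.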
Why are the changes needed?
Bug fix.
Does this PR introduce any user-facing change?
No
How was this patch tested?
Added a test case.
Authored-by: Linhong Liu <linhong.liu@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit a410658)