revert SPARK-29663 and SPARK-29688 #27619
hvanhovell
left a comment
LGTM
yaooqinn
left a comment
LGTM
Test build #118633 has finished for PR 27619 at commit
Test build #118642 has finished for PR 27619 at commit
Merging to master/branch-3.0.
### What changes were proposed in this pull request?
This PR reverts #26325 and #26347

### Why are the changes needed?
When we do sum/avg, we need a wider type of input to hold the sum value, to reduce the possibility of overflow. For example, we use long to hold the sum of integral inputs, use double to hold the sum of float/double.
However, we don't have a wider type of interval. Also the semantic is unclear: what if the days field overflows but the months field doesn't? Currently the avg of `1 month` and `2 month` is `1 month 15 days`, which assumes 1 month has 30 days and we should avoid this assumption.

### Does this PR introduce any user-facing change?
yes, remove 2 features added in 3.0

### How was this patch tested?
N/A

Closes #27619 from cloud-fan/revert.
Authored-by: Wenchen Fan <[email protected]>
Signed-off-by: herman <[email protected]>
(cherry picked from commit 1b67d54)
Signed-off-by: herman <[email protected]>
test("calendar interval agg support hash aggregate") {
Hi, @cloud-fan .
Although this is a correct removal, this test is part of [SPARK-30047][SQL] Support interval types in UnsafeRow, which is not mentioned in the PR title or description.
Supporting interval in UnsafeRow is fine. It has perf benefits, not just for hash aggregate, so we shouldn't revert it.
Yes, I'm not asking to revert SPARK-30047. I'm saying that it would be great if this PR title and description mentioned this clearly.
Added to the PR description.
What changes were proposed in this pull request?
This PR reverts #26325 and #26347
It also removes a test added by SPARK-30047. We can't use interval in aggregate now.
Why are the changes needed?
When we do sum/avg, we need a wider type of input to hold the sum value, to reduce the possibility of overflow. For example, we use long to hold the sum of integral inputs, use double to hold the sum of float/double.
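The widening idea can be sketched in plain Scala. This is only an illustration of why a wider accumulator helps, not Spark's aggregate implementation; the object and method names (`WiderSumSketch`, `sumAsInt`, `sumAsLong`) are made up for this example.

```scala
// Illustrative sketch only: plain Scala, not Spark's Sum/Average expressions.
object WiderSumSketch {
  // Accumulate in the input type: Int addition wraps around silently on overflow.
  def sumAsInt(xs: Seq[Int]): Int = xs.foldLeft(0)(_ + _)

  // Accumulate in a wider type: each Int is widened to Long before it is added.
  def sumAsLong(xs: Seq[Int]): Long = xs.foldLeft(0L)(_ + _)

  def main(args: Array[String]): Unit = {
    val xs = Seq(Int.MaxValue, 1)
    println(sumAsInt(xs))  // prints -2147483648 (wrapped around)
    println(sumAsLong(xs)) // prints 2147483648
  }
}
```

With only two inputs the `Int` accumulator already wraps, while the `Long` accumulator holds the true sum.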
However, we don't have a wider type of interval. Also the semantic is unclear: what if the days field overflows but the months field doesn't? Currently the avg of `1 month` and `2 month` is `1 month 15 days`, which assumes 1 month has 30 days and we should avoid this assumption.
Does this PR introduce any user-facing change?
Yes, this removes 2 features added in 3.0.
How was this patch tested?
N/A
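For readers who want the interval concern from "Why are the changes needed?" made concrete, here is a minimal sketch. It assumes a hypothetical `SimpleInterval` with separate `months` and `days` fields and a hard-coded 30 days per month; it is not Spark's `CalendarInterval` and not the code this PR reverts. It only shows how a field-wise average arrives at `1 month 15 days`, and how one field can overflow independently of the other.

```scala
// Illustrative sketch only: SimpleInterval, add, divide and avg are hypothetical names.
object IntervalAvgSketch {
  // Hypothetical stand-in for an interval value with independent fields.
  final case class SimpleInterval(months: Int, days: Int)

  // The assumption the PR description argues against.
  val AssumedDaysPerMonth = 30

  // Field-wise sum. addExact throws ArithmeticException when a field overflows,
  // and there is no wider interval type to fall back on.
  def add(a: SimpleInterval, b: SimpleInterval): SimpleInterval =
    SimpleInterval(
      java.lang.Math.addExact(a.months, b.months),
      java.lang.Math.addExact(a.days, b.days))

  // Divide by a count, spilling leftover months into days at 30 days per month.
  // Any fractional day left after the division is dropped in this sketch.
  def divide(i: SimpleInterval, n: Int): SimpleInterval = {
    val wholeMonths = i.months / n
    val leftoverMonths = i.months % n
    val totalDays = leftoverMonths * AssumedDaysPerMonth + i.days
    SimpleInterval(wholeMonths, totalDays / n)
  }

  def avg(xs: Seq[SimpleInterval]): SimpleInterval = {
    require(xs.nonEmpty, "avg needs at least one interval")
    divide(xs.reduce((a, b) => add(a, b)), xs.size)
  }

  def main(args: Array[String]): Unit = {
    // avg of "1 month" and "2 month"
    println(avg(Seq(SimpleInterval(1, 0), SimpleInterval(2, 0)))) // prints SimpleInterval(1,15)
  }
}
```

In this sketch, adding `SimpleInterval(0, Int.MaxValue)` and `SimpleInterval(0, 1)` throws because the days field overflows even though the months field is nowhere near its limit, which is the unclear case the description raises.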