[SPARK-47094][SQL][TEST][FOLLOWUP] SPJ : fix bucket reducer function #47126
@szehon-ho please take a look.
szehon-ho left a comment
Thanks, some preliminary comments.
As this is just fixing a test transform, I think we should just add one minimal negative test for this (to assert that there is no SPJ in this case).
sql/core/src/test/scala/org/apache/spark/sql/connector/KeyGroupedPartitioningSuite.scala
...ore/src/test/scala/org/apache/spark/sql/connector/catalog/functions/transformFunctions.scala
So previously, when it was reduced to 1, was it a correctness issue or just a performance issue?
A performance issue: if it reduces to 1, there will be only one task doing the work.
@viirya it seems it is a test transform, but it is good to have a good example.
Oh okay, I didn't see that it is test-only code.
8493934 to b85847c
@viirya please take another look.
cc @huaxingao
sql/core/src/test/scala/org/apache/spark/sql/connector/KeyGroupedPartitioningSuite.scala
remove newline
d24d1a0 to e503341
Merged to master. Thanks @himadripal @szehon-ho @viirya
Thank you all.
What changes were proposed in this pull request?
The SPJ compatible-buckets change (SPARK-47094) includes an implementation of a reducible function. This patch fixes that implementation and makes it the same as the one in Apache Iceberg.
Why are the changes needed?
With this fix, incompatible bucket counts no longer produce a reducer with a GCD of 1, so the buckets are not reduced to a single bucket when the function is used with incompatible bucket counts.
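The GCD rule described here can be illustrated with a small standalone sketch. It is not the Spark test code or the Iceberg source: the class and method names are hypothetical, and the exact conditions (no reduction when the GCD is 1, and none when the GCD already equals this side's bucket count) are assumptions based on the description above. Reducing a bucket id by taking it modulo the common divisor is likewise assumed.

```java
import java.util.OptionalInt;

// Hypothetical standalone sketch; not the Spark or Iceberg implementation.
final class BucketReducerSketch {
    private BucketReducerSketch() {}

    // Euclid's algorithm for the greatest common divisor.
    static int gcd(int a, int b) {
        return b == 0 ? a : gcd(b, a % b);
    }

    // Returns the bucket count that the side with thisNumBuckets buckets
    // could be reduced to, or empty when no useful reduction exists.
    static OptionalInt reducedBuckets(int thisNumBuckets, int otherNumBuckets) {
        int common = gcd(thisNumBuckets, otherNumBuckets);
        // Assumed rule: a GCD of 1 means the bucket counts are incompatible,
        // and a GCD equal to this side's own count means nothing to reduce.
        if (common > 1 && common != thisNumBuckets) {
            return OptionalInt.of(common);
        }
        return OptionalInt.empty();
    }

    // Assumed mapping of an original bucket id onto the reduced bucket space.
    static int reduce(int bucketId, int reducedNumBuckets) {
        return bucketId % reducedNumBuckets;
    }
}
```

Under these assumptions, bucket counts 4 and 6 both reduce to 2 buckets, while 3 and 5 yield no reducer at all instead of collapsing every row into a single bucket, which is the single-task performance problem discussed in the review.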
Does this PR introduce any user-facing change?
No.
How was this patch tested?
With unit tests
Was this patch authored or co-authored using generative AI tooling?
No.