@himadripal himadripal commented Jun 27, 2024

What changes were proposed in this pull request?

The SPJ compatible-buckets change (SPARK-47094) includes an implementation of the reducible function. This patch fixes that implementation so that it matches the one in Apache Iceberg.

Why are the changes needed?

With this fix, incompatible bucket counts no longer return 1 as the GCD, so the buckets are not reduced to 1 when the function is used with incompatible bucket counts.
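To make the GCD rule above concrete, here is a minimal sketch in Python. It is not the Spark or Iceberg code: the helper name `bucket_reducer` and the exact guard are assumptions made for illustration. It only shows the idea that a reducer is offered when the two bucket counts share a common divisor greater than 1, and that none is offered when the GCD is 1.

```python
from math import gcd
from typing import Callable, Optional


def bucket_reducer(
    this_num_buckets: int, other_num_buckets: int
) -> Optional[Callable[[int], int]]:
    """Hypothetical helper, for illustration only.

    Returns a function that maps a bucket id to a reduced bucket id,
    or None when the two bucket counts are incompatible.
    """
    common = gcd(this_num_buckets, other_num_buckets)
    # Assumption: a GCD of 1 means "incompatible", so no reducer is returned
    # instead of one that would collapse every bucket into bucket 0.
    if common > 1:
        return lambda bucket: bucket % common
    return None


# 4 and 6 buckets share the divisor 2, so both sides can be reduced to 2 buckets.
reducer = bucket_reducer(4, 6)
print(reducer(3) if reducer else None)

# 3 and 5 buckets are coprime, so no reducer is offered.
print(bucket_reducer(3, 5))
```

With this shape, a caller that receives `None` can fall back to a regular shuffle rather than reducing both sides to a single bucket.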

Does this PR introduce any user-facing change?

No.

How was this patch tested?

With unit tests.

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions github-actions bot added the SQL label Jun 27, 2024
@himadripal
Author

@szehon-ho please take a look.

Member

@szehon-ho szehon-ho left a comment

Thanks, some preliminary comments.

As this is just fixing a test transform, I think we should just add one minimal negative test for this (to assert that there is no SPJ in this case).

@HyukjinKwon HyukjinKwon changed the title [SPARK-47094] SPJ : fix bucket reducer function [SPARK-47094][SQL] SPJ : fix bucket reducer function Jun 28, 2024
@viirya
Member

viirya commented Oct 19, 2024

> With this fix, incompatible bucket counts no longer return 1 as the GCD, so the buckets are not reduced to 1 when the function is used with incompatible bucket counts.

So previously, when it was reduced to 1, was that a correctness issue or just a performance issue?

@himadripal
Author

> > With this fix, incompatible bucket counts no longer return 1 as the GCD, so the buckets are not reduced to 1 when the function is used with incompatible bucket counts.
>
> So previously, when it was reduced to 1, was that a correctness issue or just a performance issue?

A performance issue: if it reduces to 1, there will be only one task doing the work.
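The performance concern can be illustrated with a small sketch. This is plain Python, not Spark code, and the helper name `reduced_partition_sizes` is hypothetical. It assumes a reducer that maps each bucket id to `bucket % divisor` and counts how many original buckets land in each reduced partition.

```python
from collections import Counter


def reduced_partition_sizes(num_buckets: int, divisor: int) -> Counter:
    """Hypothetical helper: count original buckets per reduced partition,
    assuming the reducer maps bucket id b to b % divisor."""
    return Counter(bucket % divisor for bucket in range(num_buckets))


# Reducing 6 buckets by a divisor of 2 keeps two partitions of equal size.
print(reduced_partition_sizes(6, 2))

# Reducing by a divisor of 1 sends every bucket to partition 0,
# which would leave a single task to process all of the data.
print(reduced_partition_sizes(6, 1))
```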

@szehon-ho
Member

@viirya it seems this is a test transform, but it is still good to have a good example.

@viirya
Member

viirya commented Oct 19, 2024

> @viirya it seems this is a test transform, but it is still good to have a good example.

Oh okay, I didn't see that it is test-only code.

@himadripal
Author

@viirya please take another look.

@viirya viirya changed the title [SPARK-47094][SQL] SPJ : fix bucket reducer function [SPARK-47094][SQL][TEST][FOLLOWUP] SPJ : fix bucket reducer function Oct 29, 2024
@viirya
Member

viirya commented Oct 29, 2024

cc @huaxingao

@himadripal himadripal force-pushed the fix_spj_transform_expression branch from d24d1a0 to e503341 Compare October 30, 2024 05:06
@huaxingao huaxingao closed this in 73a2b84 Oct 30, 2024
@huaxingao
Contributor

Merged to master. Thanks @himadripal @szehon-ho @viirya

@dongjoon-hyun
Member

Thank you all.

5 participants