[SPARK-39835][SQL] Fix EliminateSorts remove global sort below the local sort #37250
private def recursiveRemoveSort(plan: LogicalPlan): LogicalPlan = {
/**
 * If the upper sort is global then we can remove the global or local sort recursively.
 * If the upper sort is local then we can only remove the local sort recursively.
If the upper Sort is local, why can't we remove the global Sort recursively?
I think we can eliminate it too.
Think about what happens if we remove all the Repartition nodes: will users complain? They will, even though Repartition does not change the data and only changes the partitioning. Data partitioning is also a user expectation that we shouldn't break.
The semantics of global + local sort should be range partition + local sort, so we cannot remove a global sort that is under a local sort, just as we cannot remove a range partition directly. BTW, I will add a new rule to optimize this pattern after fixing this PR.
Yeah, I also agree we should not remove the global Sort, as users might expect overall range partitioning.
I got it.
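To make the argument in this thread concrete, here is a minimal sketch of why dropping the global sort changes what the user sees. It uses a toy plan model, not Spark's `LogicalPlan` classes; `PartitioningSketch`, `Scan`, `Sort`, `RangePartitioning`, `UnknownPartitioning`, and `outputPartitioning` are all hypothetical names introduced only for illustration.

```scala
object PartitioningSketch {
  // Toy model (an assumption, not Spark's classes): just enough to track partitioning.
  sealed trait Partitioning
  case object UnknownPartitioning extends Partitioning
  final case class RangePartitioning(keys: Seq[String]) extends Partitioning

  sealed trait Plan
  final case class Scan(table: String) extends Plan
  final case class Sort(keys: Seq[String], global: Boolean, child: Plan) extends Plan

  // A global sort range-partitions its output by the sort keys;
  // a local sort only reorders rows inside each existing partition.
  def outputPartitioning(plan: Plan): Partitioning = plan match {
    case Scan(_) => UnknownPartitioning
    case Sort(keys, global, child) =>
      if (global) RangePartitioning(keys) else outputPartitioning(child)
  }

  // A global sort on "a" below a local sort on "b".
  val original: Plan = Sort(Seq("b"), false, Sort(Seq("a"), true, Scan("t")))
  // The same plan after wrongly dropping the global sort.
  val withoutGlobalSort: Plan = Sort(Seq("b"), false, Scan("t"))
}
```

In this model the original plan is range partitioned on `a`, while the plan without the global sort has unknown partitioning, which is the user-visible difference discussed above.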
}
plan match {
  case Sort(_, _, child) => recursiveRemoveSort(child)
  case Sort(_, _, child) if canRemoveGlobalSort =>
nit:
case Sort(_, global, child) if canRemoveGlobalSort || !global =>
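For readers following along, the suggested guard can be sketched on a toy plan model. This is an illustration under stated assumptions, not the actual rule: `EliminateSortsSketch`, `Scan`, and `removeBelowRootSort` are hypothetical, the toy `Sort` only mirrors the shape of the real node, and the sketch only descends through directly nested sorts, whereas the real rule also has to decide which other operators it may pass through.

```scala
object EliminateSortsSketch {
  // Toy plan model (an assumption): not Spark's LogicalPlan hierarchy.
  sealed trait Plan
  final case class Scan(table: String) extends Plan
  final case class Sort(keys: Seq[String], global: Boolean, child: Plan) extends Plan

  // Strip sorts made redundant by a sort above `plan`.
  // canRemoveGlobalSort is true only when that upper sort is global.
  def recursiveRemoveSort(plan: Plan, canRemoveGlobalSort: Boolean): Plan = plan match {
    // A local sort can always go; a global sort only if the upper sort is global.
    case Sort(_, global, child) if canRemoveGlobalSort || !global =>
      recursiveRemoveSort(child, canRemoveGlobalSort)
    case other => other
  }

  // Apply the rule at the root sort only (hypothetical helper).
  def removeBelowRootSort(plan: Plan): Plan = plan match {
    case s @ Sort(_, global, child) => s.copy(child = recursiveRemoveSort(child, global))
    case other => other
  }
}
```

With this guard, a global sort at the root removes every sort directly below it, while a local sort at the root removes local sorts and stops at the first global sort.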
}

test("SPARK-39835: Fix EliminateSorts remove global sort below the local sort") {
// local - global
This is a bit hard to read, as I don't know which one is the child. How about global -> local?
cc @sigmod

@ulysses-you do you know in which branch this bug was introduced?

@cloud-fan branch-3.0. See spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala, lines 1015 to 1023 in ea05c33.
I guess there will be a lot of conflicts for each branch.

The GA failure is unrelated. Thanks, merging to master/3.3!
…cal sort

### What changes were proposed in this pull request?

Correct `EliminateSorts` as follows:

- If the upper sort is global then we can remove the global or local sort recursively.
- If the upper sort is local then we can only remove the local sort recursively.

### Why are the changes needed?

If a global sort is below a local sort, we should not remove the global sort because the output partitioning can be affected.

This issue is going to get worse since we pull the V1 Write sort out to the logical side.

### Does this PR introduce _any_ user-facing change?

Yes, bug fix.

### How was this patch tested?

Added a test.

Closes #37250 from ulysses-you/remove-sort.

Authored-by: ulysses-you <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit 5dca26d)
Signed-off-by: Wenchen Fan <[email protected]>

@ulysses-you can you open backport PRs for 3.2 and 3.1? Thanks!

@cloud-fan created #37276 and #37275

…cal sort

Correct `EliminateSorts` as follows:

- If the upper sort is global then we can remove the global or local sort recursively.
- If the upper sort is local then we can only remove the local sort recursively.

If a global sort is below a local sort, we should not remove the global sort because the output partitioning can be affected.

This issue is going to get worse since we pull the V1 Write sort out to the logical side.

Yes, bug fix.

Added a test.

Closes apache#37250 from ulysses-you/remove-sort.

Authored-by: ulysses-you <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
This is a backport of #37330 into branch-3.3.

### What changes were proposed in this pull request?

Optimize global sort to RepartitionByExpression, for example:

```
Sort local             Sort local
  Sort global    =>      RepartitionByExpression
```

### Why are the changes needed?

If a global sort is below a local sort, the only meaningful thing about it is its distribution. So this PR optimizes that global sort to RepartitionByExpression to save a local sort.

### Does this PR introduce _any_ user-facing change?

We fixed a bug in #37250, and that PR was backported into branch-3.3. However, that fix may introduce a performance regression. This PR itself only improves performance, but to avoid the regression we also backport it. See the details in #37330 (comment).

### How was this patch tested?

Added a test.

Closes #37330 from ulysses-you/optimize-sort.

Authored-by: ulysses-you <ulyssesyou18gmail.com>
Signed-off-by: Wenchen Fan <wenchendatabricks.com>

Closes #37373 from ulysses-you/SPARK-39911-3.3.

Authored-by: ulysses-you <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
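The rewrite described in this commit message can be sketched on a toy plan model. This is only an illustration under stated assumptions: `OptimizeSortSketch` and `optimize` are hypothetical names, and the toy `RepartitionByExpression` stands in for a range repartition on the sort keys rather than reproducing Spark's node of the same name.

```scala
object OptimizeSortSketch {
  // Toy plan model (an assumption): not Spark's LogicalPlan hierarchy.
  sealed trait Plan
  final case class Scan(table: String) extends Plan
  final case class Sort(keys: Seq[String], global: Boolean, child: Plan) extends Plan
  // Stand-in for a range repartition on `keys` that does not sort rows.
  final case class RepartitionByExpression(keys: Seq[String], child: Plan) extends Plan

  def optimize(plan: Plan): Plan = plan match {
    // Local sort directly over a global sort: keep only the distribution
    // of the global sort, since the local sort redoes the row ordering.
    case Sort(keys, false, Sort(globalKeys, true, grandchild)) =>
      Sort(keys, false, RepartitionByExpression(globalKeys, optimize(grandchild)))
    case Sort(keys, global, child) => Sort(keys, global, optimize(child))
    case RepartitionByExpression(keys, child) => RepartitionByExpression(keys, optimize(child))
    case scan: Scan => scan
  }
}
```

A global sort that is not under a local sort is left untouched in this sketch, because its row ordering is still observable.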
…he local sort

Backport of #37250 into branch-3.1.

### What changes were proposed in this pull request?

Correct `EliminateSorts` as follows:

- If the upper sort is global then we can remove the global or local sort recursively.
- If the upper sort is local then we can only remove the local sort recursively.

### Why are the changes needed?

If a global sort is below a local sort, we should not remove the global sort because the output partitioning can be affected.

This issue is going to get worse since we pull the V1 Write sort out to the logical side.

### Does this PR introduce _any_ user-facing change?

Yes, bug fix.

### How was this patch tested?

Added a test.

Closes #37276 from ulysses-you/SPARK-39835-3.1.

Authored-by: ulysses-you <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
…he local sort

Backport of #37250 into branch-3.2.

### What changes were proposed in this pull request?

Correct `EliminateSorts` as follows:

- If the upper sort is global then we can remove the global or local sort recursively.
- If the upper sort is local then we can only remove the local sort recursively.

### Why are the changes needed?

If a global sort is below a local sort, we should not remove the global sort because the output partitioning can be affected.

This issue is going to get worse since we pull the V1 Write sort out to the logical side.

### Does this PR introduce _any_ user-facing change?

Yes, bug fix.

### How was this patch tested?

Added a test.

Closes #37275 from ulysses-you/SPARK-39835-3.2.

Authored-by: ulysses-you <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
…he local sort

Backport of apache#37250 into branch-3.2.

### What changes were proposed in this pull request?

Correct `EliminateSorts` as follows:

- If the upper sort is global then we can remove the global or local sort recursively.
- If the upper sort is local then we can only remove the local sort recursively.

### Why are the changes needed?

If a global sort is below a local sort, we should not remove the global sort because the output partitioning can be affected.

This issue is going to get worse since we pull the V1 Write sort out to the logical side.

### Does this PR introduce _any_ user-facing change?

Yes, bug fix.

### How was this patch tested?

Added a test.

Closes apache#37275 from ulysses-you/SPARK-39835-3.2.

Authored-by: ulysses-you <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit 265bd21)
What changes were proposed in this pull request?
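Correct EliminateSorts as follows:
- If the upper sort is global then we can remove the global or local sort recursively.
- If the upper sort is local then we can only remove the local sort recursively.
Why are the changes needed?
If a global sort is below a local sort, we should not remove the global sort because the output partitioning can be affected.
This issue is going to get worse since we pull the V1 Write sort out to the logical side.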
Does this PR introduce any user-facing change?
yes, bug fix
How was this patch tested?
add test