Skip to content

Conversation

@ulysses-you
Copy link
Contributor

What changes were proposed in this pull request?

Correct the EliminateSorts follows:

  • If the upper sort is global then we can remove the global or local sort recursively.
  • If the upper sort is local then we can only remove the local sort recursively.

Why are the changes needed?

If a global sort below locol sort, we should not remove the global sort becuase the output partitioning can be affected.

This issue is going to worse since we pull out the V1 Write sort to logcial side.

Does this PR introduce any user-facing change?

yes, bug fix

How was this patch tested?

add test

@ulysses-you
Copy link
Contributor Author

cc @cloud-fan @allisonwang-db

@github-actions github-actions bot added the SQL label Jul 22, 2022
private def recursiveRemoveSort(plan: LogicalPlan): LogicalPlan = {
/**
* If the upper sort is global then we can remove the global or local sort recursively.
* If the upper sort is local then we can only remove the local sort recursively.
Copy link
Contributor

@beliefer beliefer Jul 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the upper Sort is local, why we can't remove the global Sort recursively ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can eliminate it too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think about what if we remove all the Repartition nodes, will users complain? They will even if Repartition does not change the data, but only change the partitioning. Data partitioning is also a user expectation that we shouldn't break.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The semantics of global + local sort should be range partition + local sort, so we can not remove the global sort which is under local sort as we can not remove range partition directly. BTW, I will add a new rule to optimzie this pattern after fix this pr.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I also agree we should not remove the global Sort, as user might have expectation of overall range partitioning.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got it.

}
plan match {
case Sort(_, _, child) => recursiveRemoveSort(child)
case Sort(_, _, child) if canRemoveGlobalSort =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

case Sort(_, global, child) if canRemoveGlobalSort || !global =>

}

test("SPARK-39835: Fix EliminateSorts remove global sort below the local sort") {
// local - global
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a bit hard to read as I don't know which one is child. How about global -> local?

@cloud-fan
Copy link
Contributor

cc @sigmod

@cloud-fan
Copy link
Contributor

@ulysses-you do you know which branch we start to have this bug?

@ulysses-you
Copy link
Contributor Author

@cloud-fan branch-3.0

case s @ Sort(_, _, child) => s.copy(child = recursiveRemoveSort(child))
case j @ Join(originLeft, originRight, _, cond, _) if cond.forall(_.deterministic) =>
j.copy(left = recursiveRemoveSort(originLeft), right = recursiveRemoveSort(originRight))
case g @ Aggregate(_, aggs, originChild) if isOrderIrrelevantAggs(aggs) =>
g.copy(child = recursiveRemoveSort(originChild))
}
private def recursiveRemoveSort(plan: LogicalPlan): LogicalPlan = plan match {
case Sort(_, _, child) => recursiveRemoveSort(child)

I guess there are a lot of conflicts for each branch ..

@cloud-fan
Copy link
Contributor

The GA failure is unrelated, thanks, merging to master/3.3!

cloud-fan pushed a commit that referenced this pull request Jul 25, 2022
…cal sort

### What changes were proposed in this pull request?

 Correct the `EliminateSorts` follows:

- If the upper sort is global then we can remove the global or local sort recursively.
- If the upper sort is local then we can only remove the local sort recursively.

### Why are the changes needed?

If a global sort below locol sort, we should not remove the global sort becuase the output partitioning can be affected.

This issue is going to worse since we pull out the V1 Write sort to logcial side.

### Does this PR introduce _any_ user-facing change?

yes, bug fix

### How was this patch tested?

add test

Closes #37250 from ulysses-you/remove-sort.

Authored-by: ulysses-you <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit 5dca26d)
Signed-off-by: Wenchen Fan <[email protected]>
@cloud-fan cloud-fan closed this in 5dca26d Jul 25, 2022
@cloud-fan
Copy link
Contributor

@ulysses-you can you open backport PRs for 3.2 and 3.1? thanks!

@ulysses-you
Copy link
Contributor Author

@cloud-fan created #37276 and #37275

ulysses-you added a commit to ulysses-you/spark that referenced this pull request Jul 27, 2022
…cal sort

 Correct the `EliminateSorts` follows:

- If the upper sort is global then we can remove the global or local sort recursively.
- If the upper sort is local then we can only remove the local sort recursively.

If a global sort below locol sort, we should not remove the global sort becuase the output partitioning can be affected.

This issue is going to worse since we pull out the V1 Write sort to logcial side.

yes, bug fix

add test

Closes apache#37250 from ulysses-you/remove-sort.

Authored-by: ulysses-you <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
ulysses-you added a commit to ulysses-you/spark that referenced this pull request Jul 27, 2022
…cal sort

 Correct the `EliminateSorts` follows:

- If the upper sort is global then we can remove the global or local sort recursively.
- If the upper sort is local then we can only remove the local sort recursively.

If a global sort below locol sort, we should not remove the global sort becuase the output partitioning can be affected.

This issue is going to worse since we pull out the V1 Write sort to logcial side.

yes, bug fix

add test

Closes apache#37250 from ulysses-you/remove-sort.

Authored-by: ulysses-you <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
cloud-fan pushed a commit that referenced this pull request Aug 3, 2022
this is for backport #37330 into branch-3.3
### What changes were proposed in this pull request?

Optimize Global sort to RepartitionByExpression, for example:
```
Sort local             Sort local
  Sort global    =>      RepartitionByExpression
```

### Why are the changes needed?

If a global sort below a local sort, the only meaningful thing is it's distribution. So this pr optimizes that global sort to RepartitionByExpression to save a local sort.

### Does this PR introduce _any_ user-facing change?

we fix a bug in #37250 and that pr backport into branch-3.3. However, that fix may introduce performance regression. This pr itself is only to improve performance but in order to avoid the regression, we also backport this pr. see the details #37330 (comment)

### How was this patch tested?

add test

Closes #37330 from ulysses-you/optimize-sort.

Authored-by: ulysses-you <ulyssesyou18gmail.com>
Signed-off-by: Wenchen Fan <wenchendatabricks.com>

Closes #37373 from ulysses-you/SPARK-39911-3.3.

Authored-by: ulysses-you <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
cloud-fan pushed a commit that referenced this pull request Aug 3, 2022
…he local sort

backport #37250 into branch-3.1

### What changes were proposed in this pull request?

 Correct the `EliminateSorts` follows:

- If the upper sort is global then we can remove the global or local sort recursively.
- If the upper sort is local then we can only remove the local sort recursively.

### Why are the changes needed?

If a global sort below locol sort, we should not remove the global sort becuase the output partitioning can be affected.

This issue is going to worse since we pull out the V1 Write sort to logcial side.

### Does this PR introduce _any_ user-facing change?

yes, bug fix

### How was this patch tested?

add test

Closes #37276 from ulysses-you/SPARK-39835-3.1.

Authored-by: ulysses-you <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
cloud-fan pushed a commit that referenced this pull request Aug 3, 2022
…he local sort

backport #37250 into branch-3.2

### What changes were proposed in this pull request?

 Correct the `EliminateSorts` follows:

- If the upper sort is global then we can remove the global or local sort recursively.
- If the upper sort is local then we can only remove the local sort recursively.

### Why are the changes needed?

If a global sort below locol sort, we should not remove the global sort becuase the output partitioning can be affected.

This issue is going to worse since we pull out the V1 Write sort to logcial side.

### Does this PR introduce _any_ user-facing change?

yes, bug fix

### How was this patch tested?

add test

Closes #37275 from ulysses-you/SPARK-39835-3.2.

Authored-by: ulysses-you <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
sunchao pushed a commit to sunchao/spark that referenced this pull request Jun 2, 2023
…he local sort

backport apache#37250 into branch-3.2

### What changes were proposed in this pull request?

 Correct the `EliminateSorts` follows:

- If the upper sort is global then we can remove the global or local sort recursively.
- If the upper sort is local then we can only remove the local sort recursively.

### Why are the changes needed?

If a global sort below locol sort, we should not remove the global sort becuase the output partitioning can be affected.

This issue is going to worse since we pull out the V1 Write sort to logcial side.

### Does this PR introduce _any_ user-facing change?

yes, bug fix

### How was this patch tested?

add test

Closes apache#37275 from ulysses-you/SPARK-39835-3.2.

Authored-by: ulysses-you <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit 265bd21)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants