Skip to content

Conversation

@peter-toth
Copy link
Contributor

@peter-toth peter-toth commented Aug 16, 2019

What changes were proposed in this pull request?

This PR reverts some of the latest changes in ReduceNumShufflePartitions to fix the case when there are different pre-shuffle partition numbers in the plan. Please see the new UT for an example.

Why are the changes needed?

Eliminate a bug.

Does this PR introduce any user-facing change?

Yes, some queries that failed will succeed now.

How was this patch tested?

Added new UT.

@peter-toth
Copy link
Contributor Author

I opened this PR to fix #25121 (comment)

cc @cloud-fan @carsonwang @maryannxue

@cloud-fan
Copy link
Contributor

ok to test

// we should skip it when calculating the `partitionStartIndices`.
val validMetrics = shuffleMetrics.filter(_ != null)
if (validMetrics.nonEmpty) {
// We may have different pre-shuffle partition numbers, don't reduce shuffle partition number
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's also give an example about when we will have different pre-shuffle partition numbers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, added. Please let me know if it should be more detailed.


val resultDf = df1.union(df2)

checkAnswer(resultDf, Seq((0), (1), (2), (3)).map(i => Row(i)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this fail without the fix?

Copy link
Contributor Author

@peter-toth peter-toth Aug 16, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does. The plan is:

AdaptiveSparkPlan(isFinalPlan=false)
+- Union
   :- Project [id#0L]
   :  +- SortMergeJoin [id#0L], [id#2L], Inner
   :     :- Sort [id#0L ASC NULLS FIRST], false, 0
   :     :  +- Exchange hashpartitioning(id#0L, 5), true
   :     :     +- Range (0, 3, step=1, splits=12)
   :     +- Sort [id#2L ASC NULLS FIRST], false, 0
   :        +- Exchange hashpartitioning(id#2L, 5), true
   :           +- Range (0, 3, step=1, splits=12)
   +- HashAggregate(keys=[], functions=[sum(id#6L)], output=[sum(id)#10L])
      +- Exchange SinglePartition, true
         +- HashAggregate(keys=[], functions=[partial_sum(id#6L)], output=[sum#14L])
            +- Range (0, 3, step=1, splits=12)

and the error comes from this assert: https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/ReduceNumShufflePartitions.scala#L136

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you fill the Does this PR introduce any user-facing change section? Changing a query from failure to runnable is a user-facing change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh ok, sure, filled.

@peter-toth peter-toth changed the title [SPARK-28356][FOLLOWUP] fix case with different pre-shuffle partition numbers [SPARK-28356][FOLLOWUP] Fix case with different pre-shuffle partition numbers Aug 16, 2019
@dongjoon-hyun dongjoon-hyun changed the title [SPARK-28356][FOLLOWUP] Fix case with different pre-shuffle partition numbers [SPARK-28356][SHUFFLE][FOLLOWUP] Fix case with different pre-shuffle partition numbers Aug 16, 2019
@SparkQA
Copy link

SparkQA commented Aug 16, 2019

Test build #109215 has finished for PR 25479 at commit 6898f88.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

// partition) and a result of a SortMergeJoin (multiple partitions).
val distinctNumPreShufflePartitions =
validMetrics.map(stats => stats.bytesByPartitionId.length).distinct
if (validMetrics.nonEmpty && distinctNumPreShufflePartitions.length == 1) {
Copy link
Member

@viirya viirya Aug 16, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After we have this condition distinctNumPreShufflePartitions.length == 1, do we still need the assert at L136? Shall we remove the assert?

Copy link
Contributor Author

@peter-toth peter-toth Aug 17, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we could remove it, but the assert has been there since the original version of ReduceNumShufflePartitions where the distinctNumPreShufflePartitions.length == 1 check was also included. I'm not sure what is the plan with ReduceNumShufflePartitions. @carsonwang, @maryannxue do you want to improve Union/SinglePartition handling in this rule? Shall we remove the assert?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is fine to remove it. We can improve the handling of Union/SinglePartition in future and it probably needs more changes and a new function to estimate the partition start indices.

@SparkQA
Copy link

SparkQA commented Aug 16, 2019

Test build #109233 has finished for PR 25479 at commit 31436a8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in f999e00 Aug 19, 2019
j-baker pushed a commit to palantir/spark that referenced this pull request Jan 25, 2020
…partition numbers

### What changes were proposed in this pull request?

This PR reverts some of the latest changes in `ReduceNumShufflePartitions` to fix the case when there are different pre-shuffle partition numbers in the plan. Please see the new UT for an example.

### Why are the changes needed?
Eliminate a bug.

### Does this PR introduce any user-facing change?
Yes, some queries that failed will succeed now.

### How was this patch tested?
Added new UT.

Closes apache#25479 from peter-toth/SPARK-28356-followup.

Authored-by: Peter Toth <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants