[SPARK-28356][SHUFFLE][FOLLOWUP] Fix case with different pre-shuffle partition numbers #25479

peter-toth · 2019-08-16T14:56:12Z

What changes were proposed in this pull request?

This PR reverts some of the latest changes in ReduceNumShufflePartitions to fix the case when there are different pre-shuffle partition numbers in the plan. Please see the new UT for an example.

Why are the changes needed?

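To fix a bug: queries whose plan contains shuffles with different pre-shuffle partition numbers fail on an assertion in ReduceNumShufflePartitions.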

Does this PR introduce any user-facing change?

Yes, some queries that failed will succeed now.

How was this patch tested?

Added new UT.

peter-toth · 2019-08-16T15:00:57Z

I opened this PR to fix #25121 (comment)

cc @cloud-fan @carsonwang @maryannxue

cloud-fan · 2019-08-16T15:39:36Z

ok to test

cloud-fan · 2019-08-16T15:40:23Z

...core/src/main/scala/org/apache/spark/sql/execution/adaptive/ReduceNumShufflePartitions.scala

      // we should skip it when calculating the `partitionStartIndices`.
      val validMetrics = shuffleMetrics.filter(_ != null)
-      if (validMetrics.nonEmpty) {
+      // We may have different pre-shuffle partition numbers, don't reduce shuffle partition number


Let's also give an example of when we can have different pre-shuffle partition numbers.

Ok, added. Please let me know if it should be more detailed.

cloud-fan · 2019-08-16T15:41:24Z

sql/core/src/test/scala/org/apache/spark/sql/execution/ReduceNumShufflePartitionsSuite.scala

+
+      val resultDf = df1.union(df2)
+
+      checkAnswer(resultDf, Seq((0), (1), (2), (3)).map(i => Row(i)))


does this fail without the fix?

It does. The plan is:

AdaptiveSparkPlan(isFinalPlan=false)
+- Union
   :- Project [id#0L]
   :  +- SortMergeJoin [id#0L], [id#2L], Inner
   :     :- Sort [id#0L ASC NULLS FIRST], false, 0
   :     :  +- Exchange hashpartitioning(id#0L, 5), true
   :     :     +- Range (0, 3, step=1, splits=12)
   :     +- Sort [id#2L ASC NULLS FIRST], false, 0
   :        +- Exchange hashpartitioning(id#2L, 5), true
   :           +- Range (0, 3, step=1, splits=12)
   +- HashAggregate(keys=[], functions=[sum(id#6L)], output=[sum(id)#10L])
      +- Exchange SinglePartition, true
         +- HashAggregate(keys=[], functions=[partial_sum(id#6L)], output=[sum#14L])
            +- Range (0, 3, step=1, splits=12)

and the error comes from this assert: https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/ReduceNumShufflePartitions.scala#L136
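To illustrate why such a plan trips the assertion, here is a minimal, Spark-free sketch. It assumes the estimator merges adjacent post-shuffle partitions by index across all shuffles, which only makes sense when every shuffle reports the same number of partitions. `StartIndicesSketch`, `estimateStartIndices`, and `targetSize` are hypothetical names for illustration, not the actual Spark code.

```scala
import scala.collection.mutable.ArrayBuffer

object StartIndicesSketch {
  // Hypothetical, simplified estimator: walks partition indices in order and
  // starts a new coalesced partition whenever adding the next index would
  // push the running total above targetSize.
  def estimateStartIndices(
      bytesByPartitionId: Seq[Array[Long]],
      targetSize: Long): Array[Int] = {
    val distinctLengths = bytesByPartitionId.map(_.length).distinct
    // Assumed equivalent of the assertion discussed in the thread: all
    // shuffles must agree on the pre-shuffle partition number.
    assert(distinctLengths.length == 1,
      s"Expected one distinct partition number, got $distinctLengths")

    val numPartitions = distinctLengths.head
    val starts = ArrayBuffer(0)
    var currentSize = 0L
    var i = 0
    while (i < numPartitions) {
      // Total bytes at index i, summed over all shuffles.
      val size = bytesByPartitionId.map(_(i)).sum
      if (i > 0 && currentSize + size > targetSize) {
        starts += i
        currentSize = size
      } else {
        currentSize += size
      }
      i += 1
    }
    starts.toArray
  }
}
```

With two shuffles of five partitions each the sketch returns start indices; with a five-partition shuffle (the join side) and a one-partition shuffle (the `SinglePartition` aggregate side) the assertion fails, which matches the shape of the plan above.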

Can you fill in the "Does this PR introduce any user-facing change?" section? Changing a query from failing to runnable is a user-facing change.

Oh ok, sure, filled.

SparkQA · 2019-08-16T19:43:26Z

Test build #109215 has finished for PR 25479 at commit 6898f88.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

viirya · 2019-08-16T20:41:55Z

...core/src/main/scala/org/apache/spark/sql/execution/adaptive/ReduceNumShufflePartitions.scala

+      // partition) and a result of a SortMergeJoin (multiple partitions).
+      val distinctNumPreShufflePartitions =
+        validMetrics.map(stats => stats.bytesByPartitionId.length).distinct
+      if (validMetrics.nonEmpty && distinctNumPreShufflePartitions.length == 1) {


After we have this condition distinctNumPreShufflePartitions.length == 1, do we still need the assert at L136? Shall we remove the assert?

Yes, we could remove it, but the assert has been there since the original version of ReduceNumShufflePartitions, which also included the distinctNumPreShufflePartitions.length == 1 check. I'm not sure what the plan is for ReduceNumShufflePartitions. @carsonwang, @maryannxue do you want to improve Union/SinglePartition handling in this rule? Shall we remove the assert?

I think it is fine to remove it. We can improve the handling of Union/SinglePartition in future and it probably needs more changes and a new function to estimate the partition start indices.
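The guard discussed in this thread can be sketched without Spark as follows. `ShuffleStats` is a hypothetical stand-in for the per-shuffle map output statistics, and `canReduce` is an illustrative name; only the `validMetrics` and `distinctNumPreShufflePartitions` logic is taken from the diff hunk above.

```scala
object PreShuffleGuardSketch {
  // Hypothetical stand-in for one shuffle's map output statistics.
  final case class ShuffleStats(bytesByPartitionId: Array[Long])

  // Returns true only when there is at least one non-null metric and all
  // shuffles report the same pre-shuffle partition number.
  def canReduce(shuffleMetrics: Seq[ShuffleStats]): Boolean = {
    // Null entries are skipped, as in the diff hunk.
    val validMetrics = shuffleMetrics.filter(_ != null)
    val distinctNumPreShufflePartitions =
      validMetrics.map(stats => stats.bytesByPartitionId.length).distinct
    validMetrics.nonEmpty && distinctNumPreShufflePartitions.length == 1
  }
}
```

Under this sketch, two hash-partitioned shuffles with five partitions each pass the guard, while mixing a five-partition shuffle with a single-partition one is rejected, so the rule would leave such a stage unchanged instead of failing.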

SparkQA · 2019-08-16T21:36:13Z

Test build #109233 has finished for PR 25479 at commit 31436a8.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

cloud-fan · 2019-08-19T07:53:55Z

thanks, merging to master!

…partition numbers

### What changes were proposed in this pull request?

This PR reverts some of the latest changes in `ReduceNumShufflePartitions` to fix the case when there are different pre-shuffle partition numbers in the plan. Please see the new UT for an example.

### Why are the changes needed?

Eliminate a bug.

### Does this PR introduce any user-facing change?

Yes, some queries that failed will succeed now.

### How was this patch tested?

Added new UT.

Closes apache#25479 from peter-toth/SPARK-28356-followup.

Authored-by: Peter Toth <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>

[SPARK-28356][FOLLOWUP] fix case when we have different pre-shuffle p…

6898f88

…artition numbers

peter-toth mentioned this pull request Aug 16, 2019

[SPARK-28356][SQL] Do not reduce the number of partitions for repartition in adaptive execution #25121

Closed

cloud-fan reviewed Aug 16, 2019

peter-toth changed the title ~~[SPARK-28356][FOLLOWUP] fix case with different pre-shuffle partition numbers~~ [SPARK-28356][FOLLOWUP] Fix case with different pre-shuffle partition numbers Aug 16, 2019

dongjoon-hyun changed the title ~~[SPARK-28356][FOLLOWUP] Fix case with different pre-shuffle partition numbers~~ [SPARK-28356][SHUFFLE][FOLLOWUP] Fix case with different pre-shuffle partition numbers Aug 16, 2019

dongjoon-hyun added the SHUFFLE label Aug 16, 2019

add a example

31436a8

viirya reviewed Aug 16, 2019

cloud-fan closed this in f999e00 Aug 19, 2019

