[SPARK-45592][SQL] Correctness issue in AQE with InMemoryTableScanExec #43435
Conversation
Tagging @cloud-fan and @ulysses-you since they created PRs in this area and might know a better way of fixing the bug.
I think the issue is that we propagate a coalesced shuffle exchange through …

BTW, if you set …

There are two code places related to this issue:
@cloud-fan I took a stab at implementing your suggestion, but the reproduction of the bug still fails. So either I made some mistake or missed some other part of the code that needs to be updated. Would be great if you could provide some feedback.
@ulysses-you can you take a look when you have time?
ulysses-you left a comment:
Looks fine to me, cc @cloud-fan
Is the suggestion to do that in this PR, or is it better to do it in a follow-up?
@eejbyfeldt nvm, I made a mistake. This is for coalesce; we can add a new partitioning for skew join handling (split and replicate partitions). It's unrelated to this PR and we can do it later.
```scala
override val numPartitions: Int = partitions.length

override def toString: String = from.toString
override def sql: String = from.sql
```
On second thought, why do we need to hide CoalescedHashPartitioning? Can we run some example queries and check EXPLAIN and the SQL web UI?
cloud-fan left a comment:
LGTM with only one minor comment
@eejbyfeldt Can you briefly describe the triggering condition of this bug? Does it only occur when coalescing happens to produce exactly the same number of partitions as the other side of the join? In the meantime, I'm wondering if it would be better to:

This PR, just to address the correctness issue, only needs to do 1. And we can do 2 (a little trickier, I suppose) for performance improvement.
Synced with @cloud-fan offline; (2) in the above suggestion wouldn't work. Let's go ahead with the current fix.
The failed streaming test is unrelated, and my last comment is quite minor, so let's merge it first to fix the correctness bug. Thanks for your great work!
What changes were proposed in this pull request?

Fixes a correctness issue in 3.5.0. The problem seems to be that when AQEShuffleRead does a coalesced read it can return a HashPartitioning with the coalesced number of partitions. This causes a correctness bug, as the partitioning is not compatible for joins with other HashPartitionings even though the number of partitions matches. This is resolved in this patch by introducing CoalescedHashPartitioning and making AQEShuffleRead return that instead.

The fix was suggested by cloud-fan:

> AQEShuffleRead should probably return a different partitioning, e.g. CoalescedHashPartitioning. It still satisfies ClusteredDistribution, so Aggregate is fine and there will be no shuffle. For joins, two CoalescedHashPartitionings are compatible if they have the same original partition number and coalesce boundaries, and CoalescedHashPartitioning is not compatible with HashPartitioning.

Why are the changes needed?

Correctness bug.

Does this PR introduce any user-facing change?

Yes, fixed correctness issue.

How was this patch tested?

New and existing unit tests.

Was this patch authored or co-authored using generative AI tooling?

No

Closes #43435 from eejbyfeldt/SPARK-45592.

Authored-by: Emil Ejbyfeldt <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit 2be03d8)
Signed-off-by: Wenchen Fan <[email protected]>
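The compatibility rule described above can be sketched as a small standalone model. These are not Spark's actual classes — the names and fields below are illustrative only:

```scala
// Illustrative model of the join-compatibility rule (NOT Spark's real classes).
case class HashPartitioning(numPartitions: Int)

// A coalesced partitioning remembers the original partitioning it came from
// and the coalesce boundaries (start index of each coalesced partition).
case class CoalescedHashPartitioning(from: HashPartitioning, startIndices: Seq[Int]) {
  val numPartitions: Int = startIndices.length
}

def compatibleForJoin(a: Any, b: Any): Boolean = (a, b) match {
  // Plain hash partitionings: the same partition count is enough.
  case (x: HashPartitioning, y: HashPartitioning) =>
    x.numPartitions == y.numPartitions
  // Coalesced: must share the original partitioning AND the boundaries.
  case (x: CoalescedHashPartitioning, y: CoalescedHashPartitioning) =>
    x.from == y.from && x.startIndices == y.startIndices
  // Coalesced vs plain is never compatible, even if numPartitions happens
  // to match -- that mismatch was the source of the correctness bug.
  case _ => false
}

// An 8-partition shuffle coalesced down to 2 partitions is NOT
// join-compatible with a plain 2-partition hash partitioning.
val coalesced = CoalescedHashPartitioning(HashPartitioning(8), Seq(0, 4))
assert(!compatibleForJoin(coalesced, HashPartitioning(2)))
assert(compatibleForJoin(coalesced, CoalescedHashPartitioning(HashPartitioning(8), Seq(0, 4))))
```

This is only a sketch of the rule quoted from the review discussion; the real implementation lives in `partitioning.scala` as touched by this PR.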
Thank you, @eejbyfeldt and all.
…MemoryTableScanExec

### What changes were proposed in this pull request?

This backports #43435 SPARK-45592 to the 3.4 branch. This is because it was already reported there as SPARK-45282, but it required enabling some extra configuration to hit the bug.

### Why are the changes needed?

Fix correctness issue.

### Does this PR introduce _any_ user-facing change?

Yes, fixing correctness issue.

### How was this patch tested?

New tests based on the reproduction example in SPARK-45282.

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #43729 from eejbyfeldt/SPARK-45282.

Authored-by: Emil Ejbyfeldt <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
Hello, is there a timeline for 3.5.1 release? We are facing the issue in 3.5.0 and would like to know when the next stable version will be rolled out. |
Hi, is there a tentative timeline for releasing spark-3.5.1 with these changes? |
### What changes were proposed in this pull request?

#43435 and #43760 fix a correctness issue which is triggered when AQE is applied to a cached query plan, specifically when AQE coalesces the final result stage of the cached plan.

The current semantics of `spark.sql.optimizer.canChangeCachedPlanOutputPartitioning` ([source code](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala#L403-L411)):

- when true, we enable AQE but disable coalescing the final stage (default)
- when false, we disable AQE

But let's revisit the semantics of this config: for the caller, the only thing that matters is whether we change the output partitioning of the cached plan, and we should only try to apply AQE if possible. Thus we want to modify the semantics of `spark.sql.optimizer.canChangeCachedPlanOutputPartitioning`:

- when true, we enable AQE and allow coalescing the final stage: this might lead to a perf regression, because it introduces an extra shuffle
- when false, we enable AQE but disable coalescing the final stage (this is actually the `true` semantics of the old behavior)

Also, to keep the default behavior unchanged, we might want to flip the default value of `spark.sql.optimizer.canChangeCachedPlanOutputPartitioning` to `false`.

### Why are the changes needed?

To allow AQE to coalesce the final stage in a SQL cached plan. Also, to make the semantics of `spark.sql.optimizer.canChangeCachedPlanOutputPartitioning` more reasonable.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

Updated UTs.

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #45054 from liuzqt/SPARK-46995.

Authored-by: Ziqi Liu <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
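For reference, the config discussed above is toggled like any other SQL conf. A minimal, illustrative fragment (assuming an existing SparkSession named `spark`; the exact default and semantics depend on the Spark version, as described above):

```scala
// Illustrative config fragment -- assumes an existing SparkSession `spark`.
// Under the revised semantics discussed in this thread:
//   true  -> enable AQE and allow coalescing the cached plan's final stage
//            (may change the cached plan's output partitioning)
//   false -> enable AQE but keep the cached plan's output partitioning stable
spark.conf.set("spark.sql.optimizer.canChangeCachedPlanOutputPartitioning", "false")
```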
What changes were proposed in this pull request?
Fixes correctness issue in 3.5.0. The problem seems to be that when AQEShuffleRead does a coalesced read it can return a HashPartitioning with the coalesced number of partitions. This causes a correctness bug as the partitioning is not compatible for joins with other HashPartitioning even though the number of partitions matches. This is resolved in this patch by introducing CoalescedHashPartitioning and making AQEShuffleRead return that instead.
The fix was suggested by @cloud-fan
Why are the changes needed?
Correctness bug.
Does this PR introduce any user-facing change?
Yes, fixed correctness issue.
How was this patch tested?
New and existing unit tests.
Was this patch authored or co-authored using generative AI tooling?
No