-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Description
Describe the bug
When a parent operator requires a hash partitioning in EnforceDistribution rule, there exists a conition that theoretically should add round-robin partitioning. The conditions for this are:
If the parent operator requires hash partitioning
- If hash repartitioning is not necessary (we are not already correctly hashed) and we want to add a round-robin => add a round-robin repartition on top
- If hash is necessary (we do not meet the hash requirements) => add a hash repartition on top
This logic is incorrect as if the parent requires hash repartitioning and we are already hash partitioned, we should never round-robin repartition as this would break our hash partitioning.
This logic is proved to be incorrect as Datafusion actually never evaluates the first condition as true.
To Reproduce
Run all sqllogictests and benchmarks with this condition.
Delete it, and run them all again. No plans will change (it's dead code)
Expected behavior
The correct logic should be
If the parent operator requires hash partitioning
- If we are already hashed correctly, don't do anything. If we insert a round-robin we will break out partitioning
- If we a hash is required, just insert a hash repartition
Additional context
I actually created this dead code in issue #18341 to avoid consecutive repartitions, but glanced over the fact that this round-robin should never be happening (and the change makes it so that it doesn't) in the first place