Skip to content

Delete Unneecessary Round-Robin Condition in Repartitioning #19131

@gene-bordegaray

Description

@gene-bordegaray

Describe the bug

When a parent operator requires a hash partitioning in EnforceDistribution rule, there exists a conition that theoretically should add round-robin partitioning. The conditions for this are:

If the parent operator requires hash partitioning

  • If hash repartitioning is not necessary (we are not already correctly hashed) and we want to add a round-robin => add a round-robin repartition on top
  • If hash is necessary (we do not meet the hash requirements) => add a hash repartition on top

link to code

This logic is incorrect as if the parent requires hash repartitioning and we are already hash partitioned, we should never round-robin repartition as this would break our hash partitioning.

This logic is proved to be incorrect as Datafusion actually never evaluates the first condition as true.

To Reproduce

Run all sqllogictests and benchmarks with this condition.

Delete it, and run them all again. No plans will change (it's dead code)

Expected behavior

The correct logic should be

If the parent operator requires hash partitioning

  • If we are already hashed correctly, don't do anything. If we insert a round-robin we will break out partitioning
  • If we a hash is required, just insert a hash repartition

Additional context

I actually created this dead code in issue #18341 to avoid consecutive repartitions, but glanced over the fact that this round-robin should never be happening (and the change makes it so that it doesn't) in the first place

Metadata

Metadata

Labels

bugSomething isn't working

Type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions