
Bug: incorrectly ignores OFFSET clause (Spark 3.4+) #1739

@yew1eb

Description


Since Apache Spark 3.4 (SPARK-28330), the physical plan fully represents OFFSET:

  • GlobalLimitExec and TakeOrderedAndProjectExec carry an explicit offset field
  • When both LIMIT and OFFSET are present, Spark stores limit + offset as the raw limit value
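The two bullets above amount to a small contract for anything consuming the plan: the stored limit already includes the offset, so a correct operator emits the rows between `offset` and the stored limit. The following is a minimal Rust sketch of that arithmetic. `PlanLimit`, `encode`, and `output_range` are hypothetical names (not Spark or Auron APIs), and representing OFFSET without LIMIT as `None` is an assumption of the sketch.

```rust
use std::ops::Range;

/// Hypothetical model of the (limit, offset) pair carried by GlobalLimitExec /
/// TakeOrderedAndProjectExec in Spark 3.4+. `raw_limit == None` models a query
/// with OFFSET but no LIMIT (an encoding chosen for this sketch only).
#[derive(Debug, Clone, Copy, PartialEq)]
struct PlanLimit {
    raw_limit: Option<usize>,
    offset: usize,
}

/// Encodes `LIMIT limit OFFSET offset` as the issue describes:
/// the stored limit is limit + offset.
fn encode(limit: Option<usize>, offset: usize) -> PlanLimit {
    PlanLimit { raw_limit: limit.map(|l| l + offset), offset }
}

/// Zero-based positions a correct operator must emit from `n` input rows.
fn output_range(p: PlanLimit, n: usize) -> Range<usize> {
    let end = p.raw_limit.unwrap_or(n).min(n);
    let start = p.offset.min(end); // an offset past the end yields an empty range
    start..end
}

fn main() {
    // LIMIT 10 OFFSET 5 is stored as raw_limit = 15, offset = 5.
    let p = encode(Some(10), 5);
    assert_eq!(p, PlanLimit { raw_limit: Some(15), offset: 5 });
    // Zero-based 5..15 corresponds to one-based rows 6-15.
    assert_eq!(output_range(p, 100), 5..15);

    // TakeOrderedAndProjectExec: the same range applies to the sorted rows.
    let mut rows = vec![42, 7, 19, 3, 88, 15, 61, 27];
    rows.sort();
    let r = output_range(encode(Some(3), 2), rows.len());
    let top = rows[r].to_vec();
    assert_eq!(top, vec![15, 19, 27]);
}
```

The sketch sorts fully for brevity. A real TakeOrderedAndProjectExec implementation would typically use a top-k selection and then apply the offset to its result.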

The current Auron implementation ignores the offset field entirely.
As a result, the query:

SELECT * FROM t LIMIT 10 OFFSET 5

returns the first 15 rows (the stored limit + offset value) instead of rows 6–15, producing incorrect results.
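To make the failure concrete, the Rust sketch below contrasts the reported behavior with an offset-aware variant. It models `t` as ids 1–100 in scan order. This is an assumption, because without ORDER BY Spark does not guarantee which rows come first. Both function names are hypothetical.

```rust
/// Behavior described in this issue: the stored (raw) limit is honored, the offset is not.
fn limit_ignoring_offset(rows: &[u32], raw_limit: usize) -> Vec<u32> {
    rows.iter().take(raw_limit).copied().collect()
}

/// Offset-aware variant: take `raw_limit` rows, then drop the first `offset` of them.
fn limit_with_offset(rows: &[u32], raw_limit: usize, offset: usize) -> Vec<u32> {
    rows.iter().take(raw_limit).skip(offset).copied().collect()
}

fn main() {
    // Table t modeled as ids 1..=100; SELECT * FROM t LIMIT 10 OFFSET 5.
    let t: Vec<u32> = (1..=100).collect();
    let (limit, offset) = (10, 5);
    let raw_limit = limit + offset; // the value stored in the physical plan
    let wrong = limit_ignoring_offset(&t, raw_limit);
    let right = limit_with_offset(&t, raw_limit, offset);
    assert_eq!(wrong, (1..=15).collect::<Vec<u32>>());
    assert_eq!(right, (6..=15).collect::<Vec<u32>>());
}
```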

We should fully support the LIMIT … OFFSET … semantics.
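One detail matters when adding this support. If the native side consumes its input as a stream of batches (an assumption of this sketch), the offset can span several batches. The operator therefore has to carry two counters between calls: the rows still to skip and the rows still to emit. `LimitOffset` below is a hypothetical operator, not Auron's, and it uses plain slices and `Vec`s in place of Arrow record batches.

```rust
/// Hypothetical streaming LIMIT/OFFSET operator over batches of rows.
struct LimitOffset {
    to_skip: usize,         // offset rows still to discard
    to_emit: Option<usize>, // rows still to emit (None = no LIMIT)
}

impl LimitOffset {
    /// `raw_limit` is the plan value (LIMIT + OFFSET), so the number of rows
    /// actually emitted is raw_limit - offset (saturating at zero).
    fn new(raw_limit: Option<usize>, offset: usize) -> Self {
        LimitOffset {
            to_skip: offset,
            to_emit: raw_limit.map(|l| l.saturating_sub(offset)),
        }
    }

    /// Returns the rows of `batch` to emit, or `None` once the limit has been reached.
    fn process<T: Clone>(&mut self, batch: &[T]) -> Option<Vec<T>> {
        if self.to_emit == Some(0) {
            return None; // limit reached: the caller can stop pulling input
        }
        // Discard whatever part of the offset falls inside this batch.
        let skip = self.to_skip.min(batch.len());
        self.to_skip -= skip;
        let rest = &batch[skip..];
        // Emit up to the remaining limit.
        let n = match self.to_emit {
            Some(left) => left.min(rest.len()),
            None => rest.len(),
        };
        if let Some(left) = self.to_emit.as_mut() {
            *left -= n;
        }
        Some(rest[..n].to_vec())
    }
}

fn main() {
    // LIMIT 10 OFFSET 5 over ids 1..=100 arriving in batches of 4 rows.
    let rows: Vec<u32> = (1..=100).collect();
    let mut op = LimitOffset::new(Some(15), 5);
    let mut out = Vec::new();
    for batch in rows.chunks(4) {
        match op.process(batch) {
            Some(part) => out.extend(part),
            None => break,
        }
    }
    assert_eq!(out, (6..=15).collect::<Vec<u32>>());
}
```

Returning `None` once the limit is reached lets the caller stop reading input early. A batch that falls entirely inside the offset yields an empty `Vec`, which a real operator might drop instead of forwarding.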
