[Feature] Support generating splits with finer granularity than file level

### Motivation

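Splits are currently generated at file granularity, so a single data file is always read by one task. For example, a 4 GB Parquet file containing 10 row groups could instead be divided into 10 splits (one per row group) and read in parallel. This would significantly improve read concurrency for large files.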

#### A rough implementation path

- Extend the `DataSplit` interface so that each data file it references can carry a start offset and a length within that file.
- Implement splitting of files by offset and length to generate the corresponding data splits (for instance, Parquet files can be split further by row group).
- Finally, ensure that the corresponding format readers, such as the Parquet, ORC, and Avro readers, can read only the rows covered by a given file path, start offset, and length.
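The second and third steps could look roughly like the minimal sketch below (Java 16+ for records). It is illustrative only: `RowGroup`, `FileRangeSplit`, `plan`, and `rowGroupsFor` are hypothetical names, not existing Paimon classes or methods, and the row-group offsets and lengths are assumed to have been read from the file footer already. The planner packs consecutive row groups into splits of about `targetSize` bytes, and the reader side keeps only the row groups whose starting offset falls inside its split's byte range.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of sub-file split planning; not Paimon's actual API. */
public final class RowGroupSplitPlanner {

    /** Byte range of one row group inside a data file (hypothetical). */
    public record RowGroup(long offset, long length) {}

    /** A split covering [offset, offset + length) of a single file (hypothetical). */
    public record FileRangeSplit(String path, long offset, long length) {
        long end() {
            return offset + length;
        }
    }

    /**
     * Packs consecutive row groups into splits of roughly targetSize bytes.
     * Every row group lands in exactly one split; a row group larger than
     * targetSize gets a split of its own.
     */
    public static List<FileRangeSplit> plan(String path, List<RowGroup> rowGroups, long targetSize) {
        List<FileRangeSplit> splits = new ArrayList<>();
        long start = -1; // start offset of the split being built, -1 if none
        long end = -1;   // end offset (exclusive) of the split being built
        for (RowGroup rg : rowGroups) {
            long rgEnd = rg.offset() + rg.length();
            // Close the current split if adding this row group would exceed the target.
            if (start >= 0 && rgEnd - start > targetSize) {
                splits.add(new FileRangeSplit(path, start, end - start));
                start = -1;
            }
            if (start < 0) {
                start = rg.offset();
            }
            end = rgEnd;
        }
        if (start >= 0) {
            splits.add(new FileRangeSplit(path, start, end - start));
        }
        return splits;
    }

    /**
     * Reader side: selects the row groups a split is responsible for.
     * A row group belongs to the split that contains its starting offset,
     * so adjacent splits never read the same row group twice.
     */
    public static List<RowGroup> rowGroupsFor(FileRangeSplit split, List<RowGroup> rowGroups) {
        List<RowGroup> selected = new ArrayList<>();
        for (RowGroup rg : rowGroups) {
            if (rg.offset() >= split.offset() && rg.offset() < split.end()) {
                selected.add(rg);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // Ten 400-byte row groups; Parquet files begin with a 4-byte magic number,
        // so the first row group starts at offset 4.
        List<RowGroup> rowGroups = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            rowGroups.add(new RowGroup(4L + i * 400L, 400L));
        }
        List<FileRangeSplit> perRowGroup = plan("data-0.parquet", rowGroups, 400L);
        List<FileRangeSplit> merged = plan("data-0.parquet", rowGroups, 1000L);
        System.out.println(perRowGroup.size() + " splits with targetSize=400"); // 10
        System.out.println(merged.size() + " splits with targetSize=1000");  // 5
    }
}
```

Assigning each row group by its starting offset is one possible convention; whichever rule is used, the split planner and the format readers must apply the same one so that rows are neither skipped nor duplicated.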
