Limit page size based on maximum row count #2227

@asfimport

Description

For column-index-based filtering, it is important that a column has enough pages. When the encoding matches the data particularly well, all of the values may end up encoded in a single page (e.g. a column containing an ascending counter), which leaves the column index nothing to skip.
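To illustrate why the page count matters, here is a minimal sketch that models per-page min/max statistics for an ascending counter and counts how many rows an equality filter would still have to scan. The class and method names (`PageSkippingSketch`, `pageBounds`, `rowsScanned`) are hypothetical and are not part of the Parquet API; the model assumes a page can be skipped only when the target lies outside its [min, max] range.

```java
// Hypothetical model of min/max based page skipping; not the parquet-mr API.
final class PageSkippingSketch {

    /**
     * Splits an ascending counter 0..rowCount-1 into pages of at most
     * rowsPerPage rows and returns {mins, maxes}, one entry per page.
     */
    static long[][] pageBounds(long rowCount, int rowsPerPage) {
        int pages = (int) ((rowCount + rowsPerPage - 1) / rowsPerPage);
        long[] mins = new long[pages];
        long[] maxes = new long[pages];
        for (int p = 0; p < pages; p++) {
            mins[p] = (long) p * rowsPerPage;
            maxes[p] = Math.min(rowCount, mins[p] + rowsPerPage) - 1;
        }
        return new long[][] {mins, maxes};
    }

    /**
     * Returns the number of rows that must be scanned to evaluate
     * "value == target": every row of each page whose range contains target.
     */
    static long rowsScanned(long rowCount, int rowsPerPage, long target) {
        long[][] bounds = pageBounds(rowCount, rowsPerPage);
        long rows = 0;
        for (int p = 0; p < bounds[0].length; p++) {
            boolean mayContain = bounds[0][p] <= target && target <= bounds[1][p];
            if (mayContain) {
                rows += bounds[1][p] - bounds[0][p] + 1;
            }
        }
        return rows;
    }
}
```

In this model, one million rows stored in a single page force a scan of all one million rows for any matching target, whereas the same rows split into pages of 20,000 rows require scanning only the one page that can contain the target.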

With this improvement, we would be able to limit each page by the maximum number of rows written to it, so that every column has enough pages.
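The proposed limit can be sketched as a page writer that closes a page when either a byte-size threshold or a row-count threshold is reached. This is a simplified illustration, not the actual writer implementation: the class `RowLimitedPageWriterSketch`, its methods, the 1 MiB page size, and the one-byte-per-row encoded size are all assumptions chosen for the example, and the limits are checked after every row for simplicity.

```java
// Hypothetical sketch of a page writer with a row-count limit; not the parquet-mr API.
final class RowLimitedPageWriterSketch {
    private final long maxPageBytes; // size threshold for one page
    private final int maxPageRows;   // proposed row-count threshold for one page
    private long currentBytes = 0;
    private int currentRows = 0;
    private int finishedPages = 0;

    RowLimitedPageWriterSketch(long maxPageBytes, int maxPageRows) {
        this.maxPageBytes = maxPageBytes;
        this.maxPageRows = maxPageRows;
    }

    /** Records one row of the given encoded size and closes the page if either limit is reached. */
    void writeRow(long encodedBytes) {
        currentBytes += encodedBytes;
        currentRows++;
        if (currentBytes >= maxPageBytes || currentRows >= maxPageRows) {
            flushPage();
        }
    }

    private void flushPage() {
        finishedPages++;
        currentBytes = 0;
        currentRows = 0;
    }

    /** Flushes any partially filled page and returns the total number of pages written. */
    int close() {
        if (currentRows > 0) {
            flushPage();
        }
        return finishedPages;
    }

    /** Convenience helper: writes `rows` rows of `bytesPerRow` encoded bytes each and returns the page count. */
    static int pagesFor(long rows, long bytesPerRow, long maxPageBytes, int maxPageRows) {
        RowLimitedPageWriterSketch writer =
            new RowLimitedPageWriterSketch(maxPageBytes, maxPageRows);
        for (long i = 0; i < rows; i++) {
            writer.writeRow(bytesPerRow);
        }
        return writer.close();
    }
}
```

Under these assumptions, one million rows at one encoded byte each never reach a 1 MiB size threshold, so `pagesFor(1_000_000, 1, 1 << 20, Integer.MAX_VALUE)` yields a single page, while adding a 20,000-row limit yields 50 pages.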

Based on the benchmarks listed here, 20k rows seems to be a good choice for the default value.

Reporter: Gabor Szadovszky / @gszadovszky
Assignee: Gabor Szadovszky / @gszadovszky

Related issues:

PRs and other links:

Note: This issue was originally created as PARQUET-1414. Please see the migration documentation for further details.
