Column indexes

Write the column indexes described in PARQUET-922.
This is the first phase of implementing the whole feature. The implementation proceeds in the following steps:
- Utility to read/write indexes in parquet-format
- Writing indexes in the parquet file
- Extend parquet-tools and parquet-cli to show the indexes
- Limit index size based on parquet properties
- Trim min/max values where possible based on parquet properties
- Filtering based on column indexes

The work is done on the feature branch `column-indexes`. This JIRA will be resolved after the branch has been merged to `master`.
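
For readers unfamiliar with the idea behind the last step (filtering based on column indexes), here is a minimal sketch of how per-page min/max values can be used to skip pages. It is an illustration only and not the parquet-mr implementation: `PageStats`, `pages_matching_equals`, and the sample values are hypothetical names and data invented for this example, and it assumes an integer column with an equality predicate.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PageStats:
    """Hypothetical per-page entry of a column index (illustration only)."""
    min_value: Optional[int]  # None when the page contains only nulls
    max_value: Optional[int]
    first_row: int            # index of the first row stored in the page
    row_count: int


def pages_matching_equals(index: List[PageStats], value: int) -> List[int]:
    """Return the ordinals of pages that may contain `value`.

    A page is kept only if min <= value <= max; every other page can be
    skipped without being read or decoded.
    """
    matches = []
    for ordinal, page in enumerate(index):
        if page.min_value is None or page.max_value is None:
            continue  # an all-null page cannot satisfy an equality predicate
        if page.min_value <= value <= page.max_value:
            matches.append(ordinal)
    return matches


# Made-up sample index for one column chunk with four pages.
index = [
    PageStats(min_value=1, max_value=10, first_row=0, row_count=100),
    PageStats(min_value=None, max_value=None, first_row=100, row_count=50),
    PageStats(min_value=8, max_value=25, first_row=150, row_count=100),
    PageStats(min_value=30, max_value=42, first_row=250, row_count=100),
]

candidate_pages = pages_matching_equals(index, 9)
```

With the sample data, only the first and third pages remain candidates for the value 9. The min/max check can only rule pages out, so the remaining pages still have to be read and filtered row by row.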

**Reporter**: [Gabor Szadovszky](https://issues.apache.org/jira/secure/ViewProfile.jspa?name=gszadovszky) / @gszadovszky
**Assignee**: [Gabor Szadovszky](https://issues.apache.org/jira/secure/ViewProfile.jspa?name=gszadovszky) / @gszadovszky
#### Subtasks:
- [X] [Column indexes: read/write API](https://github.com/apache/parquet-java/issues/2129)
- [X] [Column indexes: Show indexes in tools](https://github.com/apache/parquet-java/issues/2130)
- [X] [Column indexes: Limit index size](https://github.com/apache/parquet-java/issues/2131)
- [X] [Column indexes: Truncate min/max values](https://github.com/apache/parquet-java/issues/1508)
- [X] [Column indexes: Filtering](https://github.com/apache/parquet-java/issues/2178)
- [X] [Column Indexes: Invalid row indexes for pages starting with nulls](https://github.com/apache/parquet-java/issues/1527)
- [X] [Incorrect check for ASCENDING/DESCENDING at column index write path](https://github.com/apache/parquet-java/issues/2213)
- [X] [Fix issues of NaN and +-0.0 in case of float/double column indexes](https://github.com/apache/parquet-java/issues/2215)
- [X] [Improve value skipping at page synchronization](https://github.com/apache/parquet-java/issues/2217)
- [X] [appendRowGroup will loose pageIndex](https://github.com/apache/parquet-java/issues/2808)
#### Related issues:
- [Don't write page level statistics](https://github.com/apache/parquet-java/issues/2204) (blocks)
- [Write index page in parquet file](https://github.com/apache/parquet-java/issues/2127) (is duplicated by)
- [Limit page size based on maximum row count](https://github.com/apache/parquet-java/issues/2227) (relates to)
- [Make Spark SQL support Column indexes](https://github.com/apache/parquet-java/issues/2415) (relates to)
- [Add index pages to the format to support efficient page skipping](https://github.com/apache/parquet-format/issues/324) (depends upon)
- [Improve logic when to write column indexes](https://github.com/apache/parquet-java/issues/2228) (is depended upon by)
- [Benchmark filtering column-indexes](https://github.com/apache/parquet-java/issues/2235) (is depended upon by)
#### PRs and other links:
- [GitHub Pull Request #527](https://github.com/apache/parquet-mr/pull/527)
- [format PR](https://github.com/apache/parquet-format/pull/81)

<sub>**Note**: *This issue was originally created as [PARQUET-1201](https://issues.apache.org/jira/browse/PARQUET-1201). Please see the [migration documentation](https://issues.apache.org/jira/browse/PARQUET-2502) for further details.*</sub>
