@etseidl etseidl commented Aug 29, 2025

Which issue does this PR close?

Rationale for this change

Backport benchmark changes to main to allow an apples-to-apples comparison of Thrift decoding.

What changes are included in this PR?

Adds a page header benchmark and updates the bench names to match those in the feature branch.

Are these changes tested?

No tests needed; this only changes a benchmark.

Are there any user-facing changes?

No

@github-actions github-actions bot added the parquet Changes to the parquet crate label Aug 29, 2025

etseidl commented Aug 29, 2025

With these changes, I can again use critcmp:

% critcmp main read_page_header
group                                 main                                   read_page_header
-----                                 ----                                   ----------------
decode parquet metadata               1.87     35.8±0.98µs        ? ?/sec    1.00     19.1±0.33µs        ? ?/sec
decode parquet metadata (wide)        2.73    221.4±2.90ms        ? ?/sec    1.00     81.0±2.87ms        ? ?/sec
decode thrift file metadata           2.22     21.8±0.31µs        ? ?/sec    1.00      9.8±0.20µs        ? ?/sec
decode thrift file metadata (wide)    1.80    112.7±2.39ms        ? ?/sec    1.00     62.8±1.27ms        ? ?/sec
open(default)                         1.79     36.2±1.00µs        ? ?/sec    1.00     20.2±0.49µs        ? ?/sec
open(page index)                      6.80  1773.7±23.97µs        ? ?/sec    1.00    260.7±4.39µs        ? ?/sec
page headers                          1.67     12.1±0.27µs        ? ?/sec    1.00      7.3±0.17µs        ? ?/sec
page headers (no stats)                                                      1.00      4.2±0.06µs        ? ?/sec
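For readers unfamiliar with this output: the first number in each column is the timing relative to the fastest entry in that row, so the fastest side shows 1.00. The sketch below reproduces that arithmetic from the displayed mean times. The function names are hypothetical, and because critcmp works from unrounded estimates, recomputing from the rounded values in the table can differ in the last digit.

```rust
/// Relative factor of each timing against the fastest one in the same row.
/// The fastest entry maps to 1.0; slower entries map to time / fastest.
/// All timings passed in must use the same unit.
fn relative_factors(times: &[f64]) -> Vec<f64> {
    let fastest = times.iter().copied().fold(f64::INFINITY, f64::min);
    times.iter().map(|t| t / fastest).collect()
}

/// Round to two decimals, matching the precision shown in the table.
fn round2(x: f64) -> f64 {
    (x * 100.0).round() / 100.0
}
```

Calling `relative_factors(&[1773.7, 260.7])` with the two `open(page index)` means (both in µs) and rounding with `round2` gives the 6.80 and 1.00 shown in that row.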

  dictionary_page_offset: Some(rng.random()),
  statistics: Some(stats.clone()),
- encoding_stats: None,
+ encoding_stats: Some(vec![

Adding some encoding stats because profiling showed that around 25% of the time spent reading the file metadata struct goes to reading this vector 😮
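To give a feel for the per-element work involved, here is a minimal sketch of decoding a Thrift compact-protocol list of small structs. It is not the parquet crate's decoder. `EncodingStat` and all function names are hypothetical stand-ins, and the three-i32 layout is an assumption modeled on the Parquet format's page encoding stats (page type, encoding, count). Each element costs three field headers, three zigzag varints, and a stop byte.

```rust
/// Hypothetical stand-in for one encoding-stats entry: three i32 fields.
#[derive(Debug, PartialEq)]
struct EncodingStat {
    page_type: i32,
    encoding: i32,
    count: i32,
}

/// Read an unsigned LEB128 varint, advancing `pos`.
fn read_varint(buf: &[u8], pos: &mut usize) -> Option<u64> {
    let mut value = 0u64;
    let mut shift = 0u32;
    loop {
        let byte = *buf.get(*pos)?;
        *pos += 1;
        if shift >= 64 {
            return None; // malformed: varint is too long
        }
        value |= u64::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            return Some(value);
        }
        shift += 7;
    }
}

/// Read a zigzag-encoded i32.
fn read_i32(buf: &[u8], pos: &mut usize) -> Option<i32> {
    let raw = read_varint(buf, pos)? as u32;
    Some(((raw >> 1) as i32) ^ -((raw & 1) as i32))
}

/// Decode one struct whose fields 1..=3 are all i32, ended by a stop byte.
fn read_stat(buf: &[u8], pos: &mut usize) -> Option<EncodingStat> {
    let mut fields = [0i32; 3];
    let mut field_id = 0usize;
    loop {
        let header = *buf.get(*pos)?;
        *pos += 1;
        if header == 0 {
            break; // stop byte ends the struct
        }
        let delta = (header >> 4) as usize;
        let type_id = header & 0x0f;
        // This sketch handles only short-form headers carrying an i32 (type 5).
        if delta == 0 || type_id != 5 {
            return None;
        }
        field_id += delta;
        let value = read_i32(buf, pos)?;
        *fields.get_mut(field_id - 1)? = value;
    }
    Some(EncodingStat {
        page_type: fields[0],
        encoding: fields[1],
        count: fields[2],
    })
}

/// Decode a list of structs: one header byte, then one struct per element.
fn read_stats_list(buf: &[u8]) -> Option<Vec<EncodingStat>> {
    let mut pos = 0usize;
    let header = *buf.get(pos)?;
    pos += 1;
    if header & 0x0f != 12 {
        return None; // element type must be struct (compact type 12)
    }
    let len = match header >> 4 {
        0x0f => read_varint(buf, &mut pos)? as usize, // long form: size follows
        n => n as usize,
    };
    let mut out = Vec::new();
    for _ in 0..len {
        out.push(read_stat(buf, &mut pos)?);
    }
    Some(out)
}
```

A two-element list occupies only 15 bytes on the wire, yet decoding it takes 15 separate byte-level reads plus a `Vec` allocation, which suggests why a vector like this can be a visible share of metadata decoding time when it is repeated for every column chunk.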

@mbrobbel mbrobbel merged commit 471f3b1 into apache:main Sep 5, 2025
16 checks passed

etseidl commented Sep 5, 2025

Thanks @mbrobbel!

@etseidl etseidl deleted the update_metadata_bench branch October 10, 2025 14:34