Optimize {write,take}Leb128 #25794

NicoElbers · 2025-11-03T11:45:45Z

Performance of the old implementation (Base impl) versus this PR (New impl), measured with a [micro-benchmark](https://zigbin.io/b8b396) compiled with Zig `0.16.0-dev.1220+95c76b1b4` in ReleaseFast mode. The 'small' category covers values of at most 7 bits, the 'medium' category covers values of at most 32 bits, and the 'full' category covers arbitrary values of `@bitSizeOf(T)` bits.

Rewrite `writeLeb128` to no longer use `writeMultipleOf7Leb128`, instead:

* Make use of byte aligned ints
* Use `writableSliceGreedy` instead of an array
* Special case small numbers (<= 7 bits)
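As a rough illustration of the small-number special case on the write side, here is a minimal sketch of unsigned LEB128 encoding. It is not the code from this PR: it writes into a plain caller-provided slice rather than going through `std.Io.Writer`, it only handles `u64`, and the name `encodeUleb128` is hypothetical.

```zig
const std = @import("std");

/// Hypothetical helper, not part of std: encodes `value` as unsigned LEB128
/// into `buf` and returns the number of bytes written.
/// `buf` must hold at least 10 bytes (ceil(64 / 7)).
fn encodeUleb128(buf: []u8, value: u64) usize {
    // Special case: a value of at most 7 bits is a single byte
    // with the continuation bit (0x80) clear.
    if (value < 0x80) {
        buf[0] = @intCast(value);
        return 1;
    }

    var v = value;
    var i: usize = 0;
    // Emit 7 payload bits per byte, low bits first, and set the
    // continuation bit on every byte except the last one.
    while (v >= 0x80) : (i += 1) {
        buf[i] = @as(u8, @truncate(v)) | 0x80;
        v >>= 7;
    }
    buf[i] = @intCast(v);
    return i + 1;
}

test "encodeUleb128 sketch" {
    var buf: [10]u8 = undefined;

    // Small value: one byte.
    const small = encodeUleb128(&buf, 5);
    try std.testing.expectEqualSlices(u8, &[_]u8{0x05}, buf[0..small]);

    // 300 = 0b10_0101100 splits into 0x2C (plus continuation bit) and 0x02.
    const medium = encodeUleb128(&buf, 300);
    try std.testing.expectEqualSlices(u8, &[_]u8{ 0xAC, 0x02 }, buf[0..medium]);
}
```

Saving this to a file and running `zig test` on it should exercise both paths.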

Rewrite `Reader.takeLeb128` to not use `takeMultipleOf7Leb128` and instead:

* Use byte aligned integers
* Turn the decoding into a finite state machine
* Turn the main reading loop into an inlined loop of static length
* Special case small integers (<= 7 bits)
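To make the "inlined loop of static length" and the small-integer special case more concrete, here is a minimal sketch of unsigned LEB128 decoding. It is an assumption-laden illustration rather than the PR's implementation: it reads from a plain byte slice instead of a `Reader`, handles only `u64`, does not model the finite state machine, and the names `decodeUleb128` and `max_bytes` are hypothetical.

```zig
const std = @import("std");

/// A u64 needs at most ceil(64 / 7) = 10 LEB128 bytes.
const max_bytes = 10;

/// Hypothetical helper, not part of std: decodes an unsigned LEB128 value
/// from the start of `bytes`.
fn decodeUleb128(bytes: []const u8) error{ EndOfStream, Overflow }!u64 {
    if (bytes.len == 0) return error.EndOfStream;
    // Special case: a first byte without the continuation bit is the value itself.
    if (bytes[0] < 0x80) return bytes[0];

    var result: u64 = 0;
    // `inline for` unrolls the loop at compile time, so `i` and every
    // shift amount are comptime-known and the trip count is fixed.
    inline for (0..max_bytes) |i| {
        if (i >= bytes.len) return error.EndOfStream;
        const b = bytes[i];
        // The tenth byte may only contribute bit 63 of the result.
        if (i == max_bytes - 1 and b > 0x01) return error.Overflow;
        result |= @as(u64, b & 0x7f) << @intCast(7 * i);
        if ((b & 0x80) == 0) return result;
    }
    return error.Overflow;
}

test "decodeUleb128 sketch" {
    try std.testing.expectEqual(@as(u64, 5), try decodeUleb128(&[_]u8{0x05}));
    try std.testing.expectEqual(@as(u64, 300), try decodeUleb128(&[_]u8{ 0xAC, 0x02 }));
    // A continuation bit with no following byte is a truncated input.
    try std.testing.expectError(error.EndOfStream, decodeUleb128(&[_]u8{0x80}));
}
```

Because the trip count is known at compile time, the compiler sees straight-line code with constant shifts, which is one plausible reason a static-length loop can outperform a loop with a runtime shift counter.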

For writing across larger integers (> 7 bits) this roughly doubles performance in a micro-benchmark.

For reading across larger integers (> 7 bits) this roughly doubles performance in a micro-benchmark, however with some notable exceptions:

* `i50` sees barely any speedup and `u50` even sees a minor slowdown. I have genuinely no clue what could be causing this, it's the exception to the rule.
* 8, 16 and 32 bit integers see a significantly higher speedup

Rewrite `writeLeb128` to no longer use `writeMultipleOf7Leb128`, instead:

* Make use of byte aligned ints
* Use `writableSliceGreedy` instead of an array
* Special case small numbers (fitting inside 7 bits)

Across larger integers (> 7 bits) this roughly doubles performance in a [micro-benchmark](https://zigbin.io/b8b396).

Also add test coverage

Rewrite `Reader.takeLeb128` to not use `takeMultipleOf7Leb128` and instead:

* Use byte aligned integers
* Turn the decoding into a finite state machine
* Turn the main reading loop into an inlined loop of static length
* Special case small integers (<= 7 bits)

Across larger integers (> 7 bits) this roughly doubles performance in a [micro-benchmark](https://zigbin.io/b8b396), however with some notable exceptions:

* `i50` sees barely any speedup and `u50` even sees a minor slowdown. I have genuinely no clue what could be causing this, it's the exception to the rule.
* 8, 16 and 32 bit integers see a significantly higher speedup

Also expand on test coverage

Fortunately no real performance regression

NicoElbers added 2 commits November 3, 2025 12:31

NicoElbers force-pushed the leb-perf branch from 39dc0cb to 21ff57a November 3, 2025 11:46

Writer.writeLeb128: Work around ziglang#19730

8234af8

Fortunately no real performance regression
