@NicoElbers NicoElbers commented Nov 3, 2025

perf_diff: old (base implementation) vs. this PR (new implementation) performance, based on a [micro-benchmark](https://zigbin.io/b8b396) compiled with Zig `0.16.0-dev.1220+95c76b1b4` in ReleaseFast mode. The 'small' category covers items of at most 7 bits, the 'medium' category items of at most 32 bits, and the 'full' category arbitrary items of `@bitSizeOf(T)` bits.

Rewrite `writeLeb128` to no longer use `writeMultipleOf7Leb128`; instead:

  • Make use of byte-aligned ints
  • Use `writableSliceGreedy` instead of an array
  • Special-case small numbers (<= 7 bits)
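
As a rough illustration of the encoding these bullets refer to, here is a minimal LEB128 encoder sketch with a single-byte fast path for values that fit in 7 bits. It is written in Python for readability and is not the Zig implementation from this PR; the function names `encode_uleb128` and `encode_sleb128` are hypothetical, and the sketch says nothing about the buffer handling or performance of the real code.

```python
def encode_uleb128(value: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128 (illustrative only)."""
    if value < 0:
        raise ValueError("unsigned LEB128 cannot encode negative values")
    # Fast path: anything that fits in 7 bits is exactly one byte.
    if value < 0x80:
        return bytes([value])
    out = bytearray()
    while True:
        byte = value & 0x7F          # low 7 payload bits
        value >>= 7
        if value == 0:
            out.append(byte)         # last byte: continuation bit clear
            return bytes(out)
        out.append(byte | 0x80)      # more bytes follow: continuation bit set


def encode_sleb128(value: int) -> bytes:
    """Encode a signed integer as signed LEB128 (illustrative only)."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7                  # Python's >> is an arithmetic shift
        sign_bit_set = bool(byte & 0x40)
        # Stop once the remaining bits are pure sign extension of this byte.
        if (value == 0 and not sign_bit_set) or (value == -1 and sign_bit_set):
            out.append(byte)
            return bytes(out)
        out.append(byte | 0x80)
```

For example, `encode_uleb128(127)` takes the fast path and yields one byte, while `encode_uleb128(128)` needs two.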

Rewrite `Reader.takeLeb128` to not use `takeMultipleOf7Leb128`; instead:

  • Use byte-aligned integers
  • Turn the decoding into a finite state machine
  • Turn the main reading loop into an inlined loop of static length
  • Special-case small integers (<= 7 bits)
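
To make two of these ideas concrete, the sketch below decodes unsigned LEB128 with a single-byte fast path and a loop whose maximum length is fixed by the target width (`ceil(bits / 7)` bytes). It is Python for illustration only, not the Zig code from this PR: `decode_uleb128` and its `bits` parameter are hypothetical, the state-machine structure is not modelled, and rejecting encodings longer than `ceil(bits / 7)` bytes is a simplifying assumption of this sketch.

```python
def decode_uleb128(data: bytes, bits: int) -> tuple:
    """Decode an unsigned LEB128 value that must fit in `bits` bits.

    Returns (value, bytes_consumed). Illustrative only.
    """
    if not data:
        raise EOFError("no input")
    # Fast path: a first byte without the continuation bit is the whole value.
    first = data[0]
    if first < 0x80:
        if first >> bits:
            raise OverflowError("value does not fit in target width")
        return first, 1
    # The loop length depends only on the target width, not on the input.
    max_len = -(-bits // 7)  # ceil(bits / 7)
    result = 0
    for i in range(max_len):
        if i >= len(data):
            raise EOFError("truncated LEB128 value")
        byte = data[i]
        result |= (byte & 0x7F) << (7 * i)
        if byte < 0x80:  # continuation bit clear: this was the last byte
            if result >> bits:
                raise OverflowError("value does not fit in target width")
            return result, i + 1
    # Assumption of this sketch: over-long encodings are treated as errors.
    raise OverflowError("too many continuation bytes for target width")
```

In a compiled language the bound `max_len` is known at compile time for each integer type, which is what allows the loop to be fully unrolled.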

For writing, this roughly doubles performance on larger integers (> 7 bits) in the micro-benchmark.

For reading, this also roughly doubles performance on larger integers (> 7 bits) in the micro-benchmark, with some notable exceptions:

  • `i50` sees barely any speedup and `u50` even sees a minor slowdown. I
    have genuinely no clue what could be causing this; it's the exception
    to the rule.
  • 8-, 16- and 32-bit integers see a significantly higher speedup.

Rewrite `writeLeb128` to no longer use `writeMultipleOf7Leb128`;
instead:
 * Make use of byte-aligned ints
 * Use `writableSliceGreedy` instead of an array
 * Special-case small numbers (fitting inside 7 bits)

Across larger integers (> 7 bits) this roughly doubles performance in a
[micro-benchmark](https://zigbin.io/b8b396).

Also add test coverage

Rewrite `Reader.takeLeb128` to not use `takeMultipleOf7Leb128` and
instead:
 * Use byte aligned integers
 * Turn the decoding into a finite state machine
 * Turn the main reading loop into an inlined loop of static length
 * Special case small integers (<= 7 bits)

Across larger integers (> 7 bits) this roughly doubles performance in a
[micro-benchmark](https://zigbin.io/b8b396), however with some notable
exceptions:
 * `i50` sees barely any speedup and `u50` even sees a minor slowdown. I
   have genuinely no clue what could be causing this; it's the exception
   to the rule.
 * 8-, 16- and 32-bit integers see a significantly higher speedup

Also expand on test coverage

Fortunately no real performance regression