Commit d183ee1 (parent: f870ea0)

Improve performance of `ncodeunits(::Char)` (#54001)
This improves performance of `ncodeunits(::Char)` by simply counting the
number of non-zero bytes (except for `\0`, which is encoded as all zero
bytes). For a performance comparison, see [this gist](
https://gist.github.com/Seelengrab/ebb02d4b8d754700c2869de8daf88cad);
there's an up to 10x improvement here for collections of `Char`, with a
minor improvement for single `Char` (with much smaller spread). The
version in this PR is called `nbytesencoded` in the benchmarks.
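The core idea can be sketched as follows. This is an illustrative reconstruction, not the PR's actual diff: it assumes Julia's internal `Char` representation, where the UTF-8 bytes of the character are stored left-aligned in a `UInt32`, and the helper name `nbytesencoded_sketch` is made up for this example.

```julia
# Count the code units of a Char by counting its non-zero bytes.
# Julia stores a Char as a UInt32 with the UTF-8 bytes left-aligned,
# so every trailing zero byte is a byte NOT used by the encoding.
function nbytesencoded_sketch(c::Char)
    u = reinterpret(UInt32, c)
    # trailing_zeros(u) >> 3 gives the number of trailing zero bytes;
    # max(1, ...) keeps '\0' (encoded as all zero bytes) at one code unit.
    return max(1, 4 - (trailing_zeros(u) >> 3))
end
```

Because `trailing_zeros` typically lowers to a single bit-count instruction, this avoids the branching of a lookup-style implementation, which is where the speedup on collections of `Char` would come from.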
Correctness has been verified with Supposition.jl, using the existing
implementation as an oracle:
```julia
julia> using Supposition
julia> const chars = Data.Characters()
julia> @check max_examples=1_000_000 function bytesenc(i=Data.Integers{UInt32}())
c = reinterpret(Char, i)
           ncodeunits(c) == nbytesencoded(c)
end;
Test Summary: | Pass  Total  Time
bytesenc      |    1      1  1.0s
julia> ncodeunits('\0') == nbytesencoded('\0')
true
```
Let's see if CI agrees!
Notably, neither the existing nor the new implementation check whether
the given `Char` is valid or not, since the only thing that matters is
how many bytes are written out.
---------
Co-authored-by: Sukera <[email protected]>
1 file changed: +8 −1 lines changed