You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
This PR speeds up `read.stream` and `read` by skipping `fs.stat` call if
`size` was passed via `opts`. Currently, the only reason for doing a
`stat` call is to get the size (and throw the size mismatch error if the
size is different). This is unnecessary for 3 reasons:
1. In the case of `read.stream`, the stream already compares the sizes
at the end and throws an error if there's a mismatch.
2. In the case of `read`, we can compare the sizes after reading the
cache contents
3. In both cases we are already doing an integrity check which would
automatically fail if there's a size difference since the hashes would
be different.
In this PR, the `stat` call is only made if the user does not pass a
`size` property via `opts`. This makes sense because without knowing the
`size`, the stream has to make an unnecessary `fs.read` call at the end
before closing which has a significant cost (that cost is much, much
greater than the cost of doing `fs.stat`).
On my machine, the benchmarks with this change look like this:
```
┌─────────┬─────────────────────┬─────────┬────────────────────┬───────────┬─────────┐
│ (index) │ Task Name │ ops/sec │ Average Time (ns) │ Margin │ Samples │
├─────────┼─────────────────────┼─────────┼────────────────────┼───────────┼─────────┤
│ 0 │ 'read.stream (new)' │ '4,643' │ 215352.03841424757 │ '±10.20%' │ 465 │
│ 1 │ 'read.stream (old)' │ '3,933' │ 254237.5665025663 │ '±7.17%' │ 394 │
│ 2 │ 'read (old)' │ '2,915' │ 343045.55719845917 │ '±13.42%' │ 292 │
│ 3 │ 'read (new)' │ '4,392' │ 227636.30011033904 │ '±12.14%' │ 449 │
└─────────┴─────────────────────┴─────────┴────────────────────┴───────────┴─────────┘
```
That's a solid 16% improvement in the case of `read.stream` and 36%
improvement in the case of `read`.
0 commit comments