Description
First off, thank you for wasmi and congrats on the recent 0.32 release! I recently swapped out wasmtime for wasmi 0.32-beta.16 in a certain project (not yet public) and was quite happy with it. The build got faster and smaller, the project got more portable, and wasmi's balance of startup latency + wasm execution speed worked much better for that project than wasmtime's (even with winch, which was already an improvement over cranelift). Many tests in that project compile a medium-sized wasm module and run it for a short but nontrivial amount of time, and switching to wasmi v0.32.0-beta.16 made those tests run faster.
Unfortunately and surprisingly, when I tried to update to v0.32.0-beta.18 and later to v0.32.0, I found that it got 5x to 6x slower in the configuration I care about the most: building my project in the dev/test profile but enabling optimizations for wasmi and wasmi_core (via profile overrides). I've managed to minimize it down to a 1 KiB wasm module and a fairly trivial embedding: wasmi-slow-repro.tar.gz. In that tarball:
- The compiled wasm module is included for completeness but ought to be reproducible
- The two host-* crates do the same thing with different wasmi versions: instantiate the guest and run its sole export
- compare.sh builds everything and runs it through hyperfine
I would expect the performance to be the same for both wasmi versions, but in the dev profile (as exercised by the script) it differs:
Benchmark 1: host-beta16/target/debug/host
Time (mean ± σ): 67.8 ms ± 2.9 ms [User: 66.9 ms, System: 0.9 ms]
Range (min … max): 65.3 ms … 76.0 ms 40 runs
Benchmark 2: host-newer/target/debug/host
Time (mean ± σ): 382.9 ms ± 19.7 ms [User: 381.9 ms, System: 1.0 ms]
Range (min … max): 369.1 ms … 434.2 ms 10 runs
Summary
host-beta16/target/debug/host ran
5.65 ± 0.38 times faster than host-newer/target/debug/host
This is on x86_64-unknown-linux-gnu, Rust 1.77.1, Intel i7-6700K CPU. Again, note that wasmi and wasmi_core are compiled with optimizations in the debug profile. Removing the opt-level = 2 lines from the respective Cargo.toml files makes both programs much slower (both take ca. 1.7s on my machine). Building them with --release instead makes them perform the same, but that's of little use to me if I can't figure out how to get the same performance without building my entire project in release mode. I've tried various tweaks to the profile overrides, without success. I've also tried profiling, but all I can see is that 99% of the time is spent in wasmi's interpreter loop.
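For readers who don't want to unpack the tarball, the profile overrides in question look roughly like the sketch below. Only the package names (wasmi, wasmi_core) and opt-level = 2 come from the description above; the exact layout of the Cargo.toml files in the repro may differ.

```toml
# Sketch of per-package overrides for the dev profile (also used by default
# for `cargo test`). The host crate itself stays unoptimized; only the two
# wasmi packages are built with optimizations.
[profile.dev.package.wasmi]
opt-level = 2

[profile.dev.package.wasmi_core]
opt-level = 2
```

Any other dependencies of wasmi keep the default dev settings under this configuration.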