Performance regression since `v0.32-beta.16` for `debug` builds with profile overwrites

First off, thank you for wasmi and congrats on the recent 0.32 release! I recently swapped out wasmtime for wasmi v0.32.0-beta.16 in a certain project (not yet public) and was quite happy with it. The build got faster and smaller, the project got more portable, and wasmi's balance of startup latency + wasm execution speed worked much better for that project than wasmtime's (even with winch, which was already an improvement over cranelift). Many tests in that project compile a medium-sized wasm module and run it for a short but nontrivial amount of time, and switching to wasmi v0.32.0-beta.16 made those tests run faster.

Unfortunately and surprisingly, when I tried to update to v0.32.0-beta.18 and later to v0.32.0, I found that it got 5x to 6x slower in the configuration I care about the most: building my project in the dev/test profile but enabling optimizations for wasmi and wasmi_core (via [profile overrides](https://doc.rust-lang.org/cargo/reference/profiles.html#overrides)). I've managed to minimize it down to a 1 KiB wasm module and a fairly trivial embedding: [wasmi-slow-repro.tar.gz](https://github.com/wasmi-labs/wasmi/files/15500943/wasmi-slow-repro.tar.gz). In that tarball:

- The compiled wasm module is included for completeness but ought to be reproducible
- The two `host-*` crates do the same thing with different wasmi versions: instantiate the guest and run its sole export
- `compare.sh` builds everything and runs it through hyperfine

I would expect the performance to be the same for both wasmi versions, but in the dev profile (as exercised by the script) it differs:

```
Benchmark 1: host-beta16/target/debug/host
  Time (mean ± σ):      67.8 ms ±   2.9 ms    [User: 66.9 ms, System: 0.9 ms]
  Range (min … max):    65.3 ms …  76.0 ms    40 runs

Benchmark 2: host-newer/target/debug/host
  Time (mean ± σ):     382.9 ms ±  19.7 ms    [User: 381.9 ms, System: 1.0 ms]
  Range (min … max):   369.1 ms … 434.2 ms    10 runs

Summary
  host-beta16/target/debug/host ran
    5.65 ± 0.38 times faster than host-newer/target/debug/host
```

This is on `x86_64-unknown-linux-gnu`, Rust 1.77.1, Intel i7-6700K CPU. Again note that wasmi and wasmi_core *are* compiled with optimizations in the debug profile. Removing the `opt-level = 2` lines from the respective Cargo.toml files makes both programs much slower (both take ca. 1.7 s on my machine). Building them with `--release` instead makes them perform the same, but that's of little use to me if I can't figure out how to get the same performance without building my entire project in release mode. I've tried various tweaks to the profile overrides, without success. I've also tried profiling, but all I can see is that 99% of the time is spent in wasmi's interpreter loop.
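
For readers who don't want to unpack the tarball, the kind of override described above looks roughly like the following. This is a sketch reconstructed from the description (the `dev` profile, the package names `wasmi` and `wasmi_core`, and `opt-level = 2`); the Cargo.toml files in the tarball are authoritative and may differ in detail.

```toml
# Per-package profile overrides: optimize only the interpreter crates,
# while the rest of the workspace keeps the default unoptimized dev settings.
[profile.dev.package.wasmi]
opt-level = 2

[profile.dev.package.wasmi_core]
opt-level = 2
```

Cargo's `test` profile inherits from `dev`, so overrides like these should also apply to `cargo test` builds.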
