Description
- What are the instructions being proposed?
Non-quasi, guaranteed-fused FMA:
- f32x4.fma
- f32x4.fms
- f64x2.fma
- f64x2.fms
This proposal is to add these in addition to qfma, not instead of it.
- What are the semantics of these instructions?
IEEE 754 fusedMultiplyAdd, and the obvious subtract variant, with modifications to NaN and exception behavior as in other floating-point instructions in wasm.
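To make the difference between a fused and an unfused multiply-add concrete, here is a minimal sketch of the per-lane f64 behavior. It is not taken from the proposal: the helper names `fma_f64` and `unfused_f64` are hypothetical, and the fused result is modeled with exact rational arithmetic followed by a single rounding, which only covers finite inputs.

```python
from fractions import Fraction

def fma_f64(a: float, b: float, c: float) -> float:
    """Model of a fused a*b + c on finite doubles: exact math, one rounding."""
    # Fraction(float) converts exactly; float(Fraction) rounds once to the
    # nearest double. Sketch only: NaN and infinity inputs raise here, and
    # the sign of a zero result is not preserved.
    return float(Fraction(a) * Fraction(b) + Fraction(c))

def unfused_f64(a: float, b: float, c: float) -> float:
    """Separate multiply and add: the product is rounded before the add."""
    return a * b + c

# a*a is exactly 1 + 2**-29 + 2**-60; the last term is lost when the
# product is rounded to a double on its own.
a = 1.0 + 2.0**-30
c = -(1.0 + 2.0**-29)

fused = fma_f64(a, a, c)        # keeps the 2**-60 term
unfused = unfused_f64(a, a, c)  # product rounds to 1 + 2**-29, sum is 0.0

print(unfused, fused)
```

An implementation that is free to pick either behavior, as with qfma, could return either of these two values for the same inputs; the instructions proposed here would always return the fused one.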
- How will these instructions be implemented? Give examples for at least
x86-64 and ARM64. Also provide reference implementation in terms of 128-bit
Wasm SIMD.
On x86-64 CPUs with FMA3 or FMA4, on ARM64, and on other popular architectures, a single instruction does this. On CPUs without an fma instruction, some options are discussed here.
- How does behavior differ across processors? What new fingerprinting surfaces will be exposed?
Since wasm hides floating-point exception flags, and NaN bits are already nondeterministic, the only new differences across platforms are timings.
- What use cases are there?
Some floating-point algorithms depend on a true fma, which is a different use case from qfma. In addition, some use cases want to be able to specify determinism in the wasm module, independently of whether the host implementation is enforcing determinism.
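One well-known example of an algorithm that depends on a true fma is the error-free product transformation (often called TwoProduct), which recovers the rounding error of a multiplication exactly. The sketch below is illustrative and not part of the proposal: `fma_f64` and `two_product` are hypothetical names, and the fused operation is again modeled with exact rational arithmetic for finite inputs.

```python
from fractions import Fraction

def fma_f64(a: float, b: float, c: float) -> float:
    """Model of a fused a*b + c on finite doubles: exact math, one rounding."""
    # Sketch only: NaN and infinity inputs are not handled.
    return float(Fraction(a) * Fraction(b) + Fraction(c))

def two_product(a: float, b: float) -> tuple[float, float]:
    """Return (p, e) with p = round(a*b) and, barring underflow or
    overflow, p + e equal to the exact product a*b."""
    p = a * b
    # With a fused operation, a*b - p is computed without rounding the
    # product first, so e is the exact rounding error of p.
    e = fma_f64(a, b, -p)
    return p, e

x, y = 0.1, 0.3
p, e = two_product(x, y)

# Check the identity in exact arithmetic.
exact = Fraction(x) * Fraction(y)
print(Fraction(p) + Fraction(e) == exact)
```

If the fused step were replaced by a separate multiply and add, `e` would always come out as zero and the algorithm would silently lose the correction term, which is why a quasi-fused instruction is not a safe substitute here.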
As discussed here, it seems to make more sense to add explicit instructions for these use cases, rather than using profiles to restrict qfma to work for these use cases.
I expect one of the big questions is whether these instructions belong in relaxed-simd or should go in a separate proposal. I'm open to suggestions here. I'm starting by proposing them here, because I expect it would be confusing to users if relaxed-simd is standardized with qfma before a true fma is standardized. If a user needs a true fma, they might be tempted to use qfma if they don't (think they) care about CPUs without fma support.