Support variable-index swizzles #226
Comments
iirc llvm doesn't have an architecture independent dynamic swizzle instruction/intrinsic.
Yeah, LLVM doesn't have an intrinsic for non-const index shuffles. This is hard to implement without an intrinsic, for example: x86-64 doesn't have a shuffle instruction unless at least SSSE3 is available, but Maybe LLVM could relocate the wasm shuffle lowering code to an LLVM intrinsic, and then we would be able to take advantage of that.
The lack of an LLVM intrinsic for this isn't really a blocker: as a fallback, the operation can easily be written by casting to an array of bytes, making a new array with a bunch of indexing, and casting back. The SSSE3 thing also isn't a big deal, since it's easy to do the conditional compilation, and the function would inevitably be generic and/or marked as
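For illustration, the array-based fallback described above could look roughly like the following. This is only a sketch: the name `swizzle_dyn_fallback` is made up, it works on plain byte arrays rather than actual `core::simd` vectors, and returning zero for out-of-range indices is an assumption, not a decision anyone in this thread has made.

```rust
/// Scalar fallback for a variable-index byte swizzle.
/// Hypothetical name; operates on plain arrays, not `core::simd` types.
/// Assumption: out-of-range indices produce 0.
pub fn swizzle_dyn_fallback<const N: usize>(table: [u8; N], idxs: [u8; N]) -> [u8; N] {
    let mut out = [0u8; N];
    for (o, &i) in out.iter_mut().zip(idxs.iter()) {
        // `get` returns `None` for an out-of-range index, which maps to zero.
        *o = table.get(i as usize).copied().unwrap_or(0);
    }
    out
}
```

With a four-byte table `[10, 20, 30, 40]` and indices `[3, 0, 0, 200]`, this selects bytes 3, 0, 0 and then zero for the out-of-range index.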
Conditional compilation doesn't really work--even if the function is inline,
see additional discussion in #11
oh crap right because cfg is pre codegen
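As I understand the point here, a `cfg` check is resolved when the crate containing it is compiled, so it cannot react to features that only become known later. A rough sketch of the distinction, with made-up function names, and using `std` runtime detection purely for contrast (that macro is not available in `core`, so it isn't a fix for `core::simd` itself):

```rust
/// True only if SSSE3 was enabled when *this* code was compiled.
/// The answer is fixed before codegen and never changes afterwards.
pub fn ssse3_at_compile_time() -> bool {
    cfg!(target_feature = "ssse3")
}

/// True if the CPU we are actually running on supports SSSE3 (x86 only).
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
pub fn ssse3_at_run_time() -> bool {
    std::is_x86_feature_detected!("ssse3")
}

/// On other architectures there is no SSSE3 to detect.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
pub fn ssse3_at_run_time() -> bool {
    false
}
```

On a default x86-64 build the first function typically returns `false` even on hardware where the second returns `true`, which is the gap being discussed.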
Duplicated by #242?
No point having two issues open, closing this one in favor of that one.
Both x86-SSE and ARM-NEON provide byte-level shuffle instructions with indices coming from a register: `pshufb` on x86 and the `tbl` instructions on ARM. The core::simd API exposes these for constant indices via the `simd_swizzle` macro, but as far as I can tell, there's no support for variable indices exposed by core::simd. It would be valuable to provide this, either just for u8 vectors, or perhaps for all vectors?

Variable-index swizzles can be very valuable in some scenarios. (1) Often they are used as table lookup, e.g. pshufb provides a 16-way-parallel table lookup from a 16-element table of bytes. (2) Often in compression/filtering/sorting scenarios the shuffle needs to be computed based on a dynamic calculation.
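To make scenario (1) concrete, here is a scalar model of a 16-way table lookup that turns the low nibble of each byte into an ASCII hex digit. The function names are hypothetical and the loop only stands in for what a single variable-index byte shuffle would do in one instruction; it is a sketch, not a proposed API.

```rust
/// Scalar model of a 16-entry byte table lookup (what one `pshufb`/`tbl`
/// would do for in-range indices). Hypothetical helper, not a real API.
pub fn lookup16(table: [u8; 16], idxs: [u8; 16]) -> [u8; 16] {
    let mut out = [0u8; 16];
    for (o, &i) in out.iter_mut().zip(idxs.iter()) {
        // Mask to 0..=15 so every index is in range for this model.
        *o = table[(i & 0x0f) as usize];
    }
    out
}

/// Maps the low nibble of each input byte to its ASCII hex digit.
pub fn low_nibbles_to_hex(bytes: [u8; 16]) -> [u8; 16] {
    const HEX: [u8; 16] = *b"0123456789abcdef";
    lookup16(HEX, bytes.map(|b| b & 0x0f))
}
```

The table here is a compile-time constant, but the indices come from the data, which is exactly the case a constant-index swizzle cannot express.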
Here's one potential challenge and one potential solution. x86-SSE and ARM-NEON have matching semantics on in-range indices (select the indexed byte), but have different semantics on out-of-range indices: ARM-NEON returns zero on any out-of-range index, whereas x86-SSE returns zero only if the top bit of the byte is set. The choice taken by wasm-simd, for example, is ARM semantics: https://github.com/WebAssembly/simd/blob/main/proposals/simd/SIMD.md#swizzling-using-variable-indices. Going with those semantics seems like a plausible choice.
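The difference in out-of-range behaviour can be modelled in scalar code as below. The function names are invented for illustration and the models cover only the 16-byte, single-table case. The last function shows one commonly used way to get the ARM-style result from an x86-style shuffle, by saturating-adding `0x70` to each index first; treat it as a sketch of the idea rather than a proposed implementation.

```rust
/// Scalar model of x86 `pshufb`: zero if the top bit of the index is set,
/// otherwise select using the low four bits of the index.
pub fn pshufb_model(table: [u8; 16], idxs: [u8; 16]) -> [u8; 16] {
    idxs.map(|i| if i & 0x80 != 0 { 0 } else { table[(i & 0x0f) as usize] })
}

/// Scalar model of a single-register NEON `tbl`: zero for any index >= 16.
pub fn tbl_model(table: [u8; 16], idxs: [u8; 16]) -> [u8; 16] {
    idxs.map(|i| if i < 16 { table[i as usize] } else { 0 })
}

/// ARM-style semantics built on the x86-style model: a saturating add of
/// 0x70 sets the top bit for every index >= 16 and keeps the low nibble
/// of indices 0..=15 unchanged.
pub fn pshufb_with_tbl_semantics(table: [u8; 16], idxs: [u8; 16]) -> [u8; 16] {
    pshufb_model(table, idxs.map(|i| i.saturating_add(0x70)))
}
```

For example, index `16` selects element 0 under the `pshufb` model but yields zero under the `tbl` model, while index `0x80` yields zero under both.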