Support variable-index swizzles #226

Closed
reinerp opened this issue Jan 17, 2022 · 9 comments
Labels
C-feature-request Category: a feature request, i.e. not implemented / a PR

Comments

@reinerp
reinerp commented Jan 17, 2022

Both x86-SSE (from SSSE3 onward) and ARM-NEON provide byte-level shuffle instructions whose indices come from a register: pshufb on x86 and the tbl instructions on ARM. The core::simd API exposes these for constant indices via the simd_swizzle! macro, but as far as I can tell, core::simd exposes no support for variable indices. It would be valuable to provide this, either just for u8 vectors or perhaps for all vectors.

Variable-index swizzles can be very valuable in some scenarios. (1) Often they are used as a table lookup, e.g. pshufb provides a 16-way-parallel lookup into a 16-element table of bytes. (2) Often in compression/filtering/sorting scenarios the shuffle itself has to be computed dynamically from the data.
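As a rough illustration of use case (1), here is a scalar sketch of the kind of 16-entry table lookup that a single pshufb or tbl performs on all lanes at once. The names `HEX` and `low_nibbles_to_hex` are hypothetical and used only for this example; they are not part of core::simd.

```rust
/// A 16-entry byte table: the ASCII hex digits.
const HEX: [u8; 16] = *b"0123456789abcdef";

/// Looks up the low nibble of every input byte in `HEX`.
/// A variable-index byte shuffle would do these 16 lookups in one instruction;
/// this scalar version only shows the intended result.
fn low_nibbles_to_hex(bytes: [u8; 16]) -> [u8; 16] {
    bytes.map(|b| HEX[(b & 0x0f) as usize])
}
```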

Here's one potential challenge and one potential solution. x86-SSE and ARM-NEON have matching semantics on in-range indices (select the indexed byte), but have different semantics on out-of-range indices: ARM-NEON returns zero on any out-of-range index, whereas x86-SSE returns zero only if the top bit of the index byte is set (otherwise it uses just the low four bits of the index). The choice taken by wasm-simd, for example, is ARM semantics: https://github.com/WebAssembly/simd/blob/main/proposals/simd/SIMD.md#swizzling-using-variable-indices. Going with those semantics seems like a plausible choice.
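To make the difference concrete, the two behaviours can be written as scalar reference models. This is only a sketch for 16-byte vectors; the function names `swizzle_zero_oob` and `swizzle_pshufb` are made up for illustration and are not an existing API.

```rust
/// ARM-NEON `tbl` / wasm-simd style: any index >= 16 produces 0.
fn swizzle_zero_oob(table: [u8; 16], idx: [u8; 16]) -> [u8; 16] {
    let mut out = [0u8; 16];
    for (o, &i) in out.iter_mut().zip(idx.iter()) {
        *o = if (i as usize) < 16 { table[i as usize] } else { 0 };
    }
    out
}

/// x86 `pshufb` style: 0 only if bit 7 of the index is set,
/// otherwise the low four bits of the index select the byte.
fn swizzle_pshufb(table: [u8; 16], idx: [u8; 16]) -> [u8; 16] {
    let mut out = [0u8; 16];
    for (o, &i) in out.iter_mut().zip(idx.iter()) {
        *o = if i & 0x80 != 0 { 0 } else { table[(i & 0x0f) as usize] };
    }
    out
}
```

The two models agree for indices 0..=15 and for indices with the top bit set; they differ only for indices 16..=127, where the pshufb model wraps around instead of returning zero.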

@reinerp reinerp added the C-feature-request Category: a feature request, i.e. not implemented / a PR label Jan 17, 2022
@programmerjake
Member

iirc llvm doesn't have an architecture independent dynamic swizzle instruction/intrinsic.

@calebzulawski
Member

Yeah, LLVM doesn't have an intrinsic for non-const index shuffles. std::simd doesn't have anything (else) preventing it and this is something I'm interested in.

This is hard to implement without an intrinsic. For example, x86-64 doesn't have a byte shuffle instruction unless at least SSSE3 is available, but std must work all the way down to the base architecture.

Maybe LLVM could relocate the wasm shuffle lowering code to an LLVM intrinsic, and then we would be able to take advantage of that.

@Lokathor
Contributor

The lack of an LLVM intrinsic for this isn't really a blocker. As a fallback, the operation can easily be written by casting to an array of bytes, making a new array with a bunch of indexing, and casting back.
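A minimal sketch of that array-based fallback, written on plain byte arrays so it does not depend on any particular core::simd version. The name `swizzle_dyn_fallback` is hypothetical, and returning zero for out-of-range indices is an assumption that follows the ARM/wasm semantics suggested earlier in the thread. With a real vector type, the same body would sit between a conversion to an array and a conversion back.

```rust
/// Scalar fallback: for every lane, index into `table` with the
/// corresponding entry of `idx`; out-of-range indices yield 0 (assumed).
fn swizzle_dyn_fallback<const N: usize>(table: [u8; N], idx: [u8; N]) -> [u8; N] {
    core::array::from_fn(|lane| {
        // `get` returns None for an out-of-range index instead of panicking.
        table.get(idx[lane] as usize).copied().unwrap_or(0)
    })
}
```

Whether the optimizer turns such a loop into a single shuffle instruction is not guaranteed, which is part of why a dedicated intrinsic is attractive.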

The SSSE3 thing also isn't a big deal, since it's easy to do the conditional compilation, and the function would inevitably be generic and/or marked as inline anyway.

@calebzulawski
Member

Conditional compilation doesn't really work--even if the function is inline, std is compiled without any features.
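For reference, the conditional-compilation approach under discussion might look roughly like the sketch below in a user crate. The function name `swizzle16` is hypothetical, and the zero-on-out-of-range behaviour is an assumption. The `cfg` is resolved when the crate containing the function is compiled, so inside a std that is built for the baseline target the SSSE3 branch would never be selected, regardless of the features the final binary enables.

```rust
/// Selected only when SSSE3 is statically enabled for this compilation.
#[cfg(all(target_arch = "x86_64", target_feature = "ssse3"))]
pub fn swizzle16(table: [u8; 16], idx: [u8; 16]) -> [u8; 16] {
    use core::arch::x86_64::{
        __m128i, _mm_adds_epu8, _mm_loadu_si128, _mm_set1_epi8, _mm_shuffle_epi8,
        _mm_storeu_si128,
    };
    let mut out = [0u8; 16];
    // SAFETY: this item only exists when SSSE3 is statically enabled, and the
    // unaligned load/store intrinsics accept any readable/writable 16-byte buffer.
    unsafe {
        let t = _mm_loadu_si128(table.as_ptr() as *const __m128i);
        let i = _mm_loadu_si128(idx.as_ptr() as *const __m128i);
        // Saturating-add 0x70 so every index >= 16 ends up with its top bit
        // set, which makes pshufb write zero to that lane.
        let i = _mm_adds_epu8(i, _mm_set1_epi8(0x70));
        _mm_storeu_si128(out.as_mut_ptr() as *mut __m128i, _mm_shuffle_epi8(t, i));
    }
    out
}

/// Portable fallback for every other target or feature set.
#[cfg(not(all(target_arch = "x86_64", target_feature = "ssse3")))]
pub fn swizzle16(table: [u8; 16], idx: [u8; 16]) -> [u8; 16] {
    core::array::from_fn(|lane| table.get(idx[lane] as usize).copied().unwrap_or(0))
}
```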

@programmerjake
Member

see additional discussion in #11

@Lokathor
Contributor

Lokathor commented Jan 17, 2022

oh crap right because cfg is pre codegen

@RalfJung
Member

Duplicated by #242?

@programmerjake
Member

#242 is more detailed and has step-by-step task lists, so imho this issue should be closed in favor of #242.

@calebzulawski
Member

No point having two issues open, closing this one in favor of that one.

@calebzulawski calebzulawski closed this as not planned May 22, 2022
5 participants