Add v8x16.shuffle1 instruction (#71)

zeux · dtig · commit 7f4d54d4c9a5 · 2019-03-27T10:10:01.000-07:00
This change adds a variable shuffle instruction to the SIMD proposal. When an index is out of range, the corresponding result lane is specified as 0. This matches hardware behavior on ARM and RISC-V. On x86_64 and MIPS, the hardware provides instructions that select 0 when the high bit of the index is set (x86_64) or when either of the two high bits is set (MIPS). On these architectures, the backend is expected to emulate the proposed behavior with a pair of instructions: a saturating add followed by a permute. On x86_64, the add computes saturate(x + (128 - 16)).

To distinguish variable shuffles from immediate shuffles, the existing v8x16.shuffle instruction is renamed to v8x16.shuffle2_imm. The new name makes explicit that it shuffles two vectors using an immediate argument. This naming scheme allows for adding variants like v8x16.shuffle2 and v8x16.shuffle1_imm in the future.

Fixes #68. Contributes to #24. Fixes #11.
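The x86_64 lowering can be checked exhaustively with a small scalar model. The sketch below makes these assumptions. `pshufb` and `paddusb_lane` are hypothetical Python models of the per-lane behavior of the x86 PSHUFB and PADDUSB instructions, not real intrinsics. Index bytes are treated as unsigned values 0..255.

```python
# Sketch: check that "unsigned saturating add of (128 - 16), then PSHUFB"
# reproduces the proposed shuffle1 semantics for every possible index byte.
# pshufb and paddusb_lane are hypothetical scalar models, not real intrinsics.

LANES = 16


def shuffle1(a, s):
    """Proposed semantics: indices 0..15 select from a; 16..255 yield 0."""
    return [a[idx] if idx < LANES else 0 for idx in s]


def paddusb_lane(x, y):
    """Unsigned saturating byte add, as PADDUSB performs per lane."""
    return min(x + y, 0xFF)


def pshufb(a, s):
    """PSHUFB model: high bit set -> 0, otherwise index by the low 4 bits."""
    return [0 if idx & 0x80 else a[idx & 0x0F] for idx in s]


def shuffle1_x86(a, s):
    """Emulation: bias each index by 112 with saturation, then PSHUFB.

    0..15 becomes 112..127 (high bit clear, low 4 bits unchanged), while
    16..255 becomes 128..255 (high bit set, so PSHUFB writes 0).
    """
    biased = [paddusb_lane(idx, 128 - LANES) for idx in s]
    return pshufb(a, biased)


if __name__ == "__main__":
    a = list(range(100, 116))
    # Cover all 256 index values, 16 lanes at a time.
    for start in range(0, 256, LANES):
        s = list(range(start, start + LANES))
        assert shuffle1_x86(a, s) == shuffle1(a, s)
    print("emulation matches for all 256 index values")
```

The add is needed because PSHUFB on its own only looks at the high bit and the low 4 bits. Without the bias, an index such as 16 would select `a[0]` instead of producing 0.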
diff --git a/proposals/simd/BinarySIMD.md b/proposals/simd/BinarySIMD.md
@@ -23,14 +23,13 @@ instr ::= ...
 ```
 
 Some SIMD instructions have additional immediate operands following `simdop`.
-The `v8x16.shuffle` instruction has 16 bytes after `simdop`.
+The `v8x16.shuffle2_imm` instruction has 16 bytes after `simdop`.
 
 | Instruction               | `simdop` | Immediate operands |
 | --------------------------|---------:|--------------------|
 | `v128.load`               |    `0x00`| m:memarg           |
 | `v128.store`              |    `0x01`| m:memarg           |
 | `v128.const`              |    `0x02`| i:ImmByte[16]      |
-| `v8x16.shuffle`           |    `0x03`| s:LaneIdx32[16]    |
 | `i8x16.splat`             |    `0x04`| -                  |
 | `i8x16.extract_lane_s`    |    `0x05`| i:LaneIdx16        |
 | `i8x16.extract_lane_u`    |    `0x06`| i:LaneIdx16        |
@@ -167,3 +166,5 @@ The `v8x16.shuffle` instruction has 16 bytes after `simdop`.
 | `f32x4.convert_u/i32x4`   |    `0xb0`| -                  |
 | `f64x2.convert_s/i64x2`   |    `0xb1`| -                  |
 | `f64x2.convert_u/i64x2`   |    `0xb2`| -                  |
+| `v8x16.shuffle1`          |    `0xc0`| -                  |
+| `v8x16.shuffle2_imm`      |    `0xc1`| s:LaneIdx32[16]    |
diff --git a/proposals/simd/SIMD.md b/proposals/simd/SIMD.md
@@ -284,8 +284,8 @@ def S.replace_lane(a, i, x):
 The input lane value, `x`, is interpreted the same way as for the splat
 instructions. For the `i8` and `i16` lanes, the high bits of `x` are ignored.
 
-### Shuffle lanes
-* `v8x16.shuffle(a: v128, b: v128, imm: ImmLaneIdx32[16]) -> v128`
+### Shuffling using immediate indices
+* `v8x16.shuffle2_imm(a: v128, b: v128, imm: ImmLaneIdx32[16]) -> v128`
 
 Returns a new vector with lanes selected from the lanes of the two input vectors
 `a` and `b` specified in the 16 byte wide immediate mode operand `imm`. This
@@ -294,7 +294,7 @@ return. The indices `i` in range `[0, 15]` select the `i`-th element of `a`. The
 indices in range `[16, 31]` select the `i - 16`-th element of `b`.
 
 ```python
-def S.shuffle(a, b, s):
+def S.shuffle2_imm(a, b, s):
     result = S.New()
     for i in range(S.Lanes):
         if s[i] < S.lanes:
@@ -304,6 +304,25 @@ def S.shuffle(a, b, s):
     return result
 ```
 
+### Shuffling using variable indices
+* `v8x16.shuffle1(a: v128, s: v128) -> v128`
+
+Returns a new vector with lanes selected from the lanes of the first input
+vector `a`, using the indices held in the lanes of the second input vector `s`.
+The indices `i` in range `[0, 15]` select the `i`-th element of `a`. For indices
+outside of this range, the resulting lane is 0.
+
+```python
+def S.shuffle1(a, s):
+    result = S.New()
+    for i in range(S.Lanes):
+        if s[i] < S.Lanes:
+            result[i] = a[s[i]]
+        else:
+            result[i] = 0
+    return result
+```
+
 ## Integer arithmetic
 
 Wrapping integer arithmetic discards the high bits of the result.
diff --git a/proposals/simd/TextSIMD.md b/proposals/simd/TextSIMD.md
@@ -20,8 +20,8 @@ The canonical text format used for printing `v128.const` instructions is
 v128.const i32x4 0xNNNNNNNN 0xNNNNNNNN 0xNNNNNNNN 0xNNNNNNNN
 ```
 
-### v8x16.shuffle
+### v8x16.shuffle2_imm
 
 ```
-v8x16.shuffle i5 i5 i5 i5 i5 i5 i5 i5 i5 i5 i5 i5 i5 i5 i5 i5
+v8x16.shuffle2_imm i5 i5 i5 i5 i5 i5 i5 i5 i5 i5 i5 i5 i5 i5 i5 i5
 ```