This was found while building rustc 1.33.0 on `powerpc64-unknown-linux-musl`. Test `run-pass/simd/simd-intrinsic-generic-select` fails at the third `simd_select_bitmask` test:

```
thread 'main' panicked at 'assertion failed: `(left == right)`
  left: `u32x8(8, 1, 10, 3, 12, 5, 14, 7)`,
 right: `u32x8(0, 9, 2, 11, 4, 13, 6, 15)`', src/test/run-pass/simd/simd-intrinsic-generic-select.rs:159:9
```

The LLVM IR generated for this test case on `x86_64-unknown-linux-musl` and `powerpc64-unknown-linux-musl` (respectively) is identical:

```llvm
%1036 = load <8 x i32>, <8 x i32>* %a83, align 32
%1037 = load <8 x i32>, <8 x i32>* %b84, align 32
%1038 = select <8 x i1> bitcast (<1 x i8> <i8 -16> to <8 x i1>), <8 x i32> %1036, <8 x i32> %1037
store <8 x i32> %1038, <8 x i32>* %r101, align 32
```

```llvm
%1036 = load <8 x i32>, <8 x i32>* %a83, align 32
%1037 = load <8 x i32>, <8 x i32>* %b84, align 32
%1038 = select <8 x i1> bitcast (<1 x i8> <i8 -16> to <8 x i1>), <8 x i32> %1036, <8 x i32> %1037
store <8 x i32> %1038, <8 x i32>* %r101, align 32
```

The test appears to expect that the bitmask is interpreted as bitwise little-endian (in other words, the LSB selects the first element from the vectors). However, the implementation uses a bitcast to a vector of `i1`. On big-endian architectures such as powerpc64, the LSB becomes the last element of this `i1` vector, not the first.

Unfortunately, all of the upstream test cases are symmetrical, so "choosing the wrong vector" and "reading the bitmask backwards" are indistinguishable.
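
To make the two readings of the bitmask concrete, here is a minimal scalar sketch. It is only a model, not the intrinsic or rustc's codegen: the function names `select_lsb_first` and `select_msb_first` are hypothetical, and plain `[u32; 8]` arrays stand in for `u32x8`.

```rust
/// Hypothetical scalar model of what the test expects:
/// bit `i` of `mask`, counted from the LSB, selects lane `i`.
/// A set bit takes the lane from `a`, a clear bit takes it from `b`.
fn select_lsb_first(mask: u8, a: [u32; 8], b: [u32; 8]) -> [u32; 8] {
    let mut r = [0u32; 8];
    for i in 0..8 {
        r[i] = if (mask >> i) & 1 == 1 { a[i] } else { b[i] };
    }
    r
}

/// Hypothetical scalar model of the behaviour observed on powerpc64:
/// lane `i` is selected by bit `7 - i`, i.e. the mask is read from the MSB.
fn select_msb_first(mask: u8, a: [u32; 8], b: [u32; 8]) -> [u32; 8] {
    let mut r = [0u32; 8];
    for i in 0..8 {
        r[i] = if (mask >> (7 - i)) & 1 == 1 { a[i] } else { b[i] };
    }
    r
}
```

With `a = [0, 1, ..., 7]` and `b = [8, 9, ..., 15]`, the `i8 -16` mask from the IR (`0b11110000`) gives the same result for `select_msb_first(mask, a, b)` as for `select_lsb_first(mask, b, a)`, which is why a symmetrical mask cannot tell the two failure modes apart. An asymmetrical mask such as `0b11110101` does separate them.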

I added an additional test case:

```rust
let r: u32x8 = simd_select_bitmask(0b11110101u8, a, b);
let e = u32x8(0, 9, 2, 11, 4, 5, 6, 7);
assert_eq!(r, e);
```

This passes on `x86_64-unknown-linux-musl`, but fails on `powerpc64-unknown-linux-musl` with:

```
thread 'main' panicked at 'assertion failed: `(left == right)`
  left: `u32x8(0, 1, 2, 3, 12, 5, 14, 7)`,
 right: `u32x8(0, 9, 2, 11, 4, 5, 6, 7)`', src/test/run-pass/simd/simd-intrinsic-generic-select.rs:50:9
```

The two "unlike" elements were chosen from the wrong place in the vector.

Since the vectors are 256 bits, and the POWER VSX registers are only 128 bits wide, LLVM must split each vector across two registers. The following powerpc64 assembly is from the last test case (`0b11110000u8`), and clearly shows that LLVM (not the hardware) picks the first half of `a` and the second half of `b`:

```
   0x000000000000416c <+36>:    li      r3,0

36          unsafe {
37              let a = u32x8(0, 1, 2, 3, 4, 5, 6, 7);
   0x0000000000004170 <+40>:    stw     r3,288(r1)
   0x0000000000004174 <+44>:    li      r3,1
   0x0000000000004178 <+48>:    stw     r3,292(r1)
   0x000000000000417c <+52>:    li      r3,2
   0x0000000000004180 <+56>:    stw     r3,296(r1)
   0x0000000000004184 <+60>:    li      r3,3
   0x0000000000004188 <+64>:    stw     r3,300(r1)
   0x000000000000418c <+68>:    li      r3,4
   0x0000000000004190 <+72>:    stw     r3,304(r1)
   0x0000000000004194 <+76>:    li      r3,5
   0x0000000000004198 <+80>:    stw     r3,308(r1)
   0x000000000000419c <+84>:    li      r3,6
   0x00000000000041a0 <+88>:    stw     r3,312(r1)
   0x00000000000041a4 <+92>:    li      r3,7
   0x00000000000041a8 <+96>:    stw     r3,316(r1)
   0x00000000000041ac <+100>:   li      r3,8

38              let b = u32x8(8, 9, 10, 11, 12, 13, 14, 15);
   0x00000000000041b0 <+104>:   stw     r3,320(r1)
   0x00000000000041b4 <+108>:   li      r3,9
   0x00000000000041b8 <+112>:   stw     r3,324(r1)
   0x00000000000041bc <+116>:   li      r3,10
   0x00000000000041c0 <+120>:   stw     r3,328(r1)
   0x00000000000041c4 <+124>:   li      r3,11
   0x00000000000041c8 <+128>:   stw     r3,332(r1)
   0x00000000000041cc <+132>:   addi    r3,r1,336
   0x00000000000041d0 <+136>:   li      r4,12
   0x00000000000041d4 <+140>:   stw     r4,336(r1)
   0x00000000000041d8 <+144>:   li      r4,13
   0x00000000000041dc <+148>:   stw     r4,340(r1)
   0x00000000000041e0 <+152>:   li      r4,14
   0x00000000000041e4 <+156>:   stw     r4,344(r1)
   0x00000000000041e8 <+160>:   li      r4,15
   0x00000000000041ec <+164>:   stw     r4,348(r1)

...

60              let r: u32x8 = simd_select_bitmask(0b11110000u8, a, b);
   0x00000000000048cc <+1924>:  addi    r3,r1,288
   0x00000000000048d0 <+1928>:  lvx     v2,0,r3
   0x00000000000048d4 <+1932>:  addi    r3,r1,336
   0x00000000000048d8 <+1936>:  lvx     v3,0,r3
   0x00000000000048dc <+1940>:  addi    r3,r1,1488
   0x00000000000048e0 <+1944>:  stvx    v3,0,r3
   0x00000000000048e4 <+1948>:  addi    r3,r1,1472
   0x00000000000048e8 <+1952>:  stvx    v2,0,r3
```

So is this a bug in the test, because it should be ensuring that the bitmask is in native vector/endian order? Or in the implementation of `simd_select_bitmask`, because it should always take a little-endian bitmask and reverse the bits as necessary?
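
For illustration, the second option (always accept an LSB-first bitmask and reverse it where needed) could look roughly like the sketch below. This is an assumption about one possible shape of a fix, not what rustc does: `mask_for_bitcast` is a hypothetical helper, it only covers the 8-lane `u8` case, and a real fix would live in the intrinsic's codegen rather than in user code.

```rust
/// Hypothetical helper: take a mask whose LSB selects lane 0 and return the
/// byte that would put the same selection in lane order after a bitcast to
/// `<8 x i1>`. On a big-endian target the bits are reversed; on a
/// little-endian target the mask is returned unchanged.
/// Assumes exactly 8 lanes and a `u8` mask.
fn mask_for_bitcast(mask: u8, big_endian: bool) -> u8 {
    if big_endian {
        mask.reverse_bits()
    } else {
        mask
    }
}
```

A caller would pass `cfg!(target_endian = "big")` as the second argument, so the reversal only happens on targets like powerpc64.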