Avx512f #921


Merged
merged 48 commits into from
Oct 10, 2020

Conversation

@minybot (Contributor) commented Sep 30, 2020

alignr: epi32,epi64
zextps128_ps512,zextps256_ps512,zextpd128_pd512,zextpd256_pd512,zextsi128_si512,zextsi256_si512
undefined: epi32, (); set_zero: epi32, ()
set_epi8, set_epi16, set1_epi8, set1_epi16
set4: epi32,epi64,ps,pd; setr4: epi32,epi64,ps,pd
cvtepi8_epi32, cvtepi8_epi64, cvtepu8_epi32, cvtepu8_epi64
cvtepi16_epi32, cvtepi16_epi64, cvtepu16_epi32, cvtepu16_epi64
cvtepi32_epi64, cvtepu32_epi64, cvtepi32_ps, cvtepi32_pd
cvtepu32_ps, cvtepu32_pd, cvtepi32lo_pd, cvtepu32lo_pd
cvtepi32_epi16, cvtepi32_epi8,
cvtepi64_epi32, cvtepi64_epi16, cvtepi64_epi8
cvtsepi32_epi16, cvtsepi32_epi8
cvtsepi64_epi32, cvtsepi64_epi16, cvtsepi64_epi8
cvtusepi32_epi16, cvtusepi32_epi8, cvtusepi64_epi32, cvtusepi64_epi16, cvtusepi64_epi8
cvtpd_ps, cvt_roundpd_ps
cvtpd_pslo, cvtpslo_pd
cvt_roundpd_epi32, cvt_roundpd_epu32
cvt_roundepi32_ps, cvt_roundepu32_ps
cvt_roundps_ph, cvtps_ph
cvt_roundph_ps, cvtph_ps
reduce_add: epi32,ps,pd
reduce_mul: epi32,ps,pd
reduce_max: epi32,epu32,ps,pd
reduce_min: epi32,epu32,ps,pd
reduce_and: epi32
loadu: epi32,epi64,si512; storeu: epi32,epi64,si512
load: epi32,epi64,si512,ps,pd; store: epi32,epi64,si512,ps,pd
extractf32x4_ps, extractf64x4_pd, extracti32x4_epi32, extracti64x4_epi64
reduce_or: epi32
compress: epi32,epi64,ps,pd
expand: epi32,epi64,ps,pd
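As a scalar sketch of the semantics behind one of the listed families (an illustrative model, not the stdarch implementation), reduce_add over epi32 lanes is simply a wrapping sum of all 16 elements:

```rust
// Illustrative scalar model of _mm512_reduce_add_epi32's semantics
// (hypothetical helper): sum all 16 i32 lanes with wrapping arithmetic.
fn reduce_add_epi32_model(v: [i32; 16]) -> i32 {
    v.iter().fold(0i32, |acc, &x| acc.wrapping_add(x))
}

fn main() {
    let v: [i32; 16] = core::array::from_fn(|i| i as i32); // 0..=15
    assert_eq!(reduce_add_epi32_model(v), 120);
    println!("ok");
}
```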

@rust-highfive
r? @Amanieu

(rust_highfive has picked a reviewer for you, use r? to override)

#[target_feature(enable = "avx512f")]
pub unsafe fn _mm512_setr4_epi64(d: i64, c: i64, b: i64, a: i64) -> __m512i {
    _mm512_set_epi64(a, b, c, d, a, b, c, d)
}
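For reference, the lane layout this definition produces can be modelled in plain scalar Rust (an illustrative sketch treating __m512i as eight i64 lanes in memory order; `setr4_epi64_model` is a hypothetical name, not a stdarch function):

```rust
// Scalar model of _mm512_setr4_epi64: repeat the four arguments twice in
// memory (lowest-lane-first) order. _mm512_set_epi64 takes its arguments
// highest lane first, which is why the wrapper above reverses them.
fn setr4_epi64_model(d: i64, c: i64, b: i64, a: i64) -> [i64; 8] {
    [d, c, b, a, d, c, b, a]
}

fn main() {
    assert_eq!(setr4_epi64_model(1, 2, 3, 4), [1, 2, 3, 4, 1, 2, 3, 4]);
    println!("ok");
}
```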
Member

Why are these only available on x86_64?

Contributor Author

Why are these only available on x86_64?
I put them in x86_64 because _mm512_set_epi64 is also defined in the x86_64 module.

Member

I don't think any of the SSE/AVX intrinsics are specific to x86_64, they should all work on x86.

Contributor Author

When I move set4 to x86, it shows:

---- verify_all_signatures stdout ----
failed to verify _mm512_set4_epi64
  • intrinsic _mm512_set4_epi64 uses a 64-bit bare type but may be available on 32-bit platforms
failed to verify _mm512_setr4_epi64
  • intrinsic _mm512_setr4_epi64 uses a 64-bit bare type but may be available on 32-bit platforms
thread 'verify_all_signatures' panicked at 'assertion failed: all_valid', crates/stdarch-verify/tests/x86-intel.rs:362:5

Member

You can add them to the whitelist in crates/stdarch-verify/tests/x86-intel.rs.

Member

All the set intrinsics should be added to the whitelist and moved to x86.

Contributor Author

All the set intrinsics should be added to the whitelist and moved to x86.

How about _mm512_set_epi64 and _mm512_setr_epi64?

Member

Those too.

_mm512_setzero_si512().as_i32x16(),
k,
));
ptr::write_unaligned(mem_addr as *mut __m512i, r);
Member

This is incorrect: the intrinsic is only supposed to store the elements selected by the mask, not the full 512 bits. This is done with a special LLVM intrinsic: https://godbolt.org/z/vWf7KE

Contributor Author

This is incorrect: the intrinsic is only supposed to store the elements selected by the mask, not the full 512 bits. This is done with a special LLVM intrinsic: https://godbolt.org/z/vWf7KE

Yes, I tried to implement it using llvm.masked.compressstore.v16f32, but every ordering of the three parameters fails: the link step reports an error about bad parameters (I tried all 3 × 2 × 1 = 6 orders).
Also, compress_ps and compressstoreu_ps use the same instruction: vcompressps.

Member

The memory and register forms of the instruction behave differently: https://www.felixcloutier.com/x86/vcompressps

Memory destination version: Only the contiguous vector is written to the destination memory location. EVEX.z must be zero.

Register destination version: If the vector length of the contiguous vector is less than that of the input vector in the source operand, the upper bits of the destination register are unmodified if EVEX.z is not set, otherwise the upper bits are zeroed.
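The memory-form behaviour described above can be sketched in plain scalar Rust (an illustrative model with hypothetical names, not the stdarch code): only the mask-selected lanes are written, packed contiguously, so the number of bytes stored depends on the mask's popcount rather than being a full 512-bit store.

```rust
// Scalar model of the memory form of vcompressps
// (_mm512_mask_compressstoreu_ps): write only the lanes selected by the
// mask, contiguously starting at dst[0]; elements of dst beyond
// popcount(mask) are left untouched.
fn compressstoreu_ps_model(dst: &mut [f32], src: &[f32; 16], mask: u16) {
    let mut m = 0;
    for j in 0..16 {
        if (mask >> j) & 1 == 1 {
            dst[m] = src[j];
            m += 1;
        }
    }
}

fn main() {
    let src: [f32; 16] = core::array::from_fn(|i| i as f32);
    let mut dst = [-1.0f32; 16];
    compressstoreu_ps_model(&mut dst, &src, 0b0000_0000_0000_1010);
    // lanes 1 and 3 are packed to the front; the rest of dst is unmodified
    assert_eq!(&dst[..3], &[1.0, 3.0, -1.0]);
    println!("ok");
}
```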

Contributor Author

I tried to follow the LLVM document, "declare void @llvm.masked.compressstore.v16f32(<16 x float>, float*, <16 x i1>)", and implement it as:

#[link_name = "llvm.masked.compressstore.v16f32"]
fn vcompresspss(a: f32x16, p: *mut f32, mask: i16);

However, when I compile it, it shows:

"Intrinsic has incorrect argument type
void (<16 x float>, float*, i16)* @llvm.masked.compressstore.v16f32"

Any clue how to solve this?

Member

I think this will require special support in the compiler. We don't currently have a way to express vectors of i1.

Contributor Author

I think this will require special support in the compiler. We don't currently have a way to express vectors of i1.
True, I think I will drop these functions for now.

k: __mmask16,
mem_addr: *const i32,
) -> __m512i {
let load = ptr::read_unaligned(mem_addr as *const __m512i);
Member

Same here.

Member

You also need to remove expandloadu since it reads more than it should, which could cause out-of-bounds reads.

Contributor Author

expandloadu copies the value from src when the mask bit is false:
m := 0
FOR j := 0 to 15
    i := j*32
    IF k[j]
        dst[i+31:i] := MEM[mem_addr+m+31:mem_addr+m]
        m := m + 32
    ELSE
        dst[i+31:i] := src[i+31:i]
    FI
ENDFOR
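That pseudocode translates directly to scalar Rust (a sketch over a plain slice with a hypothetical name; note it reads only popcount(mask) elements from memory, which is why an unconditional full-width read is out of bounds):

```rust
// Scalar model of _mm512_mask_expandloadu_epi32: for each set mask bit,
// read the next contiguous element from mem; otherwise copy the lane
// from src.
fn mask_expandloadu_epi32_model(src: [i32; 16], k: u16, mem: &[i32]) -> [i32; 16] {
    let mut dst = src;
    let mut m = 0;
    for j in 0..16 {
        if (k >> j) & 1 == 1 {
            dst[j] = mem[m];
            m += 1;
        }
    }
    dst
}

fn main() {
    // Only two mask bits set, so only two elements are read from memory.
    let dst = mask_expandloadu_epi32_model([0; 16], 0b0000_0000_0001_0001, &[7, 8]);
    assert_eq!(dst[0], 7);
    assert_eq!(dst[4], 8);
    assert_eq!(dst[1], 0); // unselected lane keeps the src value
    println!("ok");
}
```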

Contributor Author

Yes, the underlying intrinsic is @llvm.masked.expandload.v16f32(float* %{{.}}, <16 x i1> %{{.}}, <16 x float> %{{.*}}).
I will remove it.

Contributor Author

Btw, I implemented expandloadu_ps as loadu_ps + expand_ps because the latencies are similar: 11 vs. 4 + 8 = 12. It seems the real expandloadu_ps is a little faster.

@Amanieu Amanieu merged commit 7b92756 into rust-lang:master Oct 10, 2020
@minybot minybot deleted the avx512f branch October 11, 2020 00:09