sse4.1 instructions #98

p32blo · 2017-10-06T09:06:24Z

I have some doubts over several aspects of the PR so I will add inline comments.

- HACK warning: messing with the constify macros
- Selecting only one buffer gets optimized away, and tests need to take this into account

p32blo · 2017-10-06T09:13:08Z

src/x86/sse41.rs

+#[inline(always)]
+#[target_feature = "+sse4.1"]
+#[cfg_attr(test, assert_instr(pblendw, imm8=0xF0))]
+pub unsafe fn _mm_blend_epi16(a: i16x8, b: i16x8, imm8: u8) -> i16x8 {


Should the imm8 be u8 or i32? The public interface in C is const int, but I followed _mm_dp_ps for consistency and used u8.

I'd imagine that, so long as no more than 8 bits are read, passing u8 here seems fine to me, but @BurntSushi, do you think we should follow C super strictly?
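To make the u8-vs-i32 question concrete, here is a minimal scalar model of what PBLENDW does with its immediate. It is written as plain Rust over arrays rather than with the real SIMD types, and the helper name `blend_epi16_model` is hypothetical. It assumes the documented semantics that bit i of the immediate selects lane i from `b`. Under that assumption only the low 8 bits are ever consulted, so widening the parameter to i32 would not change behavior.

```rust
/// Hypothetical scalar model of PBLENDW / `_mm_blend_epi16` (not the std::arch API).
/// Bit i of `imm8` selects lane i from `b`; otherwise the lane comes from `a`.
/// Only bits 0..=7 are read, so higher bits of an i32 immediate are ignored.
fn blend_epi16_model(a: [i16; 8], b: [i16; 8], imm8: i32) -> [i16; 8] {
    let mut out = a;
    for i in 0..8usize {
        if ((imm8 >> i) & 1) == 1 {
            out[i] = b[i];
        }
    }
    out
}

fn main() {
    let a = [0i16; 8];
    let b = [1i16; 8];
    // 0xF0 takes the upper four lanes from `b`.
    let r = blend_epi16_model(a, b, 0xF0);
    assert_eq!(r, [0, 0, 0, 0, 1, 1, 1, 1]);
    // Bit 8 is never consulted, so 0x1F0 behaves exactly like 0xF0.
    assert_eq!(blend_epi16_model(a, b, 0x1F0), r);
}
```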

p32blo · 2017-10-06T09:19:59Z

src/x86/sse41.rs

+    macro_rules! call {
+        ($imm2:expr) => { blendpd(a, b, $imm2) }
+    }
+    constify_imm2!(imm2, call)


I don't think this is the correct way, since I'm messing with macros, but it was the most obvious way to make it work for now.
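The pattern under discussion turns a runtime immediate into a compile-time constant: it matches on every possible value and expands a call with a literal in each arm. The sketch below reproduces that shape in plain Rust. `constify_imm2!`, `blendpd_const` and `blend_pd` are hypothetical stand-ins (a const-generic array function in place of the LLVM `blendpd` intrinsic), not the code from this PR.

```rust
// Hypothetical constify-style macro: masks the runtime value to 2 bits and
// expands `$expand!(N)` with a literal N in each arm.
macro_rules! constify_imm2 {
    ($imm2:expr, $expand:ident) => {
        match $imm2 & 0b11 {
            0 => $expand!(0),
            1 => $expand!(1),
            2 => $expand!(2),
            _ => $expand!(3),
        }
    };
}

// Stand-in for an operation that needs its immediate at compile time.
// Bit i of IMM2 selects lane i from `b`, otherwise from `a`.
fn blendpd_const<const IMM2: i32>(a: [f64; 2], b: [f64; 2]) -> [f64; 2] {
    let mut out = a;
    for i in 0..2usize {
        if ((IMM2 >> i) & 1) == 1 {
            out[i] = b[i];
        }
    }
    out
}

// Public wrapper taking a runtime immediate, mirroring the diff above.
fn blend_pd(a: [f64; 2], b: [f64; 2], imm2: i32) -> [f64; 2] {
    macro_rules! call {
        ($imm2:expr) => {
            blendpd_const::<{ $imm2 }>(a, b)
        };
    }
    constify_imm2!(imm2, call)
}

fn main() {
    assert_eq!(blend_pd([1.0, 2.0], [3.0, 4.0], 0b01), [3.0, 2.0]);
    assert_eq!(blend_pd([1.0, 2.0], [3.0, 4.0], 0b10), [1.0, 4.0]);
}
```

The trade-off is code size. Every arm instantiates its own call, which is harmless for 4 values but grows to 256 arms for an 8-bit immediate.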

p32blo · 2017-10-06T09:24:09Z

src/x86/sse41.rs

+#[inline(always)]
+#[target_feature = "+sse4.1"]
+#[cfg_attr(test, assert_instr(extractps, imm8=0))]
+pub unsafe fn _mm_extract_ps(a: f32x4, imm8: u8) -> i32 {


This is again a question of how closely C should be followed.

Somewhat nonsensically, this intrinsic returns an i32:

int _mm_extract_ps (__m128 a, const int imm8)

Should we just return an f32, or continue to use the transmute here?

To me it seems OK to return f32 here.

Actually, it really should be an i32. See answers at https://stackoverflow.com/questions/5526658/intel-sse-why-does-mm-extract-ps-return-int-instead-of-float
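In short, the instruction hands back the raw 32-bit pattern of the selected lane in a general-purpose register. The i32 is therefore the float's bits, not a numeric conversion. The minimal scalar model below assumes the documented `imm8[1:0]` lane selection. The name `extract_ps_model` is hypothetical. It shows how a caller recovers the f32 with `f32::from_bits`:

```rust
/// Hypothetical scalar model of EXTRACTPS / `_mm_extract_ps`.
/// imm8[1:0] selects the lane; the result is that lane's raw IEEE-754 bits.
fn extract_ps_model(a: [f32; 4], imm8: i32) -> i32 {
    a[(imm8 & 0b11) as usize].to_bits() as i32
}

fn main() {
    let a = [1.0f32, 2.5, -3.0, 4.0];
    // Reinterpret the returned bits to get the float back.
    let bits = extract_ps_model(a, 1);
    assert_eq!(f32::from_bits(bits as u32), 2.5);
    // 1.0f32 is 0x3F80_0000 as raw bits, not 1.
    assert_eq!(extract_ps_model(a, 0), 0x3F80_0000);
}
```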

p32blo · 2017-10-06T09:30:30Z

src/x86/sse41.rs

+#[inline(always)]
+#[target_feature = "+sse4.1"]
+#[cfg_attr(test, assert_instr(pextrb, imm8=0))]
+pub unsafe fn _mm_extract_epi8(a: i8x16, imm8: u8) -> i32 {


Same as the previous one, but should it return i8 instead of i32?
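There is one concrete difference between i8 and i32 here. Per Intel's documentation, PEXTRB zero-extends the selected byte into the 32-bit destination, so a lane holding -1 comes back as 255. The following is a scalar sketch of that behavior. The helper name is hypothetical, and it assumes `imm8[3:0]` selects the byte:

```rust
/// Hypothetical scalar model of PEXTRB / `_mm_extract_epi8`.
/// imm8[3:0] selects the byte, which is zero-extended into the i32 result.
fn extract_epi8_model(a: [i8; 16], imm8: i32) -> i32 {
    (a[(imm8 & 0xF) as usize] as u8) as i32
}

fn main() {
    let mut a = [0i8; 16];
    a[3] = -1;
    // Zero extension: the byte 0xFF is reported as 255, not -1.
    assert_eq!(extract_epi8_model(a, 3), 255);
    // Bit 4 of the immediate is ignored.
    assert_eq!(extract_epi8_model(a, 0x13), 255);
}
```

Returning i8 would instead report that lane as -1, so callers relying on C's zero-extended result would see different values.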

alexcrichton · 2017-10-09T21:01:00Z

Looks like CI may still be failing?

p32blo · 2017-10-10T10:55:49Z

Some context:

- Travis is failing on some avx tests. They pass on my machine, so I don't know how to fix this. I also can't tell why AppVeyor is failing.
- I couldn't change _mm_extract_ps to return f32, because with that change it no longer generates the correct instruction.
- Changed _mm_extract_epi64 and _mm_insert_epi64 to only work on x86_64, since they failed to generate the instruction on 32-bit builds. See here.
- Used i8x16 instead of __m128i on _mm_blendv_epi8 for consistency.
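For reference, `_mm_blendv_epi8` (PBLENDVB) takes each byte from `b` when the most significant bit of the corresponding mask byte is set, and from `a` otherwise. This lane-wise byte behavior is why a type like i8x16 fits it naturally. Here is a minimal scalar model over `[i8; 16]` arrays. It is a hypothetical helper, not the std::arch API:

```rust
/// Hypothetical scalar model of PBLENDVB / `_mm_blendv_epi8`.
/// Only the most significant bit of each mask byte matters; for i8 that
/// bit is set exactly when the byte is negative.
fn blendv_epi8_model(a: [i8; 16], b: [i8; 16], mask: [i8; 16]) -> [i8; 16] {
    let mut out = a;
    for i in 0..16usize {
        if mask[i] < 0 {
            out[i] = b[i];
        }
    }
    out
}

fn main() {
    let mut mask = [0i8; 16];
    mask[0] = -128; // 0x80: top bit set, take from b
    mask[6] = 127; // 0x7F: top bit clear, keep a
    let r = blendv_epi8_model([0; 16], [1; 16], mask);
    assert_eq!(r[0], 1);
    assert_eq!(r[6], 0);
}
```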

nominolo · 2017-10-17T10:05:21Z

Win64 uses a different calling convention. Unless you define a different calling convention, it will pass vector arguments in memory instead of via the XMM/YMM registers. This causes different code to be generated. See, e.g., https://msdn.microsoft.com/en-us/library/9z1stfyw.aspx

p32blo · 2017-10-17T10:30:01Z

Thanks @nominolo !

Is there a way I can fix this?

nominolo · 2017-10-17T11:52:12Z

@p32blo You probably want to just change the expected code. I.e., something like: https://github.com/rust-lang-nursery/stdsimd/blob/2dbe8d0b39a1b07e9cf4846d593908bed588ce9f/src/x86/sse.rs#L751

p32blo · 2017-10-17T16:15:45Z

src/x86/sse41.rs

+#[cfg(target_arch = "x86_64")]
+#[inline(always)]
+#[target_feature = "+sse4.1"]
+#[cfg_attr(all(test, windows), assert_instr(mov, imm8=1))]


Should Windows be tested at all, since it does not generate the correct output? Or is testing for mov ok?

I think testing for plain mov doesn't make sense, because it'll always be satisfied by mov rbp,rsp which is part of every function prelude.
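The reviewer's point can be illustrated with a deliberately simplified check. The `contains_instr` helper and the listing below are hypothetical: they are made up for illustration and are neither the stdsimd-test implementation nor real compiler output. They show that a test which only asks for some `mov` passes even when the instruction under test is absent.

```rust
/// Hypothetical, simplified disassembly check: does any line start with
/// the expected mnemonic?
fn contains_instr(disasm: &[&str], expected: &str) -> bool {
    disasm
        .iter()
        .any(|line| line.split_whitespace().next() == Some(expected))
}

fn main() {
    // Made-up listing: a prologue/epilogue with no PEXTRQ in it.
    let listing = ["push rbp", "mov rbp,rsp", "pop rbp", "ret"];
    // Trivially satisfied by the prologue alone.
    assert!(contains_instr(&listing, "mov"));
    // The check that would actually catch the missing instruction.
    assert!(!contains_instr(&listing, "pextrq"));
}
```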

alexcrichton · 2017-10-18T15:34:49Z

Ok looks great, thanks! We can always work on adding more tests later if necessary.

* sse4.1: _mm_blendv_ps and _mm_blendv_pd
* sse4.1: _mm_blend_ps and _mm_blend_pd
  - HACK warning: messing with the constify macros
  - Selecting only one buffer gets optimized away and tests need to take this into account
* sse4.1: _mm_blend_epi16
* sse4.1: _mm_extract_ps
* sse4.1: _mm_extract_epi8
* sse4.1: _mm_extract_epi32
* sse4.1: _mm_extract_epi64
* sse4.1: _mm_insert_ps
* sse4.1: _mm_insert_epi8
* sse4.1: _mm_insert_epi32 and _mm_insert_epi64
* Formatting
* sse4.1: _mm_max_epi8, _mm_max_epu16, _mm_max_epi32 and _mm_max_epu32
* Fix wrong compiler flag
  - avx -> sse4.1
* Fix intrinsics that only work with x86-64
* sse4.1: use appropriate types
* Revert '_mm_extract_ps' to return i32
* sse4.1: Use the v128 types for consistency
* Try fix for windows
* Try "vectorcall" calling convention
* Revert "Try "vectorcall" calling convention"
  This reverts commit 12936e9.
* Revert "Try fix for windows"
  This reverts commit 9c47380.
* Change tests for windows
* Remove useless Windows test

p32blo added 12 commits October 6, 2017 09:34

sse4.1: _mm_blendv_ps and _mm_blendv_pd

4350feb

sse4.1: _mm_blend_ps and _mm_blend_pd

9e015df

- HACK warning: messing with the constify macros - Selecting only one buffer gets optimized away and tests need to take this into account

sse4.1: _mm_blend_epi16

ec25f71

sse4.1: _mm_extract_ps

3ea44a3

sse4.1: _mm_extract_epi8

88f3992

sse4.1: _mm_extract_epi32

38981a8

sse4.1: _mm_extract_epi64

aa1f042

sse4.1: _mm_insert_ps

6c46075

sse4.1: _mm_insert_epi8

fd1506e

sse4.1: _mm_insert_epi32 and _mm_insert_epi64

1375861

Formatting

e5dab3a

sse4.1: _mm_max_epi8, _mm_max_epu16, _mm_max_epi32 and _mm_max_epu32

08574d8

p32blo commented Oct 6, 2017

p32blo added 2 commits October 6, 2017 11:45

Fix wrong compiler flag

7fe0345

- avx -> sse4.1

Fix intrinsics that only work with x86-64

60b1156

p32blo changed the title ~~sse4. 1 instructions~~ sse4.1 instructions Oct 6, 2017

p32blo added 2 commits October 9, 2017 10:22

Merge remote-tracking branch 'up/master' into sse4.1

a789f7e

sse4.1: use appropriate types

2373618

p32blo added 3 commits October 10, 2017 09:33

Revert '_mm_extract_ps' to return i32

b80b3a6

Merge remote-tracking branch 'up/master' into sse4.1

91b079a

sse4.1: Use the v128 types for consistency

bab5cd3

p32blo added 2 commits October 16, 2017 10:46

Merge remote-tracking branch 'up/master' into sse4.1

cf86d8a

Try fix for windows

9c47380

Try "vectorcall" calling convention

12936e9

p32blo force-pushed the sse4.1 branch from 9c47380 to 1597db7 October 17, 2017 13:51

p32blo added 2 commits October 17, 2017 14:58

Revert "Try "vectorcall" calling convention"

11d745c

This reverts commit 12936e9.

Revert "Try fix for windows"

adc7abc

This reverts commit 9c47380.

p32blo force-pushed the sse4.1 branch from 1597db7 to 8c5d2a2 October 17, 2017 14:03

Change tests for windows

5456fbf

p32blo force-pushed the sse4.1 branch from 8c5d2a2 to 9c47380 October 17, 2017 14:10

Merge remote-tracking branch 'up/master' into sse4.1

50418db

p32blo commented Oct 17, 2017

Remove useless Windows test

64f6146

alexcrichton merged commit ac1e68a into rust-lang:master Oct 18, 2017
