Commit 7759b2f

Merge pull request #4099 from juj/sse3
Sse3
2 parents 5b64d4a + 2a94990 commit 7759b2f

6 files changed (+239 / -3 lines)

emcc.py

Lines changed: 5 additions & 0 deletions
@@ -685,6 +685,11 @@ def validate_arg_level(level_string, max_level, err_msg):
       newargs.append('-D__SSE__=1')
       newargs.append('-D__SSE2__=1')
       newargs[i] = ''
+    elif newargs[i] == '-msse3':
+      newargs.append('-D__SSE__=1')
+      newargs.append('-D__SSE2__=1')
+      newargs.append('-D__SSE3__=1')
+      newargs[i] = ''
 
   if should_exit:
     sys.exit(0)
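Because `-msse3` now defines `__SSE__`, `__SSE2__` and `__SSE3__` together, source code can test the macros as a ladder from the highest level down, and existing `#ifdef __SSE2__` paths keep compiling. A minimal sketch, assuming only that the compiler sets these standard feature macros (emcc does so per the change above; native GCC and Clang set them from their own `-m` flags). `highest_sse_level` and `report_sse_level` are hypothetical names, not Emscripten APIs.

```c
#include <assert.h>
#include <stdio.h>

/* emcc's -msse3 defines __SSE__, __SSE2__ and __SSE3__ together. This check
 * fails the build if a toolchain ever breaks that implication, so the code
 * below can safely treat the macros as a ladder. */
#if defined(__SSE3__) && !(defined(__SSE2__) && defined(__SSE__))
#error "__SSE3__ is defined without __SSE2__ and __SSE__"
#endif

/* Hypothetical helper: names the highest SSE level enabled at compile time. */
const char *highest_sse_level(void)
{
#if defined(__SSE3__)
  return "SSE3";
#elif defined(__SSE2__)
  return "SSE2";
#elif defined(__SSE__)
  return "SSE1";
#else
  return "none";
#endif
}

/* Hypothetical helper: prints the level, e.g. "SSE3" for a build with -msse3. */
void report_sse_level(FILE *out)
{
  assert(out != NULL);
  fprintf(out, "Highest enabled SSE level: %s\n", highest_sse_level());
}
```

The checks happen entirely in the preprocessor, so they cost nothing at run time; the same file reports "SSE3" when built with `emcc -msse3` and whatever the native default is otherwise.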

site/source/docs/porting/simd.rst

Lines changed: 3 additions & 3 deletions
@@ -17,7 +17,7 @@ There are three different ways to generate code to benefit from SIMD instruction
 
 - Emscripten supports the GCC/Clang compiler specific `SIMD Vector Extensions <https://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html>`_. These constructs do not require any changes to the command line build flags, but any code that utilizes the vector built-ins will always unconditionally emit SIMD.js vector instructions.
 
-- A third option is to use the x86 SSE intrinsics. Emscripten has full support for compiling code that utilizes the SSE1 and SSE2 intrinsic function calls. To enable SSE1 intrinsics support, pass the compiler flag -msse, and add in a #include <xmmintrin.h>. To build SSE2 intrinsics code, pass the compiler flag -msse2, and use #include <emmintrin.h>.
+- A third option is to use the x86 SSE intrinsics. Emscripten has full support for compiling code that utilizes the SSE1, SSE2 and SSE3 intrinsic function calls. To enable SSE1 intrinsics support, pass the compiler flag -msse, and add in a #include <xmmintrin.h>. To build SSE2 intrinsics code, pass the compiler flag -msse2, and use #include <emmintrin.h>. For SSE3, pass -msse3 and #include <pmmintrin.h>.
 
 These three methods are not mutually exclusive, but may freely be combined.
 
@@ -30,9 +30,9 @@ When porting native SIMD code, it should be noted that because of portability co
 
 - The SIMD types supported by SIMD.js are Float32x4, Int32x4, Uint32x4, Int16x8, Uint16x8, Int8x16 and Uint8x16. In particular, Float64x2 and Int64x2 are currently not supported, however Float64x2 is emulated in software in the current polyfill. 256-bit or wider SIMD types (AVX) are not supported either.
 
-- Even though the full set of SSE1 and SSE2 intrinsics are supported, because of the platform-abstract nature of SIMD.js, some of these intrinsics will compile down to scalarized instructions to emulate. To verify which instructions are accelerated and which are not, examine the code in the platform headers `xmmintrin.h <https://github.com/kripken/emscripten/blob/incoming/system/include/emscripten/xmmintrin.h>`_ and `emmintrin.h <https://github.com/kripken/emscripten/blob/incoming/system/include/emscripten/xmmintrin.h>`_.
+- Even though the full set of SSE1, SSE2 and SSE3 intrinsics are supported, because of the platform-abstract nature of SIMD.js, some of these intrinsics will compile down to scalarized instructions to emulate. To verify which instructions are accelerated and which are not, examine the code in the platform headers `xmmintrin.h <https://github.com/kripken/emscripten/blob/incoming/system/include/emscripten/xmmintrin.h>`_ and `emmintrin.h <https://github.com/kripken/emscripten/blob/incoming/system/include/emscripten/xmmintrin.h>`_.
 
-- Currently the Intel x86 SIMD support is limited to SSE1 and SSE2 instruction sets. The Intel x86 SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2 and FMA instruction sets or newer are not supported. Also, the old Intel x86 MMX instruction set is not supported.
+- Currently the Intel x86 SIMD support is limited to SSE1, SSE2 and SSE3 instruction sets. The Intel x86 SSSE3, SSE4.1, SSE4.2, AVX, AVX2 and FMA instruction sets or newer are not supported. Also, the old Intel x86 MMX instruction set is not supported.
 
 - SIMD.js does not have control over managing floating point rounding modes or handling denormals.
 

system/include/emscripten/pmmintrin.h

Lines changed: 156 additions & 0 deletions
@@ -0,0 +1,156 @@
+/*===---- pmmintrin.h - SSE3 intrinsics ------------------------------------===
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ *
+ *===-----------------------------------------------------------------------===
+ */
+
+#ifndef __PMMINTRIN_H
+#define __PMMINTRIN_H
+
+#include <emmintrin.h>
+
+#ifndef __SSE3__
+#error "SSE3 instruction set not enabled"
+#endif
+
+/* Define the default attributes for the functions in this file. */
+#ifdef __EMSCRIPTEN__
+#define __DEFAULT_FN_ATTRS __attribute__((__always_inline__, __nodebug__))
+#else
+#define __DEFAULT_FN_ATTRS __attribute__((__always_inline__, __nodebug__, __target__("sse3")))
+#endif
+
+static __inline__ __m128i __DEFAULT_FN_ATTRS
+_mm_lddqu_si128(__m128i const *__p)
+{
+#ifdef __EMSCRIPTEN__
+  return _mm_loadu_si128(__p);
+#else
+  return (__m128i)__builtin_ia32_lddqu((char const *)__p);
+#endif
+}
+
+static __inline__ __m128 __DEFAULT_FN_ATTRS
+_mm_addsub_ps(__m128 __a, __m128 __b)
+{
+#ifdef __EMSCRIPTEN__
+  return _mm_add_ps(__a, _mm_mul_ps(__b, _mm_set_ps(1.f, -1.f, 1.f, -1.f)));
+#else
+  return __builtin_ia32_addsubps(__a, __b);
+#endif
+}
+
+static __inline__ __m128 __DEFAULT_FN_ATTRS
+_mm_hadd_ps(__m128 __a, __m128 __b)
+{
+#ifdef __EMSCRIPTEN__
+  return _mm_add_ps(_mm_shuffle_ps(__a, __b, _MM_SHUFFLE(2, 0, 2, 0)), _mm_shuffle_ps(__a, __b, _MM_SHUFFLE(3, 1, 3, 1)));
+#else
+  return __builtin_ia32_haddps(__a, __b);
+#endif
+}
+
+static __inline__ __m128 __DEFAULT_FN_ATTRS
+_mm_hsub_ps(__m128 __a, __m128 __b)
+{
+#ifdef __EMSCRIPTEN__
+  return _mm_sub_ps(_mm_shuffle_ps(__a, __b, _MM_SHUFFLE(2, 0, 2, 0)), _mm_shuffle_ps(__a, __b, _MM_SHUFFLE(3, 1, 3, 1)));
+#else
+  return __builtin_ia32_hsubps(__a, __b);
+#endif
+}
+
+static __inline__ __m128 __DEFAULT_FN_ATTRS
+_mm_movehdup_ps(__m128 __a)
+{
+  return __builtin_shufflevector(__a, __a, 1, 1, 3, 3);
+}
+
+static __inline__ __m128 __DEFAULT_FN_ATTRS
+_mm_moveldup_ps(__m128 __a)
+{
+  return __builtin_shufflevector(__a, __a, 0, 0, 2, 2);
+}
+
+static __inline__ __m128d __DEFAULT_FN_ATTRS
+_mm_addsub_pd(__m128d __a, __m128d __b)
+{
+#ifdef __EMSCRIPTEN__
+  return _mm_add_pd(__a, _mm_mul_pd(__b, _mm_set_pd(1.0, -1.0)));
+#else
+  return __builtin_ia32_addsubpd(__a, __b);
+#endif
+}
+
+static __inline__ __m128d __DEFAULT_FN_ATTRS
+_mm_hadd_pd(__m128d __a, __m128d __b)
+{
+#ifdef __EMSCRIPTEN__
+  return _mm_add_pd(_mm_shuffle_pd(__a, __b, _MM_SHUFFLE2(0, 0)), _mm_shuffle_pd(__a, __b, _MM_SHUFFLE2(1, 1)));
+#else
+  return __builtin_ia32_haddpd(__a, __b);
+#endif
+}
+
+static __inline__ __m128d __DEFAULT_FN_ATTRS
+_mm_hsub_pd(__m128d __a, __m128d __b)
+{
+#ifdef __EMSCRIPTEN__
+  return _mm_sub_pd(_mm_shuffle_pd(__a, __b, _MM_SHUFFLE2(0, 0)), _mm_shuffle_pd(__a, __b, _MM_SHUFFLE2(1, 1)));
+#else
+  return __builtin_ia32_hsubpd(__a, __b);
+#endif
+}
+
+#define _mm_loaddup_pd(dp) _mm_load1_pd(dp)
+
+static __inline__ __m128d __DEFAULT_FN_ATTRS
+_mm_movedup_pd(__m128d __a)
+{
+  return __builtin_shufflevector(__a, __a, 0, 0);
+}
+
+#define _MM_DENORMALS_ZERO_ON (0x0040)
+#define _MM_DENORMALS_ZERO_OFF (0x0000)
+
+#define _MM_DENORMALS_ZERO_MASK (0x0040)
+
+#define _MM_GET_DENORMALS_ZERO_MODE() (_mm_getcsr() & _MM_DENORMALS_ZERO_MASK)
+#define _MM_SET_DENORMALS_ZERO_MODE(x) (_mm_setcsr((_mm_getcsr() & ~_MM_DENORMALS_ZERO_MASK) | (x)))
+
+#ifndef __EMSCRIPTEN__
+
+static __inline__ void __DEFAULT_FN_ATTRS
+_mm_monitor(void const *__p, unsigned __extensions, unsigned __hints)
+{
+  __builtin_ia32_monitor((void *)__p, __extensions, __hints);
+}
+
+static __inline__ void __DEFAULT_FN_ATTRS
+_mm_mwait(unsigned __extensions, unsigned __hints)
+{
+  __builtin_ia32_mwait(__extensions, __hints);
+}
+
+#endif /* __EMSCRIPTEN__ */
+
+#undef __DEFAULT_FN_ATTRS
+
+#endif /* __PMMINTRIN_H */
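The Emscripten branches above rebuild each SSE3 operation from SSE1/SSE2 pieces: `_mm_addsub_ps` multiplies `__b` by a sign vector and adds (`_mm_set_ps` lists lanes from high to low, so `_mm_set_ps(1.f, -1.f, 1.f, -1.f)` is {-1, +1, -1, +1} from lane 0 up), while `_mm_hadd_ps` and `_mm_hsub_ps` gather even and odd lanes with two `_mm_shuffle_ps` calls and then add or subtract. A plain-C reference model of the intended lane semantics can help when checking such emulations. This is a sketch under stated assumptions: all function names are hypothetical, and a `float[4]` stands in for `__m128` with index 0 as the lowest lane.

```c
#include <assert.h>
#include <math.h>

/* _mm_addsub_ps (ADDSUBPS): subtract in even lanes, add in odd lanes. */
void ref_addsub_ps(const float a[4], const float b[4], float r[4])
{
  r[0] = a[0] - b[0];
  r[1] = a[1] + b[1];
  r[2] = a[2] - b[2];
  r[3] = a[3] + b[3];
}

/* _mm_hadd_ps (HADDPS): adjacent pairs of a, then adjacent pairs of b.
 * Equals {a0, a2, b0, b2} + {a1, a3, b1, b3}, the two shuffles the
 * Emscripten path uses. */
void ref_hadd_ps(const float a[4], const float b[4], float r[4])
{
  r[0] = a[0] + a[1];
  r[1] = a[2] + a[3];
  r[2] = b[0] + b[1];
  r[3] = b[2] + b[3];
}

/* _mm_hsub_ps (HSUBPS): each even lane minus the following odd lane. */
void ref_hsub_ps(const float a[4], const float b[4], float r[4])
{
  r[0] = a[0] - a[1];
  r[1] = a[2] - a[3];
  r[2] = b[0] - b[1];
  r[3] = b[2] - b[3];
}

/* _mm_movehdup_ps duplicates the odd lanes. */
void ref_movehdup_ps(const float a[4], float r[4])
{
  r[0] = a[1]; r[1] = a[1]; r[2] = a[3]; r[3] = a[3];
}

/* _mm_moveldup_ps duplicates the even lanes. */
void ref_moveldup_ps(const float a[4], float r[4])
{
  r[0] = a[0]; r[1] = a[0]; r[2] = a[2]; r[3] = a[2];
}

/* Lane-wise model of the Emscripten _mm_addsub_ps path: a + b * sign. */
void emu_addsub_ps(const float a[4], const float b[4], float r[4])
{
  static const float sign[4] = {-1.0f, 1.0f, -1.0f, 1.0f};
  for (int i = 0; i < 4; ++i)
    r[i] = a[i] + b[i] * sign[i];
}

/* Asserts that emulation and reference agree in every lane, including the
 * sign of zero. Any two NaNs count as equal, because NaN sign and payload
 * bits are not portable. */
void check_addsub_emulation(const float a[4], const float b[4])
{
  float ref[4], emu[4];
  ref_addsub_ps(a, b, ref);
  emu_addsub_ps(a, b, emu);
  for (int i = 0; i < 4; ++i) {
    if (isnan(ref[i])) {
      assert(isnan(emu[i]));
    } else {
      assert(ref[i] == emu[i]);
      assert(!signbit(ref[i]) == !signbit(emu[i]));
    }
  }
}
```

Under IEEE 754, `a - b` is defined as `a + (-b)`, and multiplying a non-NaN value by -1 is an exact sign flip, so the sign-vector trick is expected to match ADDSUBPS exactly, including for zeros and infinities.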

system/include/emscripten/x86intrin.h

Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
+#ifndef __X86INTRIN_H
+#define __X86INTRIN_H
+
+// x86intrin.h is the standard include-all for all supported intrinsics.
+
+#if __SSE__
+#include <xmmintrin.h>
+#else
+#warning x86intrin.h included without SIMD.js support enabled.
+#endif
+
+#if __SSE2__
+#include <emmintrin.h>
+#endif
+
+#if __SSE3__
+#include <pmmintrin.h>
+#endif
+
+#endif
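Since this `x86intrin.h` only pulls in the headers whose feature macro is defined, a single include can serve code that picks an implementation per SSE level. A minimal sketch follows, assuming a hypothetical helper `sum4`: it uses SSE3's `_mm_hadd_ps` when `-msse3` is set, SSE1 shuffles with `-msse`, and plain C otherwise. The `__SSE__` guard around the include also avoids the `#warning` this header emits when SIMD support is not enabled.

```c
#include <assert.h>

#if defined(__SSE__)
#include <x86intrin.h> /* includes every intrinsics header the target enables */
#endif

/* Hypothetical helper: (v0 + v1) + (v2 + v3), with the same summation order
 * on every path so results match bit for bit. */
float sum4(const float v[4])
{
  assert(v != NULL);
#if defined(__SSE3__)
  __m128 x = _mm_loadu_ps(v);
  x = _mm_hadd_ps(x, x); /* {v0+v1, v2+v3, v0+v1, v2+v3} */
  x = _mm_hadd_ps(x, x); /* every lane: (v0+v1) + (v2+v3) */
  return _mm_cvtss_f32(x);
#elif defined(__SSE__)
  __m128 x = _mm_loadu_ps(v);
  __m128 shuf = _mm_shuffle_ps(x, x, _MM_SHUFFLE(2, 3, 0, 1)); /* {v1, v0, v3, v2} */
  __m128 sums = _mm_add_ps(x, shuf); /* {v0+v1, v1+v0, v2+v3, v3+v2} */
  shuf = _mm_movehl_ps(shuf, sums);  /* lane 0 = v2+v3 */
  sums = _mm_add_ss(sums, shuf);     /* lane 0 = (v0+v1) + (v2+v3) */
  return _mm_cvtss_f32(sums);
#else
  return (v[0] + v[1]) + (v[2] + v[3]);
#endif
}
```

Two `_mm_hadd_ps` calls replace the shuffle, add, move and scalar add of the SSE1 path, which is the kind of code SSE3 was meant to shorten; under Emscripten both paths end up as SIMD.js operations built from the header emulations.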

tests/test_core.py

Lines changed: 15 additions & 0 deletions
@@ -5840,6 +5840,21 @@ def test_sse2_full(self):
       self.emcc_args = orig_args + mode + ['-I' + path_from_root('tests'), '-msse2'] + args
       self.do_run(open(path_from_root('tests', 'test_sse2_full.cpp'), 'r').read(), native_result)
 
+  # Tests the full SSE3 API.
+  @SIMD
+  def test_sse3_full(self):
+    args = []
+    if '-O0' in self.emcc_args: args += ['-D_DEBUG=1']
+    Popen([CLANG, path_from_root('tests', 'test_sse3_full.cpp'), '-o', 'test_sse3_full', '-D_CRT_SECURE_NO_WARNINGS=1', '-msse3'] + args + get_clang_native_args(), env=get_clang_native_env(), stdout=PIPE).communicate()
+    native_result, err = Popen('./test_sse3_full', stdout=PIPE).communicate()
+    native_result = native_result.replace('\r\n', '\n') # Windows line endings fix
+
+    Settings.PRECISE_F32 = 1 # SIMD currently requires Math.fround
+    orig_args = self.emcc_args
+    for mode in [[], ['-s', 'SIMD=1']]:
+      self.emcc_args = orig_args + mode + ['-I' + path_from_root('tests'), '-msse3'] + args
+      self.do_run(open(path_from_root('tests', 'test_sse3_full.cpp'), 'r').read(), native_result)
+
   @SIMD
   def test_simd(self):
     test_path = path_from_root('tests', 'core', 'test_simd')

tests/test_sse3_full.cpp

Lines changed: 40 additions & 0 deletions
@@ -0,0 +1,40 @@
+// This file uses SSE3 by calling different functions with different interesting inputs and prints the results.
+// Use a diff tool to compare the results between platforms.
+
+#include <pmmintrin.h>
+#define ENABLE_SSE2
+#include "test_sse_full.h"
+
+#ifndef _DEBUG
+// The following tests break when optimizer is applied, so disable them for now. Baby steps.
+// See https://github.com/kripken/emscripten/issues/3789
+#define BREAKS_UNDER_OPTIMIZATION
+#endif
+
+float *interesting_floats = get_interesting_floats();
+int numInterestingFloats = sizeof(interesting_floats_)/sizeof(interesting_floats_[0]);
+uint32_t *interesting_ints = get_interesting_ints();
+int numInterestingInts = sizeof(interesting_ints_)/sizeof(interesting_ints_[0]);
+double *interesting_doubles = get_interesting_doubles();
+int numInterestingDoubles = sizeof(interesting_doubles_)/sizeof(interesting_doubles_[0]);
+
+int main()
+{
+  assert(numInterestingFloats % 4 == 0);
+  assert(numInterestingInts % 4 == 0);
+  assert(numInterestingDoubles % 4 == 0);
+
+  Ret_M128d_M128d(__m128d, _mm_addsub_pd);
+  Ret_M128_M128(__m128, _mm_addsub_ps);
+  Ret_M128d_M128d(__m128d, _mm_hadd_pd);
+  Ret_M128_M128(__m128, _mm_hadd_ps);
+  Ret_M128d_M128d(__m128d, _mm_hsub_pd);
+  Ret_M128_M128(__m128, _mm_hsub_ps);
+#ifndef BREAKS_UNDER_OPTIMIZATION
+  Ret_IntPtr(__m128i, _mm_lddqu_si128, __m128i*, 4, 1);
+#endif
+  Ret_DoublePtr(__m128d, _mm_loaddup_pd, 1, 1);
+  Ret_M128d(__m128d, _mm_movedup_pd);
+  Ret_M128(__m128, _mm_movehdup_ps);
+  Ret_M128(__m128, _mm_moveldup_ps);
+}
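The test prints results for "interesting" inputs so that a native build and an Emscripten build can be compared with a diff tool. `test_sse_full.h` is not part of this commit, so the sketch below does not reproduce its output format; it only illustrates one way to make such output diff-friendly. It prints each lane as its raw bit pattern, which exposes differences that `%f` would hide (such as -0.0 versus +0.0), and prints every NaN as `nan`, on the assumption that NaN sign and payload bits may legitimately differ between native x86 and a JavaScript engine. `format_lanes` is a hypothetical name.

```c
#include <assert.h>
#include <inttypes.h>
#include <math.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes four float lanes into buf as space-separated
 * 8-digit hex bit patterns, with any NaN written as "nan". The output is
 * always NUL-terminated and is truncated if buf is too small. */
void format_lanes(const float r[4], char *buf, size_t size)
{
  assert(buf != NULL && size > 0);
  size_t used = 0;
  buf[0] = '\0';
  for (int i = 0; i < 4 && used < size; ++i) {
    const char *sep = (i == 0) ? "" : " ";
    int n;
    if (isnan(r[i])) {
      n = snprintf(buf + used, size - used, "%snan", sep);
    } else {
      uint32_t bits;
      memcpy(&bits, &r[i], sizeof bits); /* read the bits without aliasing UB */
      n = snprintf(buf + used, size - used, "%s%08" PRIx32, sep, bits);
    }
    if (n < 0)
      return;
    used += (size_t)n;
  }
}
```

A harness would call this once per operation and input pair and print one line each, so a plain text diff pinpoints the first diverging intrinsic and input.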
