-
Notifications
You must be signed in to change notification settings - Fork 12k
Fix usage of F16C intrinsics in AVX code #563
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can confirm that
#if defined(F16C)
// the _mm256_cvt intrinsics require F16C
#define GGML_F32Cx8_LOAD(x) _mm256_cvtph_ps(_mm_loadu_si128((__m128i *)(x)))
#define GGML_F32Cx8_STORE(x, y) _mm_storeu_si128((__m128i *)(x), _mm256_cvtps_ph(y, 0))
#else
static inline __m256 __sse_f16x8_load(ggml_fp16_t *x) {
float tmp[8];
for (int i = 0; i < 8; i++)
tmp[i] = GGML_FP16_TO_FP32(x[i]);
return _mm256_loadu_ps(tmp);
}
static inline void __sse_f16x8_store(ggml_fp16_t *x, __m256 y) {
float arr[8];
_mm256_storeu_ps(arr, y);
for (int i = 0; i < 8; i++)
x[i] = GGML_FP16_TO_FP32(arr[i]);
}
#define GGML_F32Cx8_LOAD(x) __sse_f16x8_load(x)
#define GGML_F32Cx8_STORE(x, y) __sse_f16x8_store(x, y)
#endif
compiles on the i5-2400 but was not able to test the program itself due to other issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be resolved before merging. I couldn't edit the thing above so the post above is what I am referring to.
There is one more problem regarding to this though, is that Makefile is configured to check for the processor features but CMake always declares it no matter what: I'm not necessarily saying that CMake should be changed to check for the features but it should print information of the enabled features or a disclaimer of some sort to reduce the amount of trouble people are having. Probably adding a option to CMakeLists for F16C in non-Windows-MSVC builds like there already are for FMA/AVX/etc. would be the right choice. Any way, that's a problem for another day. Good that this one is fixed now. We can simply instruct people to use edit: #576 should fix this. |
Sandy bridge supports AVX but not F16C. Should fix #562
@kaufmannr can you verify if this fixes your problem?