Skip to content

Fix usage of F16C intrinsics in AVX code #563

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Mar 28, 2023
Merged

Conversation

slaren
Copy link
Member

@slaren slaren commented Mar 27, 2023

Sandy bridge supports AVX but not F16C. Should fix #562

@kaufmannr can you verify if this fixes your problem?

@slaren slaren mentioned this pull request Mar 27, 2023
Copy link

@kaufmannr kaufmannr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can confirm that

#if defined(F16C)
// the _mm256_cvt intrinsics require F16C
#define GGML_F32Cx8_LOAD(x) _mm256_cvtph_ps(_mm_loadu_si128((__m128i *)(x)))
#define GGML_F32Cx8_STORE(x, y) _mm_storeu_si128((__m128i *)(x), _mm256_cvtps_ph(y, 0))
#else
static inline __m256 __sse_f16x8_load(ggml_fp16_t *x) {
float tmp[8];

for (int i = 0; i < 8; i++)
    tmp[i] = GGML_FP16_TO_FP32(x[i]);

return _mm256_loadu_ps(tmp);

}

static inline void __sse_f16x8_store(ggml_fp16_t *x, __m256 y) {
float arr[8];

_mm256_storeu_ps(arr, y);

for (int i = 0; i < 8; i++)
    x[i] = GGML_FP16_TO_FP32(arr[i]);

}
#define GGML_F32Cx8_LOAD(x) __sse_f16x8_load(x)
#define GGML_F32Cx8_STORE(x, y) __sse_f16x8_store(x, y)
#endif

compiles on the i5-2400 but was not able to test the program itself due to other issues.

Copy link
Contributor

@anzz1 anzz1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be resolved before merging. I couldn't edit the thing above so the post above is what I am referring to.

@anzz1
Copy link
Contributor

anzz1 commented Mar 28, 2023

There is one more problem regarding to this though, is that Makefile is configured to check for the processor features but CMake always declares it no matter what:
https://github.com/ggerganov/llama.cpp/blob/7e5395575a3360598f2565c73c8a2ec0c0abbdb8/CMakeLists.txt#L200

I'm not necessarily saying that CMake should be changed to check for the features but it should print information of the enabled features or a disclaimer of some sort to reduce the amount of trouble people are having.

Probably adding a option to CMakeLists for F16C in non-Windows-MSVC builds like there already are for FMA/AVX/etc. would be the right choice.

Any way, that's a problem for another day. Good that this one is fixed now. We can simply instruct people to use make as a primary option for now.

edit: #576 should fix this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants