-
Notifications
You must be signed in to change notification settings - Fork 12.2k
ARM: Fixes and additions to CPU feature detection #14049
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -3449,6 +3469,14 @@ int ggml_cpu_has_dotprod(void) { | |||
#endif | |||
} | |||
|
|||
int ggml_cpu_has_fp16_va(void) { | |||
#if defined(__ARM_FEATURE_FP16_VECTOR_ARITHMETIC) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder why we do a second ifdef here when this variable is set to 0 or 1 elsewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my comment below, case 1. The way I see it, during runtime detection the host may report that the CPU supports this feature, but if we disabled it at compilation, we want the function to always return 0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Adds support for ARM SVE2 feature detection and refines FP16 vector arithmetic checks in the ggml CPU backend.
- Introduce
has_sve2
flag and exposeggml_cpu_has_sve2()
in the public API - Enhance runtime (sysctl) and compile-time detection for
FP16_VECTOR_ARITHMETIC
- Insert SVE2 into the backend feature list
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
File | Description |
---|---|
ggml/src/ggml-cpu/ggml-cpu.cpp | Pushes “SVE2” into the feature vector |
ggml/src/ggml-cpu/ggml-cpu.c | Adds has_fp16_va and has_sve2 fields; updates runtime/sysctl & compile-time checks |
ggml/include/ggml-cpu.h | Declares ggml_cpu_has_sve2() |
Comments suppressed due to low confidence (3)
ggml/src/ggml-cpu/ggml-cpu.cpp:562
- You’ve added SVE2 detection here but no tests were introduced to verify this flag under different CPU configurations. Consider adding unit tests or CI checks to cover both presence and absence of SVE2.
if (ggml_cpu_has_sve2()) {
ggml/src/ggml-cpu/ggml-cpu.cpp:559
- The PR description mentions improved FP16_VECTOR_ARITHMETIC detection, but this feature isn't added to the
features
vector. Add a correspondingfeatures.push_back({ "FP16_VECTOR_ARITHMETIC", "1" });
entry whenggml_cpu_has_fp16_va()
returns true.
static ggml_backend_feature * ggml_backend_cpu_get_features(ggml_backend_reg_t r
ggml/src/ggml-cpu/ggml-cpu.cpp:562
- [nitpick] Indentation here doesn't match the surrounding
if
statements (extra spaces). Align this block to the existing code style for consistency.
if (ggml_cpu_has_sve2()) {
@@ -689,8 +691,10 @@ static void ggml_init_arm_arch_features(void) { | |||
|
|||
ggml_arm_arch_features.has_neon = !!(hwcap & HWCAP_ASIMD); | |||
ggml_arm_arch_features.has_dotprod = !!(hwcap & HWCAP_ASIMDDP); | |||
ggml_arm_arch_features.has_fp16_va = !!(hwcap & HWCAP_FPHP); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The macro HWCAP_FPHP
looks like a typo; the standard HWCAP for FP16 support is usually HWCAP_FP16
. Verify and correct this macro to ensure proper runtime detection.
ggml_arm_arch_features.has_fp16_va = !!(hwcap & HWCAP_FPHP); | |
ggml_arm_arch_features.has_fp16_va = !!(hwcap & HWCAP_FP16); |
Copilot uses AI. Check for mistakes.
ggml_arm_arch_features.has_dotprod = 0; | ||
#endif | ||
|
||
#if defined(__ARM_FEATURE_FP16_VECTOR_ARITHMETIC) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The compile-time block for has_fp16_va
may override the sysctl result unconditionally (even on Apple). Wrap these macros in an #else
of the Apple-specific sysctl branch to avoid conflicting assignments.
Copilot uses AI. Check for mistakes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no reason to check for a feature that is not supported in the code. The current runtime detection in ggml_arm_arch_features
is not working and should be removed, or adapted into a system similar to what we use for x86-64.
I have another change queued up that implements GGML_CPU_ALL_VARIANTS for ARM, which would be based on this.
Can you expand on that? It works fine on my end, though I've only done basic smoke testing so far. |
The problem is that using intrinsics requires enabling support for the instruction set in the compiler, and this may cause the compiler to emit these instructions even in code that doesn't use intrinsics, e.g. for auto-vectorization. For this reason we cannot rely on this type of runtime dispatching when using intrinsics. |
Ah, I think with "using" you mean general in the general sense, eg: someone decides to branch on I would then alter this PR to implement something like
Isn't this happening right now anyway? During compilation, int ggml_cpu_has_dotprod(void) {
#if defined(__ARM_ARCH) && defined(__ARM_FEATURE_DOTPROD)
return ggml_arm_arch_features.has_dotprod;
#else
return 0;
#endif
} We have three cases:
And 1+2 are OK but in 3. there is the problem you mention, that "his may cause the compiler to emit these instructions even in code that doesn't use intrinsics", right? If so, then it seems that this could only be a problem when a binary was compiled NATIVE for some host, and transferred to a "lesser" host, which wouldn't make practical sense (that's not native). And right now we only have 1-N backends built specifically for some CPU, there is no single "universal binary". In the context of A bit much but again, just to understand what the intention is. I'll be needing to do the same for PowerPC so I'd like to get everything right. |
Yes exactly, you got everything right. Just to reiterate: some code relies on the feature detection to determine which instruction set to use, for example: llama.cpp/ggml/src/ggml-cpu/ggml-cpu-aarch64.cpp Line 2197 in 5787b5d
While this works for inline assembly, it doesn't work for intrinsics and thus the method is flawed, and should be removed. The best way to implement runtime dispatching for Arm would be to implement support for GGML_CPU_ALL_VARIANTS by adding a file similar to cpu-feats-x86.cpp that computes the score for Arm, and defining a list of variants to build. The current feature detection code could be used as a starting point to implement the score function.
|
Excellent, thank you for the clarification. You've anticipated my next question: is it OK to start a I see how the problem in |
|
I'm afraid I already finished this yesterday, just filed it as #14080. |
As the proposed alternative implementation does not necessitate fixes to the existing ARM feature detection, I consider this PR obsolete, superseded by #14080. Therefore retracting. |
hope your PR can be approved although I can see there is another potential implementation from a regular employee from ARM. |
I think the conclusion of the above discussion is that implementing GGML_CPU_ALL_VARIANTS doesn't really touch ARM, it's just a question of how the build is produced. This is what #14080 does. Its scoring function only makes syscalls to query support from the OS, the rest is just cmake changes. The parts that actually touch ARM still need their work (see comment). |
I see. the ARM and other SoC vendors has many undocumented tech docs/libs and they know everything about ARM-based SoC and dedicated chips. BTW, FYI:https://github.com/nihui/ruapu |
Working with the ggml-cpu ARM backend, I noticed that feature detection was not entirely complete.
This improves detection for
FP16_VECTOR_ARITHMETIC
, and adds support forSVE2
.Note that I had no way to test the
__APPLE__
implementation for queryingFP16_VECTOR_ARITHMETIC
, I just used the sysctl name that I found in a web search.