Bug: Recent llama.cpp breaks q4_0_4_4 on Arm CPU #10165


Closed
FanShupei opened this issue Nov 4, 2024 · 3 comments · Fixed by #10167
Assignees
Labels
bug Something isn't working high severity Used to report high severity bugs in llama.cpp (Malfunctioning hinder important workflow)

Comments

@FanShupei
Contributor

What happened?

Running any Q4_0_4_4 model now fails with an assertion error. Any clue about this?

The last good version I know is b3971 (2024 Oct 24). I'll do some bisection later.

Name and Version

$ build/bin/llama-cli --version
register_backend: registered backend BLAS (1 devices)
register_device: registered device BLAS (Accelerate)
register_backend: registered backend CPU (1 devices)
register_device: registered device CPU (Apple M2)
version: 4026 (05697f67)
built with Apple clang version 16.0.0 (clang-1600.0.26.3) for arm64-apple-darwin23.5.0

The Metal backend is explicitly disabled by setting -DGGML_METAL=OFF.
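For reference, a configure invocation along these lines reproduces that setup; the build directory name and Release config are assumptions, not taken from the report:

```shell
# Configure llama.cpp with the Metal backend disabled so inference
# takes the CPU path (where the Q4_0_4_4 assertion fires).
cmake -B build -DGGML_METAL=OFF
cmake --build build --config Release
```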

What operating system are you seeing the problem on?

Mac

Relevant log output

$ llama-bench -m llama32-1b-instruct-q4_0_4_4.gguf

register_backend: registered backend BLAS (1 devices)
register_device: registered device BLAS (Accelerate)
register_backend: registered backend CPU (1 devices)
register_device: registered device CPU (Apple M2)
warning: asserts enabled, performance may be affected
| model                          |       size |     params | backend    | threads | fa |          test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | -: | ------------: | -------------------: |
Assertion failed: (!isnan(x)), function ggml_compute_forward_silu_f32, file ggml-cpu.c, line 6649.
@FanShupei FanShupei added bug-unconfirmed high severity Used to report high severity bugs in llama.cpp (Malfunctioning hinder important workflow) labels Nov 4, 2024
@FanShupei FanShupei changed the title Bug: Recent llama.cpp breaks q4_0_4_4 on CPU Bug: Recent llama.cpp breaks q4_0_4_4 on Arm CPU Nov 4, 2024
@slaren slaren added bug Something isn't working and removed bug-unconfirmed labels Nov 4, 2024
@slaren slaren self-assigned this Nov 4, 2024
@FanShupei
Contributor Author

After bisection, commit 'ggml : move CPU backend to a separate file (#10144)' (b4020) seems to have introduced the regression.

@ggerganov
Member

@slaren This should fix it:

diff --git a/ggml/src/ggml-cpu.c b/ggml/src/ggml-cpu.c
index 4b8ffb62..a9652f9e 100644
--- a/ggml/src/ggml-cpu.c
+++ b/ggml/src/ggml-cpu.c
@@ -304,6 +304,7 @@ static const struct ggml_type_traits_cpu type_traits_cpu[GGML_TYPE_COUNT] = {
         .nrows                    = 1,
     },
     [GGML_TYPE_Q8_0] = {
+        .from_float_to_mat        = quantize_mat_q8_0,
         .vec_dot                  = ggml_vec_dot_q8_0_q8_0,
         .vec_dot_type             = GGML_TYPE_Q8_0,
 #if defined (__ARM_FEATURE_MATMUL_INT8)

@slaren
Member

slaren commented Nov 4, 2024

Thanks, I suspected it was some issue with the type_traits, but I couldn't find it.
