Bug: Recent llama.cpp breaks q4_0_4_4 on Arm CPU #10165


Closed
FanShupei opened this issue Nov 4, 2024 · 3 comments · Fixed by #10167
Assignees
Labels
bug Something isn't working high severity Used to report high severity bugs in llama.cpp (Malfunctioning hinder important workflow)

Comments

@FanShupei
Contributor

What happened?

Running any Q4_0_4_4 model now fails with an assertion error. Any clue about this?

The last good version I know is b3971 (2024 Oct 24). I'll do some bisection later.

Name and Version

$ build/bin/llama-cli --version
register_backend: registered backend BLAS (1 devices)
register_device: registered device BLAS (Accelerate)
register_backend: registered backend CPU (1 devices)
register_device: registered device CPU (Apple M2)
version: 4026 (05697f67)
built with Apple clang version 16.0.0 (clang-1600.0.26.3) for arm64-apple-darwin23.5.0

The Metal backend is explicitly disabled by setting -DGGML_METAL=OFF.
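For reference, a configure invocation along these lines reproduces that setup; the build directory name and Release config are assumptions, not taken from the report:

```shell
# Configure llama.cpp with the Metal backend disabled so inference
# takes the CPU path (where the Q4_0_4_4 assertion fires).
cmake -B build -DGGML_METAL=OFF
cmake --build build --config Release
```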

What operating system are you seeing the problem on?

Mac

Relevant log output

$ llama-bench -m llama32-1b-instruct-q4_0_4_4.gguf

register_backend: registered backend BLAS (1 devices)
register_device: registered device BLAS (Accelerate)
register_backend: registered backend CPU (1 devices)
register_device: registered device CPU (Apple M2)
warning: asserts enabled, performance may be affected
| model                          |       size |     params | backend    | threads | fa |          test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | -: | ------------: | -------------------: |
Assertion failed: (!isnan(x)), function ggml_compute_forward_silu_f32, file ggml-cpu.c, line 6649.
@FanShupei FanShupei added bug-unconfirmed high severity Used to report high severity bugs in llama.cpp (Malfunctioning hinder important workflow) labels Nov 4, 2024
@FanShupei FanShupei changed the title Bug: Recent llama.cpp breaks q4_0_4_4 on CPU Bug: Recent llama.cpp breaks q4_0_4_4 on Arm CPU Nov 4, 2024
@slaren slaren added bug Something isn't working and removed bug-unconfirmed labels Nov 4, 2024
@slaren slaren self-assigned this Nov 4, 2024
@FanShupei
Contributor Author

After bisection, commit 'ggml : move CPU backend to a separate file (#10144)' (b4020) seems to have introduced the regression.

@ggerganov
Member

@slaren This should fix it:

diff --git a/ggml/src/ggml-cpu.c b/ggml/src/ggml-cpu.c
index 4b8ffb62..a9652f9e 100644
--- a/ggml/src/ggml-cpu.c
+++ b/ggml/src/ggml-cpu.c
@@ -304,6 +304,7 @@ static const struct ggml_type_traits_cpu type_traits_cpu[GGML_TYPE_COUNT] = {
         .nrows                    = 1,
     },
     [GGML_TYPE_Q8_0] = {
+        .from_float_to_mat        = quantize_mat_q8_0,
         .vec_dot                  = ggml_vec_dot_q8_0_q8_0,
         .vec_dot_type             = GGML_TYPE_Q8_0,
 #if defined (__ARM_FEATURE_MATMUL_INT8)

@slaren
Member

slaren commented Nov 4, 2024

Thanks, I suspected it was some issue with the type_traits, but I couldn't find it.
