
ggml : fix I8MM Q4_1 scaling factor conversion #10562


Merged
ggerganov merged 1 commit into master from gg/cpu-i8mm-fix-2 on Nov 29, 2024

Conversation

@ggerganov ggerganov (Member) commented Nov 28, 2024

target #10561

These changes fix an illegal instruction crash on M1 Pro, caused by the code not doing a runtime check for the availability of I8MM. We now check ggml_cpu_has_matmul_int8() and, if it is false, unpack the 2x2 multiplication into 4 dot products.
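For reference, a minimal sketch of the fallback described above, assuming hypothetical helpers mul_mat_2x2_i8mm() and dot_q4_q8() standing in for the I8MM path and the plain dot-product path (only ggml_cpu_has_matmul_int8() is an actual ggml function):

// Sketch only, not the upstream kernel.
void vec_dot_2x2_sketch(float out[2][2],
                        const void * vx0, const void * vx1,
                        const void * vy0, const void * vy1, int nb) {
    if (ggml_cpu_has_matmul_int8()) {
        // I8MM available: one 2x2 int8 matrix multiplication covers
        // both x rows against both y rows at once.
        mul_mat_2x2_i8mm(out, vx0, vx1, vy0, vy1, nb);
    } else {
        // No I8MM (e.g. M1 Pro): unpack into 4 independent dot products.
        out[0][0] = dot_q4_q8(vx0, vy0, nb);
        out[0][1] = dot_q4_q8(vx0, vy1, nb);
        out[1][0] = dot_q4_q8(vx1, vy0, nb);
        out[1][1] = dot_q4_q8(vx1, vy1, nb);
    }
}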

This fix aside, I am wondering if we should drop the int nrc support in the ggml_vec_dot kernels to keep them simple, and proceed to implement proper GEMMs similar to the work in ggml-cpu-aarch64.c?

@github-actions github-actions bot added the labels testing (Everything test related) and ggml (changes relating to the ggml tensor library for machine learning) on Nov 28, 2024
@ggerganov ggerganov changed the title from "ggml : fix row condition for i8mm kernels" to "gml : fix I8MM runtime feature checks in CPU kernels" on Nov 28, 2024
Comment on lines 2387 to 2391
float32_t _scale[4] = {
    GGML_FP16_TO_FP32(b_x0->d)*GGML_FP16_TO_FP32(b_y0->d),
    GGML_FP16_TO_FP32(b_x0->d)*GGML_FP16_TO_FP32(b_y1->d),
    GGML_FP16_TO_FP32(b_x1->d)*GGML_FP16_TO_FP32(b_y0->d),
    GGML_FP16_TO_FP32(b_x1->d)*GGML_FP16_TO_FP32(b_y1->d)};
@ggerganov ggerganov (Member, Author) commented Nov 28, 2024

This fixes a bug where the y->d was not converted to F32, resulting in completely wrong numbers when going through this CPU branch.
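As an illustration of the failure mode (a standalone sketch, not ggml code; the fp16_to_fp32 helper and the example bit patterns are made up for the demo, and __fp16 support is assumed, e.g. clang on AArch64), multiplying by the raw 16-bit delta instead of the converted F32 value gives a wildly wrong scale:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

// Minimal fp16 -> fp32 helper standing in for GGML_FP16_TO_FP32.
static float fp16_to_fp32(uint16_t h) {
    __fp16 f;
    memcpy(&f, &h, sizeof(f));
    return (float) f;
}

int main(void) {
    uint16_t x_d = 0x2E66; // fp16 bit pattern for ~0.1 (example block delta)
    uint16_t y_d = 0x3266; // fp16 bit pattern for ~0.2

    // Correct: convert both deltas to F32 before multiplying (~0.02).
    float good = fp16_to_fp32(x_d) * fp16_to_fp32(y_d);

    // Buggy: the raw 16-bit value (0x3266 == 12902) is used as a number (~1290).
    float bad  = fp16_to_fp32(x_d) * (float) y_d;

    printf("correct scale: %g, buggy scale: %g\n", good, bad);
    return 0;
}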

@@ -1759,66 +1759,76 @@ void ggml_vec_dot_q4_0_q8_0(int n, float * restrict s, size_t bs, const void * r
const block_q8_0 * restrict vy0 = vy;
const block_q8_0 * restrict vy1 = (const block_q8_0 *) ((const uint8_t*)vy + by);

float32x4_t sumv0 = vdupq_n_f32(0.0f);
if (ggml_cpu_has_matmul_int8()) {
Member

We need to remove the ARM runtime feature detection completely, it doesn't work at all and never will. So I would prefer if at least we don't make that task worse by adding more checks like this.

@ggerganov ggerganov (Member, Author)

Ok, I will change the PR to just include the F16 -> F32 fix in the Q4_1 kernel.

Base automatically changed from gg/cpu-q4_0-i8mm-fix to master November 28, 2024 12:56
@ggerganov ggerganov changed the title from "gml : fix I8MM runtime feature checks in CPU kernels" to "ggml : fix I8MM Q4_1 scaling factor conversion" on Nov 28, 2024
@ggerganov ggerganov merged commit f0678c5 into master Nov 29, 2024
57 checks passed
@ggerganov ggerganov deleted the gg/cpu-i8mm-fix-2 branch November 29, 2024 14:25
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Dec 20, 2024