Fix quantize_row_q4_1() with ARM_NEON #876

Closed
ggerganov opened this issue Apr 10, 2023 · 0 comments
Labels
bug (Something isn't working), high priority (Very important issue)


@ggerganov (Member)

quantize_row_q4_1() is currently bugged when built with ARM_NEON. See the results of quantize-stats on M1:

$  ./quantize-stats -m models/7B/ggml-model-f16.bin 
Loading model
llama.cpp: loading model from models/7B/ggml-model-f16.bin
llama_model_load_internal: format     = ggjt v1 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 256
llama_model_load_internal: n_embd     = 4096
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_layer    = 32
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: f16        = 1
llama_model_load_internal: n_ff       = 11008
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size =  59.11 KB
llama_model_load_internal: mem required  = 14645.07 MB (+ 2052.00 MB per state)
llama_init_from_file: kv self size  =  256.00 MB
note: source model is f16
testing 291 layers with max size 131072000
q4_0                                              : rmse 0.00222150, maxerr 0.18429124, 95pct<0.0040, median<0.0018
q4_1                                              : rmse 0.00360044, maxerr 0.26373291, 95pct<0.0066, median<0.0028

main:    total time = 93546.68 ms

The RMSE is too high: it is worse than Q4_0, even though Q4_1 stores an extra offset per block and should be more accurate.

There is a bug in the following piece of code:

https://github.com/ggerganov/llama.cpp/blob/180b693a47b6b825288ef9f2c39d24b6eea4eea6/ggml.c#L922-L955

We should fix it.

@ggerganov added the bug (Something isn't working) and high priority (Very important issue) labels on Apr 10, 2023