Fix quantize_row_q4_1() with ARM_NEON #876

Closed
ggerganov opened this issue Apr 10, 2023 · 0 comments
Labels
bug (Something isn't working), high priority (Very important issue)


@ggerganov (Member)

quantize_row_q4_1() is currently bugged when built with ARM_NEON. See the results of quantize-stats on M1:

$  ./quantize-stats -m models/7B/ggml-model-f16.bin 
Loading model
llama.cpp: loading model from models/7B/ggml-model-f16.bin
llama_model_load_internal: format     = ggjt v1 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 256
llama_model_load_internal: n_embd     = 4096
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_layer    = 32
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: f16        = 1
llama_model_load_internal: n_ff       = 11008
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size =  59.11 KB
llama_model_load_internal: mem required  = 14645.07 MB (+ 2052.00 MB per state)
llama_init_from_file: kv self size  =  256.00 MB
note: source model is f16
testing 291 layers with max size 131072000
q4_0                                              : rmse 0.00222150, maxerr 0.18429124, 95pct<0.0040, median<0.0018
q4_1                                              : rmse 0.00360044, maxerr 0.26373291, 95pct<0.0066, median<0.0028

main:    total time = 93546.68 ms

The RMSE is too high: it is worse than Q4_0, even though Q4_1 stores an extra offset per block and should be more accurate.

There is a bug in the following piece of code:

https://github.com/ggerganov/llama.cpp/blob/180b693a47b6b825288ef9f2c39d24b6eea4eea6/ggml.c#L922-L955

We should fix it.

@ggerganov added the bug (Something isn't working) and high priority (Very important issue) labels on Apr 10, 2023