Quantization produces invalid decoder.token_embedding.weight in resulting file #2906
I am suspecting that the super-block size is a likely cause. Look at this log line
See that the first dimension length is […]. I believe this is the motivation why llama.cpp introduced […]. I think the simplest work-around at the moment is to use other models than […].

```
$ quantize models/ggml-base.en.bin models/ggml-base.q3_k.bin 11
$ whisper-cli -m models/ggml-base.q3_k.bin -f jfk.wav
...
[00:00:00.000 --> 00:00:08.000] And so, my fellow Americans, ask not what your country can do for you.
[00:00:08.000 --> 00:00:11.000] Ask what you can do for your country.
...
```
_K quantization is not working with CUDA in b6f3fa4. It generates a different error, though. None of the _K types are supported, but the _0 and _1 types work. I dunno... if the K types aren't supposed to work with CUDA, maybe mention that in the docs.
I think you can check https://github.com/ggml-org/whisper.cpp/blob/master/ggml/src/ggml-cuda/getrows.cu#L160-L198 to see which quant types are supported. The fact is that not all of the GGML backends have reached feature parity.
The correct solution is to keep the embeddings tensors (the ones from which we get rows) in pinned host memory, similar to how we do it in …
After quantizing with q3_k, the resulting model is unusable. quantize runs without errors, but it appears to write some wrong metadata into the model(?):
```
whisper_model_load: tensor 'decoder.token_embedding.weight' has wrong size in model file: got 5705095, expected 2190735360
```
Tested on latest master e27fd6f
Relevant logs below.
edit: fixed formatting :3