Old GGUF files have broken tokenization and there is no warning #7476
Comments
Don't you get this warning when using an old, broken model?
@slaren I don't, no. Here is the output of perplexity.cpp from my colab. Perhaps the issue is as simple as adding the warning to perplexity.cpp too, if it's present elsewhere.
The warning is only generated for models with a BPE tokenizer, but that model has an SPM tokenizer. I am not sure whether that's because the model was converted with the wrong tokenizer type or something else, but I don't see an easy way to detect that.
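For what it's worth, the tokenizer family a GGUF file declares can be read from its metadata; a minimal sketch, assuming the `gguf` Python package's reader API (the path is a placeholder):

```python
from gguf import GGUFReader  # pip install gguf

# Placeholder path; substitute the GGUF file you want to inspect.
reader = GGUFReader("tinyllama-1.1b-chat-v1.0.Q8_0.gguf")

# "tokenizer.ggml.model" holds the tokenizer family, e.g. "llama" (SPM) or "gpt2" (BPE).
field = reader.fields["tokenizer.ggml.model"]
print(bytes(field.parts[field.data[0]]).decode("utf-8"))
```

That only tells you which tokenizer is declared, though, not whether an old conversion of it is broken.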
@turian Try to find the exact commit that changed the results for your model.
It appears recent changes broke the tokenizer. Token ids are now **out of the vocabulary range**, causing a crash here in `Candidates::from_vec`:

```rust
*id_seen.get_mut(data[0].id as usize).unwrap() = true;
```

It may also be possible that `llama_get_logits_ith` changed somehow and is returning garbage. Either is **terrible**, very bad, no good news: if we didn't check for this we would be indexing into garbage data. Until this is fixed we're reverting to a previous version, which also means we don't have to rebuild or convert our models yet another time :/

Related: ggml-org/llama.cpp#7476
The candidate crash was not in fact caused by bad data from the kernel or ids out of range; rather, it was my fault, so it has nothing to do with the issue here: ggml-org/llama.cpp#7476. The `Candidates` container can shrink, but it's only used in locally typical sampling, and I forgot it did that, which is why ids are out of range in that case. I could use a HashMap or an associative array to solve it, but it's easier to just remove the duplicate check from `Candidates::from_vec`. I missed it because I checked all uses of `from_vec`, not the actual call stack: the locally typical path calls `from_iter`, which calls `from_vec` with a truncated `Candidates`, causing the crash.

```rust
indices
    .iter()
    .take(new_len)
    .map(|&index| new.data[index])
    .collect()
```

The model will now have to be updated, but whatever.
I think I found the cause for this. Prior to that release, tokenizing with this model gave different results: text after special tokens now begins with an added space.
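One way to observe it is to round-trip text that follows a special token; a rough sketch, assuming llama-cpp-python's `tokenize`/`detokenize` and a placeholder GGUF path:

```python
from llama_cpp import Llama

# Placeholder path; any Llama-family GGUF with an SPM tokenizer.
llm = Llama(model_path="tinyllama-1.1b-chat-v1.0.Q8_0.gguf", vocab_only=True)

# special=True lets "</s>" be parsed as a special token instead of literal text.
ids = llm.tokenize(b"</s>Hello", add_bos=False, special=True)
print(ids)
# An affected conversion may round-trip to b"</s> Hello" (note the inserted space).
print(llm.detokenize(ids))
```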
@ggerganov I've just tested it again with the latest release (
@ggerganov the issue appears to be longstanding and perhaps implicit in perplexity.cpp itself. 3358c38 from September of last year still has bad perplexity values. ee77efe from August of last year simply cannot load the model.
Not sure what could be the problem. We do our best to keep backwards compatibility, or at least print warnings when there are breaking changes, but it's possible we overlook some cases. Therefore the only recommended way to use
This issue was closed because it has been inactive for 14 days since being marked as stale.
As reported in #6944 (comment)
The llama.cpp tokenizers give different results than HF for old GGUF files.
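The mismatch can be seen directly by tokenizing the same text with both stacks and diffing the ids. A minimal sketch, assuming llama-cpp-python and transformers, with the TinyLlama GGUF file and HF repo below as placeholders:

```python
from llama_cpp import Llama
from transformers import AutoTokenizer

# Placeholder model file/repo; substitute the GGUF and HF checkpoint you are comparing.
llm = Llama(model_path="tinyllama-1.1b-chat-v1.0.Q8_0.gguf", vocab_only=True)
hf_tok = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")

text = "Robert Boulter is an English film, television and theatre actor."

gguf_ids = llm.tokenize(text.encode("utf-8"), add_bos=True)  # llama.cpp tokenizer
hf_ids = hf_tok.encode(text)                                  # HF tokenizer

print("llama.cpp:", gguf_ids)
print("HF       :", hf_ids)
print("match:", gguf_ids == hf_ids)
```

For an unaffected conversion the two lists should be identical; on a model hit by this issue they diverge.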
This is a subtle footgun and at least there should be a warning, since it is now impossible to determine at what vintage your old GGUF models suddenly spoil.
Right now, the only reliable way to determine this is by running perplexity.cpp and comparing it to HF. The key numbers for the first 512 tokens of the wikitext-2 test set are as follows.
This is demonstrated through an attached notebook, which you can play with at this colab. I'll paste the code below too.

Using llama.cpp tokenizer:

| model | quant | perplexity |
| --- | --- | --- |
| llama.cpp | Q8_0 | 15.4660 |
| Llama-CPP-python | Q8_0 | 15.392684936523438 |
| llama.cpp | Q5_K_M | 15.6994 |
| Llama-CPP-python | Q5_K_M | 15.637877464294434 |

Using HF tokenizer and passing those tokens into different model implementations:

| model | quant | perplexity |
| --- | --- | --- |
| Huggingface | | 6.205880641937256 |
| Llama-CPP-python | Q8_0 | 6.204566478729248 |
| Llama-CPP-python | Q5_K_M | 6.228440761566162 |
(The GGUF files used for the llama-cpp-python runs were tinyllama-1.1b-chat-v1.0.Q8_0.gguf and tinyllama-1.1b-chat-v1.0.Q5_K_M.gguf.)
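For reference, the Huggingface number above can be reproduced with something along these lines; a rough sketch assuming the TinyLlama chat repo and a local wikitext-2 test file, not the exact notebook code:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id and data path; adjust to the model and text you are checking.
model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
model.eval()

text = open("wiki.test.raw").read()
ids = tok(text, return_tensors="pt").input_ids[:, :512]  # first 512 tokens

with torch.no_grad():
    # With labels=ids the model returns the mean cross-entropy over the window.
    loss = model(ids, labels=ids).loss

print("perplexity:", torch.exp(loss).item())
```

The Llama-CPP-python rows in the second table come from feeding the same HF token ids into the quantized GGUF models.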