Gibberish from longer context #7056
Comments
Are you getting a big scary warning about degraded outputs in the console?
No, nothing like that. Actually this is really confusing, because I can't get Kobold Lite to generate coherent text even with "Hi" as the context when using Llama-3, but I can with Mistral (and with low context in ST). Perhaps the Llama-3 issue is because I need updated GGUFs, but that doesn't explain why Mistral or Solar is suddenly breaking at higher context sizes, context sizes they used to work fine with. I'll try a newer quant of Llama-3 to see if that helps. Even so, the gibberish from other models, like my Solar fine-tune, at contexts that used to work is baffling.
Okay, I can confirm that newer GGUFs work for Llama-3 for me. But oddly, my old Solar finetune GGUF that worked perfectly before does not. Has this new release somehow changed the tokenizer even for that? Do I need to re-do the GGUF files for my 11B Solar fine-tunes?
You didn't say how old the version you used was, and even then I probably wouldn't have a perfect overview of all the changes that happened. But I would suspect that things will work correctly if you regenerate the GGUF files from the original weights.
I mean, I wouldn't know. Somewhere between Mistral and now. I was talking to some other people on Discord and they seemed to be saying that all old GGUF files, regardless of model, are now broken, so that's what I'll try next: regenerating the models. Wild if true; just think of how much space is now wasted/broken on HF! Anyway, I'll try to get back in a day or two on this, but you can close it if you want.
Hi!
You can get gibberish pretty easily if you go over the default context limit (n_ctx = 512).
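For anyone hitting this through a binding rather than the CLI, here is a minimal sketch of raising the context window above the 512-token default. It uses the llama-cpp-python binding purely as an illustration; the model path is a placeholder, and KoboldCpp or the llama.cpp CLI expose the same setting through their own context-size options.

```python
# Minimal sketch (llama-cpp-python binding); the model path is a placeholder.
from llama_cpp import Llama

llm = Llama(
    model_path="models/my-model-Q4_K_M.gguf",  # placeholder path
    n_ctx=8192,  # raise the context window above the 512-token default
)

# Prompts longer than n_ctx are where gibberish typically starts,
# so size n_ctx to the longest prompt you intend to send.
out = llm("Hi, how are you?", max_tokens=64)
print(out["choices"][0]["text"])
```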
I can confirm that if you requant the GGUF from scratch, it fixes this issue with older models too (like my Solar finetune). So it does appear that the problem is not a bug per se, but rather that the newest version of llama.cpp makes all historical GGUFs broken.
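For reference, a rough sketch of what "requant from scratch" looks like, shelling out to the conversion and quantization tools that ship with llama.cpp. The script and binary names reflect the tree around this release and may differ in newer checkouts; all paths are placeholders.

```python
# Rough sketch: regenerate a GGUF from the original HF weights, then re-quantize.
# Script/binary names as in the llama.cpp tree around this release; paths are placeholders.
import subprocess

hf_dir = "path/to/original-hf-weights"   # placeholder
f16_gguf = "solar-11b-f16.gguf"          # placeholder
quant_gguf = "solar-11b-Q4_K_M.gguf"     # placeholder

# 1. Convert the original HF weights to an f16 GGUF with the current converter.
subprocess.run(
    ["python", "convert-hf-to-gguf.py", hf_dir, "--outfile", f16_gguf, "--outtype", "f16"],
    check=True,
)

# 2. Re-quantize with the current quantize binary.
subprocess.run(["./quantize", f16_gguf, quant_gguf, "Q4_K_M"], check=True)
```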
Yes, OK, I have updated the model. This model doesn't give any errors when run on CUDA, even with ctx 16512, but with the Vulkan backend it is better after upgrading the model and limiting the ctx, though it turns to gibberish again after asking more questions.
Someone on Reddit asked me, on a post about Mac speeds after flash attention, whether I was experiencing any odd gibberish issues, so I decided to test it out. Here are my results using KoboldCpp 1.64 on my M2 Ultra Mac Studio (my setup makes it challenging to run this same test directly against llama.cpp on the Mac):
Mixtral 8x7b Instruct v0.1, sending 15k context:
OpenHermes-2.5 Mistral 7b, sending the same 15k:
Nous Capybara 34b, sending the same 15k:
Midnight Miqu 70b v1.5, sending the same 15k:
NOTE: Again, these are from KoboldCpp, so it could be that the above issue is actually a Kobold issue. But I wanted to add info in case it might help. Additional note: none of the models in question gave me the reduced quality warning on load. I'm familiar with that error, as I got it on older quants of Llama 3 and Command-R, but these were not giving it. One more note: I meant to mention that I did try Mixtral 8x7b Instruct v0.1 with flash attention at a lower context (around 6k) and it was just fine.
This is related to #7049, I believe. I think it's a poor solution to say "requant old stuff." The correct solution would be for llama.cpp to exit with an error (which maybe you can override) if it detects that the GGUF tokenization will no longer match the HF AutoTokenizer tokenization.
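As a hedged sketch of what such a check could look like in user land (this is not an existing llama.cpp feature), one can tokenize the same probe string with the HF AutoTokenizer and with the GGUF's embedded tokenizer via llama-cpp-python and compare the token IDs. The repo id and GGUF path below are placeholders.

```python
# Sketch of a user-side sanity check, not an existing llama.cpp feature.
# The HF repo id and GGUF path are placeholders.
from transformers import AutoTokenizer
from llama_cpp import Llama

probe = "The quick brown fox jumps over the lazy dog. 123"

hf_tok = AutoTokenizer.from_pretrained("original/hf-model")   # placeholder repo id
hf_ids = hf_tok.encode(probe, add_special_tokens=False)

llm = Llama(model_path="model.gguf", vocab_only=True)          # placeholder path
gguf_ids = llm.tokenize(probe.encode("utf-8"), add_bos=False)

if hf_ids != gguf_ids:
    raise SystemExit(f"Tokenizer mismatch:\nHF:   {hf_ids}\nGGUF: {gguf_ids}")
print("Tokenizations match.")
```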
There are at least 4 different issues discussed here - just reopen with specific
Yes, the same was happening to me in LM Studio, which also uses llama.cpp. I was very skeptical about where the problem was, as I did not have it with transformers, so it was here! Now that you mention the long context: I even dropped this from my models. It was this problem of the YaRN embeddings / RoPE not working correctly. Hopefully LM Studio will come back working!
Exactly this (I deleted many models thinking I had trained them wrong), so now I test in Colab first before converting! It may even be a problem with conversion: Q4_K_M/S show it, while Q8 has no problems.
I just came back to update and rebuild to the latest git, and now all the tests I have done work with the Vulkan backend with any model, even if it says degraded output. Thanks to all.
I first encountered this problem after upgrading to the latest llama.cpp in SillyTavern. It would generate gibberish no matter what model or settings I used, including models that used to work (like Mistral-based models). It was confusing because the models generated normally in Kobold Lite, so I thought it was a SillyTavern problem.
Then I pasted a perfectly coherent long chat into the prompt in Kobold Lite (over 6k tokens), and it gave gibberish like "(rr(rr(rr(rr(rr(rr(rr(rr(rr(rr(rr(rr(rr(rr(rr(rr(rr(rr(rr" or meaningless strings of characters and words, just like the bug in SillyTavern. Likewise, if I deleted the character card, author's note, and story string from SillyTavern, it would generate coherently and normally.
So this appears to be a llama.cpp issue, something to do with a fuller or longer context.
I'm using Vulkan with fairly standard settings on Windows 11. It doesn't matter what kind of base model I use, or any other setting. Basically everything that worked before I updated no longer works (unless I trim the context down to nothing, but that makes it pretty useless). I use a max context size of 6144, if that matters, so it's never larger than that.