Bug: Persistent hallucination even after re-running llama.cpp #8070
Comments
You're not using the correct chat template. Also related to #8068.
@ngxson How do you run the server? I run it like this: ../llama.cpp/llama-server -m ../../models/Meta-Llama-3-8B.Q5_K_S.gguf --n-gpu-layers 25 -c 0 and the output makes no sense; see the video below.
@dspasyuk You have to use an "Instruct" model to be able to chat.
You are correct, @ggerganov. I somehow mixed up an instruct model with a regular one, so this works for me: ../llama.cpp/llama-server -m ../../models/Meta-llama-3-8b-instruct_q5_k_s.gguf --n-gpu-layers 25 -c 0. @Edw590 this also works in llama-cli; not sure why, but it does: #8053
@ngxson, please forgive my ignorance, but I can't find any clear documentation on these chat template changes. Since PR #8068 was merged, does main now force the application of a chat template (the default one in the model's metadata) if a specific custom chat template isn't specified? Ever since I installed a new version of llama.cpp I've been getting terrible quality outputs compared to the old versions I run, and I can't figure out whether it's this change, the change to CUDA kernel processing, or something else entirely. I regularly use models fine-tuned to accept a variety of templates, so the default is often not the best, and I'm used to constructing the prompt manually, usually via a file. I checked the logs and couldn't see any evidence of a chat template being auto-applied, but I'm concerned I'm missing something.
@ngxson I had the same issues. I tried different settings, templates, server, CLI, etc.; the keywords used to apply templates in llama-cli do not seem to work as expected in my hands. I found the manuals at the end of the README page. llama-cli requires a BOS token for every single message you send to the model, which is not how it is described for Meta-Llama-3. It took some time to figure out, but it is finally working with the new version. Try this code; the config for llama-cli is in the config.js file, and the UI will print all logs once you run it: https://github.com/dspasyuk/llama.cui
@MB7979 Just to confirm, are you trying to use a custom chat template? If you want to switch to another built-in chat template, use the --chat-template argument.
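For example (this reuses the model path from earlier in this thread purely for illustration, and assumes llama3 as the name of one of the built-in templates):

../llama.cpp/llama-server -m ../../models/Meta-llama-3-8b-instruct_q5_k_s.gguf --n-gpu-layers 25 -c 0 --chat-template llama3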
FYI, I introduced a small patch in #8203 that will disable the chat template whenever --in-prefix or --in-suffix is specified.
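A rough sketch of the kind of invocation this concerns (the model path and the prefix/suffix strings here are placeholders, not part of the patch itself):

../llama.cpp/llama-cli -m ../../models/Meta-llama-3-8b-instruct_q5_k_s.gguf -cnv --in-prefix "[INST] " --in-suffix " [/INST]"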
@ngxson Thanks for your reply. No, I don't want to use a non-manual chat template at all, or in-suffix and in-prefix. I'm just trying to confirm whether this affects the use of main when you want no chat template at all. My default way of using llama.cpp is to feed a long prompt via a file input, with whatever chat template I prefer manually entered in it, the equivalent of -p "[INST] Write a story about a fluffy llama. [/INST] " entered directly on the command line; roughly as sketched below.
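For illustration, with a placeholder model path and a hypothetical prompt.txt that already contains the manually formatted template text:

../llama.cpp/llama-cli -m ../../models/Meta-llama-3-8b-instruct_q5_k_s.gguf -f prompt.txt

where prompt.txt holds something like: [INST] Write a story about a fluffy llama. [/INST]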
@MB7979 The chat template is only activated with -cnv (conversation mode).
This issue was closed because it has been inactive for 14 days since being marked as stale. |
What happened?
I used the command below:
The output was:
Another time the output was:
The first time I saw it start to hallucinate was with this output:
Then:
Or:
There are a few more before the first two I mentioned.
When I tried with another model (Meta-Llama-3-8B-Instruct-Q6_K.gguf), it worked normally again, and so did the original model afterwards. It doesn't happen anymore, but this isn't the first time. I don't know whether rebooting the system also fixes it; apparently switching models does, for some reason.
I don't know how to reproduce this, and I don't know where the problem comes from. I also hope I created the issue with the right severity; sorry if I didn't get it right.
Name and Version
version: 3203 (b5a5f34)
built with cc (Debian 12.2.0-14) 12.2.0 for aarch64-linux-gnu
What operating system are you seeing the problem on?
Linux
Relevant log output
No response