
Bug: Persistent hallucination even after re-running llama.cpp #8070


Closed
Edw590 opened this issue Jun 22, 2024 · 11 comments
Labels
bug-unconfirmed high severity Used to report high severity bugs in llama.cpp (Malfunctioning hinder important workflow) stale

Comments

@Edw590

Edw590 commented Jun 22, 2024

What happened?

I used the command below:

sudo ./llama-cli -m /home/edw590/llamacpp_models/Meta-Llama-3-8B-Instruct-Q4_K_M.gguf --in-suffix [3234_START] --color --interactive-first --ctx-size 0 --temp 0.2 --mlock --prompt "You are VISOR, my male personal virtual assistant. I'm Edward. I was born in 1999-11-22. It's currently the year of 2024. Address me as Sir or nothing at all. From now on, always end your answers with \"[3234_END]\"."

The output was:

[3234_START]entienda, Sir.entienda
entienda, Sir.entientienda
entienda, Sir.entienda
entienda, Sir.entienda
entienda, Sir.entienda
entienda, Sir.entienda
...

Another time the output was:

[3234_START] Cab, Sir.enti
enti
enti
enti
enti
enti
enti
...

The first time I saw it start to hallucinate was with this output:

[3234_START]Hello Sir! I'm your personal virtual assistant, VISOR. Direct your commands to me, and I will be your Caboose. I am your virtual Caboose. I is your Caboose. I am your Caboose. I am your Caboose. I am your Caboose. I am your Caboose. I am your Cab Sir. [3234_END]

Then:

[3234_START]Hello Sir! I'm your personal virtual assistant, VISOR. Direct your commands to me, and I
 will be your Cabot's horse. What would you like to do? [323 Pilgrim's End] [3234_END]

Or:

[3234_START]Hello Sir! I'm your personal virtual assistant, VISOR. Cab you Indicate your first command? [3234_END]

There are a few more of these before the first two I mentioned.

When I tried another model (Meta-Llama-3-8B-Instruct-Q6_K.gguf), it worked normally again, and so did the original model afterwards. It doesn't happen anymore, but this isn't the first time it has happened. I don't know whether rebooting the system also fixes it; apparently switching models does, for some reason.

I don't know how to reproduce this, and I don't know where the problem comes from. I also hope I created the issue with the right severity; sorry if I didn't get it right.

Name and Version

version: 3203 (b5a5f34)
built with cc (Debian 12.2.0-14) 12.2.0 for aarch64-linux-gnu

What operating system are you seeing the problem on?

Linux

Relevant log output

No response

@Edw590 Edw590 added bug-unconfirmed high severity Used to report high severity bugs in llama.cpp (Malfunctioning hinder important workflow) labels Jun 22, 2024
@ngxson
Collaborator

ngxson commented Jun 23, 2024

You're not using the correct chat template, and the --special argument is missing, so special tokens are not tokenized properly. Consider trying it with llama-server.

Related to #8068
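
For reference, one way to read this advice as a single llama-cli invocation might be the sketch below (assumptions: the built-in llama3 template name and the --special flag are available in this build; the custom [3234_START] suffix is dropped in favour of the model's own special tokens):

sudo ./llama-cli -m /home/edw590/llamacpp_models/Meta-Llama-3-8B-Instruct-Q4_K_M.gguf --conversation --chat-template llama3 --special --color --ctx-size 0 --temp 0.2 --mlock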

@dspasyuk
Contributor

@ngxson How do you run the server? I run it like this: ../llama.cpp/llama-server -m ../../models/Meta-Llama-3-8B.Q5_K_S.gguf --n-gpu-layers 25 -c 0, and the output makes no sense; see the video below:
Screencast from 2024-06-23 03:07:40 PM.webm

@ggerganov
Member

@dspasyuk You have to use an "Instruct" model to be able to chat

@dspasyuk
Contributor

You are correct, @ggerganov, I somehow mixed up an instruct model with a regular one, so this works for me: ../llama.cpp/llama-server -m ../../models/Meta-llama-3-8b-instruct_q5_k_s.gguf --n-gpu-layers 25 -c 0. @Edw590, this also works in llama-cli; I'm not sure why, but it does: #8053

@MB7979

MB7979 commented Jun 29, 2024

You're not using the correct chat template, and the --special argument is missing, so special tokens are not tokenized properly. Consider trying it with llama-server.

Related to #8068

@ngxson, please forgive my ignorance, but I can't find any clear documentation on these chat template changes. Since PR #8068 was committed, is main now forcing the application of a chat template (the default one in the model's metadata) if a specific custom chat template isn't specified? Ever since I installed a new version of llama.cpp I've been getting terrible quality outputs compared to the old versions I run, and I can't figure out if it's this change, the change to CUDA kernel processing, or something else entirely.

I regularly use models fine-tuned to accept a variety of templates, so the default is often not the best, and I'm used to manually constructing the prompt, usually via a file. I checked the logs and couldn't see any evidence of a chat template being auto-applied, but I'm concerned I'm missing something.

@dspasyuk
Contributor

dspasyuk commented Jun 29, 2024

@ngxson I had the same issues. I was trying different settings, templates, server, CLI, etc., and the keywords used to apply templates in llama-cli do not seem to work as expected in my hands. I found the manuals at the end of the README page. llama-cli requires a BOS token for every single message you send to the model, which is not how it is described for Meta-Llama-3. It took some time to figure out, but it is finally working with the new version. Try this code; the config for llama-cli is in the config.js file, and the UI will print all logs once you run it: https://github.com/dspasyuk/llama.cui

@ngxson
Collaborator

ngxson commented Jun 29, 2024

@MB7979 Just to confirm, are you trying to use a custom chat template?

If you want to switch to another built-in chat template, use the --chat-template option: https://github.com/ggerganov/llama.cpp/wiki/Templates-supported-by-llama_chat_apply_template

For example, --chat-template gemma
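
As a fuller sketch of how that fits into a command (the model path here is purely hypothetical):

./llama-cli -m ../../models/gemma-7b-it.Q5_K_S.gguf --conversation --chat-template gemma -c 0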

@ngxson
Collaborator

ngxson commented Jun 29, 2024

FYI, I introduced a small patch in #8203 that disables the chat template whenever --in-prefix or --in-suffix is set. That should fix the problem when, for example, a user wants to use [3234_START] as the start-of-turn marker (instead of <|start_of_turn|>).
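
As a rough sketch of what that should allow (flags taken from the original report, with the path shortened here), a run with a custom suffix should no longer have a chat template layered on top of it:

./llama-cli -m Meta-Llama-3-8B-Instruct-Q4_K_M.gguf --interactive-first --in-suffix [3234_START] --ctx-size 0 --temp 0.2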

@MB7979

MB7979 commented Jun 29, 2024

@ngxson Thanks for your reply. No, I don't want to use a non-manual chat template at all, or --in-suffix and --in-prefix. I'm just trying to confirm whether this affects the use of main when you want no chat template at all. My default way of using llama.cpp is to feed a long prompt via a file input, with whatever chat template I prefer manually entered in that file. The equivalent of -p "[INST] Write a story about a fluffy llama. [/INST] " entered directly on the command line.
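
For what it's worth, a minimal sketch of that file-based workflow (prompt.txt is a hypothetical file holding the manually templated prompt, e.g. the [INST] ... [/INST] text above, and the model path is a placeholder too):

./llama-cli -m ../../models/model.gguf -f prompt.txt -n 512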

@ngxson
Collaborator

ngxson commented Jun 29, 2024

@MB7979 chat template is only activated with --conversation. If you don't use --conversation, nothing changes.

@github-actions github-actions bot added the stale label Jul 30, 2024

This issue was closed because it has been inactive for 14 days since being marked as stale.
