prompt format? #7
-
I have set up llama.cpp and it seems to work, but all the answers are hallucinations. I suspect the prompt format is wrong; how does one set it?
Replies: 7 comments
-
Also, piper TTS is not replying. EDIT: the wsiAI script was pointing at an invalid piper repo; I had installed piper via apt, so it passed the script's validation check but wasn't actually working. Fixed by pointing at another repo.
-
For example, I'm using GLM4-9B, and for "how much is 2+2" the output just repeats the prompt.
-
Yeah, that is the model and its settings/prompt. I would recommend checking the level of support for that model in the llama.cpp repo. For example, play with the repeat penalty, temperature, and other parameters and see if you can suppress that. It has nothing to do with the simple orchestrator script.

I am assuming you are asking it in speech: "Assistant, what is 2+2". Here is the output that I get from gemma 2B, for example (autopasted directly with BlahST): 4. So for me, the prompt works. As you have probably seen from the source code, I take care to insist in the prompt that the answer is short and to the point. The repetition that you see from GLM4-9B I have also seen with Llama 3.2, Qwen 2.5 and others. Somehow gemma 2 (2B and 9B, and 27B if you have the hardware) has proven to be the most robust model for me, as various leaderboards confirm for that model size.

As for piper, check the link to their repo in the BlahST readme and try to run it standalone with the examples they give. If it works, we will troubleshoot it in the context of BlahST. You should pay attention to the environment variables in the config block of wsiAI: they may differ for your case, but make sure that the sample rate variables match the chosen model's sample rate. For example, for English TTS I use US-lessac low quality (good enough), which has a 16k sample rate.

Let me know how it goes. Cheers, QB
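A minimal way to sanity-check piper outside of BlahST could look like the sketch below. The voice file path is an assumption, and the 16000 Hz rate assumes the en_US-lessac-low voice; whatever sample-rate variable the wsiAI config block uses must agree with the voice you pick.

```bash
#!/usr/bin/env bash
# Standalone piper smoke test (independent of BlahST).
# Voice path and sample rate are assumptions for en_US-lessac-low;
# change both together if you use another voice.
PIPER_VOICE="$HOME/piper/voices/en_US-lessac-low.onnx"
TTS_RATE=16000   # must match the chosen voice's sample rate

echo "This is a short piper test." \
  | piper --model "$PIPER_VOICE" --output-raw \
  | aplay -r "$TTS_RATE" -f S16_LE -c 1 -t raw -
```

If that plays audio, piper itself is fine and the problem is in how the script finds or calls it.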
-
That's the same output I get running llama.cpp directly in interactive mode, and they list the model as supported, so what could cause it not to load the correct prompt format automatically when not in interactive mode?
Oh, and there was a typo in the script pointing at https://github.com/rhasspi/piper instead of rhasspy/piper (as a fellow dyslexic, I understand xD).
-
I managed to fix the prompt by changing the line calling llama.cpp from: [...] to: [...] It transcribes now, but this [end of text] token at the end of the answer is annoying.
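For anyone hitting the same thing, a hedged sketch of that kind of change is below. The flags are standard llama.cpp CLI options, but the binary name, model path and system-prompt wording here are my assumptions, not the actual line in wsiAI.

```bash
# Hypothetical llama.cpp call with an explicit system prompt and tamer sampling;
# not the literal line from the BlahST script.
MODEL="$HOME/models/glm-4-9b-chat-Q4_K_M.gguf"   # path and quantization are assumptions

llama-cli -m "$MODEL" \
  --temp 0.2 \
  --repeat-penalty 1.15 \
  -n 128 \
  -p $'You are a helpful assistant. Answer briefly and to the point.\nUser: how much is 2+2\nAssistant:'
```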
-
Good, your model may need an explicit system prompt like that. You can strip the trailing token from the answer string with str="${str/\[end of text\]}". Actually, let me patch it in the master branch so that you can see where to put it. QB
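A standalone illustration of that substitution (the variable name is just taken from the line above, not the actual wsiAI code):

```bash
#!/usr/bin/env bash
# Strip the trailing "[end of text]" marker from an answer string.
str="2 + 2 = 4 [end of text]"

# Escape the brackets so they are matched literally instead of being
# treated as a glob character class.
str="${str/\[end of text\]}"

echo "$str"
```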
-
Patched it yesterday (wsiAI). You should not have the [end of text] token in the pasted answer anymore. Since this conversation is more a matter of prompting and model choice than an issue with the code, I think it is a good candidate for a first discussion. I am converting it to a discussion.