prompt format? #7
-
I have set up llama.cpp and it seems to work, but all the answers are hallucinations. I suspect the prompt format is wrong; how does one set it?
Replies: 7 comments
-
Also, piper TTS is not replying. EDIT: the wsiAI script was pointing at an invalid piper repo; I had installed piper via apt, so it passed the script's validation check but wasn't actually working. Fixed by pointing at another repo.
-
For example, I'm using GLM4-9B, and for "how much is 2+2" the output just repeats the prompt.
-
Yeah, that is the model and its settings/prompt. I would recommend checking the level of support for that model in the llama.cpp repo. For example, play with the repeat penalty, temperature, and other parameters and see if you can suppress that. It has nothing to do with the simple orchestrator script.

I am assuming you are asking it in speech: "Assistant, what is 2+2". Here is the output that I get from gemma 2B, for example (autopasted directly with BlahST): 4. So for me, the prompt works. As you have probably seen from the source code, I take care to insist in the prompt that the answer is short and to the point. The repetition that you see from GLM4-9B I have also seen with Llama 3.2, Qwen 2.5 and others. Somehow gemma 2 (2B and 9B, and 27B if you have the hardware) has proven to be the most robust model for me, as various leaderboards confirm for that model size.

As for piper, check the link to their repo in the BlahST readme and try to run it standalone with the examples they give. If it works, we will troubleshoot it in the context of BlahST. You should pay attention to the environment variables in the config block of wsiAI: they may differ for your case, but make sure that the sample rate variables match the chosen model's sample rate. For example, for English TTS I use US-lessac low quality (good enough), which has a 16k sample rate.

Let me know how it goes. Cheers, QB
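A minimal way to sanity-check piper outside of BlahST could look like the sketch below. The voice file path is an assumption, and the 16000 Hz rate assumes the en_US-lessac-low voice; whatever sample-rate variable the wsiAI config block uses must agree with the voice you pick.

```bash
#!/usr/bin/env bash
# Standalone piper smoke test (independent of BlahST).
# Voice path and sample rate are assumptions for en_US-lessac-low;
# change both together if you use another voice.
PIPER_VOICE="$HOME/piper/voices/en_US-lessac-low.onnx"
TTS_RATE=16000   # must match the chosen voice's sample rate

echo "This is a short piper test." \
  | piper --model "$PIPER_VOICE" --output-raw \
  | aplay -r "$TTS_RATE" -f S16_LE -c 1 -t raw -
```

If that plays audio, piper itself is fine and the problem is in how the script finds or calls it.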
-
That's the same output I get running llama.cpp directly in interactive mode, and they list the model as supported, so what could cause it not to load the correct prompt format automatically when not in interactive mode?
Oh, and there was a typo in the script pointing at https://github.com/rhasspi/piper instead of rhasspy/piper (as a fellow dyslexic, I understand xD).
-
I managed to fix the prompt by changing the line calling llama.cpp from: [...] to: [...] It transcribes now, but this [end of text] token at the end of the answer is annoying.
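For anyone hitting the same thing, a hedged sketch of that kind of change is below. The flags are standard llama.cpp CLI options, but the binary name, model path and system-prompt wording here are my assumptions, not the actual line in wsiAI.

```bash
# Hypothetical llama.cpp call with an explicit system prompt and tamer sampling;
# not the literal line from the BlahST script.
MODEL="$HOME/models/glm-4-9b-chat-Q4_K_M.gguf"   # path and quantization are assumptions

llama-cli -m "$MODEL" \
  --temp 0.2 \
  --repeat-penalty 1.15 \
  -n 128 \
  -p $'You are a helpful assistant. Answer briefly and to the point.\nUser: how much is 2+2\nAssistant:'
```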
-
Good, your model may need an explicit system prompt like that. You can strip the trailing token from the answer string with str="${str/\[end of text\]}". Actually, let me patch it in the master branch so that you can see where to put it. QB
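A standalone illustration of that substitution (the variable name is just taken from the line above, not the actual wsiAI code):

```bash
#!/usr/bin/env bash
# Strip the trailing "[end of text]" marker from an answer string.
str="2 + 2 = 4 [end of text]"

# Escape the brackets so they are matched literally instead of being
# treated as a glob character class.
str="${str/\[end of text\]}"

echo "$str"
```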
-
Patched it yesterday (wsiAI). You should not have the [end of text] token in the pasted answer anymore. Since this conversation is more a matter of prompting and model choice than an issue with the code, I think it is a good candidate for a first discussion. I am converting it to a discussion.