Replies: 2 comments 3 replies
-
Are you using curl with the llama.cpp server? What configuration parameters are you sending via curl?
-
Hi!
while the server shows:
As you can see, the default temperature and the default min_p are different. In your case the temperature was defined in your request, but min_p was not set.
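If you want the server to sample the same way llama-cli does, the safest approach is to set every sampling parameter explicitly in each request instead of relying on the server defaults. A minimal sketch (the host, port, prompt, and values below are assumptions; adjust them to your setup), using llama-server's native /completion endpoint:

```python
import requests

# Hypothetical local llama-server instance; adjust host/port to your setup.
SERVER_URL = "http://localhost:8080/completion"

payload = {
    "prompt": "Translate this text from Dutch to English:\n\nGoedemorgen, hoe gaat het?",
    # Set the sampling parameters explicitly instead of relying on server defaults.
    "temperature": 0.8,  # example value; match whatever llama-cli reports at startup
    "min_p": 0.05,       # example value; this was the parameter missing from the request
    "n_predict": 256,
}

response = requests.post(SERVER_URL, json=payload, timeout=120)
response.raise_for_status()
print(response.json()["content"])
```

With the parameters pinned in the request body, the server and the CLI should be sampling under the same settings, which makes the two easier to compare.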
-
What exactly do I need to do to make llama-server behave the same way as llama-cli or any other implementation?
Let me explain. Every model I run with llama.cpp works as expected from within apps like ollama or LM Studio, or even llama-cli. Yet as soon as I try to run the same model through llama-server, I hit the same issue, and it has been going on for months.
My idea is to use a model as a translator. I've tried lots of them; currently I'm working with Qwen 2.5 Q4.
If I literally ask the model to "Translate this text from Dutch to English" in -cnv (chat) mode, the output is always English. But when I attempt the same thing in production mode (in my case llama-server), the model sometimes writes the same text back in Dutch, completely ignoring my instructions. The bug may or may not appear, but once it does, it repeats on every subsequent run of the model, i.e. with every API call.
I've spent months on this issue. There is no flexible Python example, so I'm using the one presented in the official documentation (calling the server through the openai client)....
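Roughly, my calls look like this (a sketch only; the port, model name, prompt, and temperature below are placeholders, not my exact values):

```python
from openai import OpenAI

# llama-server exposes an OpenAI-compatible API; the key is a dummy value.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="sk-no-key-required")

response = client.chat.completions.create(
    model="qwen2.5",  # placeholder; the server serves whatever model it was started with
    messages=[
        {"role": "system", "content": "You are a translator. Always answer in English."},
        {"role": "user", "content": "Translate this text from Dutch to English:\n\nGoedemorgen, hoe gaat het?"},
    ],
    temperature=0.3,  # placeholder value
)

print(response.choices[0].message.content)
```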
I'm totally disappointed and I don't know what to do.
All I need is for the model to behave exactly the same way it behaves in conversation mode, and that's it.