Closed
Description
Name and Version
b5621, built with cc (Debian 14.2.0-19) 14.2.0 for x86_64-linux-gnu
Operating systems
Linux
Which llama.cpp modules do you know to be affected?
llama-server
Command line
./llama-server -m google_gemma-3-27b-it-Q8_0.gguf -b 256 -ub 256 -fa -n -1 --host 0.0.0.0 --port 8001 --verbose --verbose-prompt --log-timestamps --slot-save-path ./
Problem description & steps to reproduce
When using gemma-3-27b-it, the prompt built by llama.cpp from a message array have two issues:
- Assistant message appears before system message;
- When there are multiple system messages, only the last one is included in the prompt.
How to test:
curl -X POST --location "http://localhost:8001/apply-template" \
-H "Content-Type: application/json" \
-d '{
"messages": [
{
"role": "system",
"content": "System message 1"
},
{
"role": "system",
"content": "System message 2"
},
{
"role": "assistant",
"content": "I am your assistant."
},
{
"role": "user",
"content": "Hello!"
},
{
"role": "assistant",
"content": "How can I help you?"
},
{
"role": "user",
"content": "Tell me a story."
}
]
}'
Actual result:
{"prompt":"<start_of_turn>model\nI am your assistant.<end_of_turn>\n<start_of_turn>user\nSystem message 2\n\nHello!<end_of_turn>\n<start_of_turn>model\nHow can I help you?<end_of_turn>\n<start_of_turn>user\nTell me a story.<end_of_turn>\n<start_of_turn>model\n"}
The prompt have the first assistant message at the front, and only System message 2 is included.
Expected result:
- Built prompt should have the messages appear in the above order, assistant message should not appear in the front.
- All system messages should be included.
However, when using other models, such as Mistral Nemo (Mistral V3), the prompt is built as expected, with all the parts, and in correct order:
{"prompt":"[INST] System message 1\nSystem message 2\nI am your assistant.</s>[INST] Hello! [/INST]How can I help you?</s>[INST] Tell me a story. [/INST]"}
... and Mistral Small 2501 (Mistral V7), the prompt is built as expected:
{"prompt":"[SYSTEM_PROMPT]System message 1[/SYSTEM_PROMPT][SYSTEM_PROMPT]System message 2[/SYSTEM_PROMPT]I am your assistant.</s>[INST]Hello![/INST]How can I help you?</s>[INST]Tell me a story.[/INST]"}
... and also Qwen2.5 (ChatML), the prompt is built as expected, with all the parts, and in correct order:
{"prompt":"<|im_start|>system\nSystem message 1<|im_end|>\n<|im_start|>system\nSystem message 2<|im_end|>\n<|im_start|>assistant\nI am your assistant.<|im_end|>\n<|im_start|>user\nHello!<|im_end|>\n<|im_start|>assistant\nHow can I help you?<|im_end|>\n<|im_start|>user\nTell me a story.<|im_end|>\n<|im_start|>assistant\n"}