
Misc. bug: llama-server builds possibly erroneous prompt for gemma 3 #14151

Closed
@mfritz2008

Description


Name and Version

b5621, built with cc (Debian 14.2.0-19) 14.2.0 for x86_64-linux-gnu

Operating systems

Linux

Which llama.cpp modules do you know to be affected?

llama-server

Command line

./llama-server -m google_gemma-3-27b-it-Q8_0.gguf -b 256 -ub 256 -fa -n -1 --host 0.0.0.0 --port 8001 --verbose --verbose-prompt --log-timestamps --slot-save-path ./

Problem description & steps to reproduce

When using gemma-3-27b-it, the prompt that llama-server builds from a message array has two issues:

  1. The first assistant message appears before the system message;
  2. When there are multiple system messages, only the last one is included in the prompt.

How to test:

curl -X POST --location "http://localhost:8001/apply-template" \
    -H "Content-Type: application/json" \
    -d '{
          "messages": [
            {
              "role": "system",
              "content": "System message 1"
            },
            {
              "role": "system",
              "content": "System message 2"
            },
            {
              "role": "assistant",
              "content": "I am your assistant."
            },
            {
              "role": "user",
              "content": "Hello!"
            },
            {
              "role": "assistant",
              "content": "How can I help you?"
            },
            {
              "role": "user",
              "content": "Tell me a story."
            }
          ]
        }'
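The same request can be sent from Python using only the standard library. This is a minimal sketch: the endpoint path, port, and JSON body are taken from the curl command above, while the helper names `build_apply_template_request` and `apply_template` are hypothetical.

```python
import json
import urllib.request

# The message array from the curl command above.
MESSAGES = [
    {"role": "system", "content": "System message 1"},
    {"role": "system", "content": "System message 2"},
    {"role": "assistant", "content": "I am your assistant."},
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "How can I help you?"},
    {"role": "user", "content": "Tell me a story."},
]


def build_apply_template_request(base_url, messages):
    """Build (but do not send) a POST to llama-server's /apply-template."""
    body = json.dumps({"messages": messages}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/apply-template",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def apply_template(base_url, messages):
    """Send the request and return the rendered prompt from the "prompt" field."""
    req = build_apply_template_request(base_url, messages)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["prompt"]
```

With the server from the command line above running, `print(apply_template("http://localhost:8001", MESSAGES))` should print the same prompt string that curl returns.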

Actual result:

{"prompt":"<start_of_turn>model\nI am your assistant.<end_of_turn>\n<start_of_turn>user\nSystem message 2\n\nHello!<end_of_turn>\n<start_of_turn>model\nHow can I help you?<end_of_turn>\n<start_of_turn>user\nTell me a story.<end_of_turn>\n<start_of_turn>model\n"}

The prompt has the first assistant message at the front, and only "System message 2" is included.
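The actual result can be reproduced exactly by the simplified model below, which is inferred from the output above and is not llama.cpp's source. Gemma's turn format has only `user` and `model` roles, so the model assumes that system text is held back and prepended to the next `user` turn, with each new system message replacing the previous one. Under that assumption the assistant message is not moved forward; rather, the system text is deferred past it, and "System message 1" is overwritten. The function name `gemma_like_render` is hypothetical.

```python
# Hypothetical model of the observed Gemma 3 output; inferred from the
# "Actual result" above, not copied from llama.cpp.
MESSAGES = [
    {"role": "system", "content": "System message 1"},
    {"role": "system", "content": "System message 2"},
    {"role": "assistant", "content": "I am your assistant."},
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "How can I help you?"},
    {"role": "user", "content": "Tell me a story."},
]


def gemma_like_render(messages):
    out = []
    pending_system = ""  # system text waiting for the next user turn
    for m in messages:
        role, content = m["role"], m["content"]
        if role == "system":
            # Assumption: a later system message replaces an earlier one,
            # which would explain why only "System message 2" survives.
            pending_system = content
            continue
        turn = "model" if role == "assistant" else role  # Gemma calls the assistant "model"
        out.append(f"<start_of_turn>{turn}\n")
        if pending_system and turn != "model":
            # Assumption: system text is prepended to the first user turn,
            # so a leading assistant turn ends up before it.
            out.append(pending_system + "\n\n")
            pending_system = ""
        out.append(content + "<end_of_turn>\n")
    out.append("<start_of_turn>model\n")  # generation prompt for the next reply
    return "".join(out)
```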

Expected result:

  • The built prompt should keep the messages in the order given above; the assistant message should not be placed at the front.
  • All system messages should be included.
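Both expectations can be checked mechanically against any returned prompt. The sketch below is a heuristic: it assumes each message's content appears verbatim in the rendered prompt and is distinctive enough not to match anywhere else. `find_issues` is a hypothetical helper, not part of llama.cpp.

```python
# The message array from the request above.
MESSAGES = [
    {"role": "system", "content": "System message 1"},
    {"role": "system", "content": "System message 2"},
    {"role": "assistant", "content": "I am your assistant."},
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "How can I help you?"},
    {"role": "user", "content": "Tell me a story."},
]


def find_issues(prompt, messages):
    """Report messages that are missing from, or out of order in, a rendered prompt."""
    issues = []
    pos = 0  # search only after the previous message's content
    for i, m in enumerate(messages):
        idx = prompt.find(m["content"], pos)
        if idx == -1:
            if m["content"] in prompt:
                issues.append(f"message {i} ({m['role']}) is out of order")
            else:
                issues.append(f"message {i} ({m['role']}) is missing")
            continue
        pos = idx + len(m["content"])
    return issues
```

Applied to the Gemma 3 output above, it reports message 0 (the first system message) as missing and message 2 (the first assistant message) as out of order; applied to the Qwen2.5 output below, it reports nothing.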

By contrast, with other models such as Mistral Nemo (Mistral V3), the prompt is built as expected, with all parts present and in the correct order:

{"prompt":"[INST] System message 1\nSystem message 2\nI am your assistant.</s>[INST] Hello! [/INST]How can I help you?</s>[INST] Tell me a story. [/INST]"}

... and with Mistral Small 2501 (Mistral V7), the prompt is also built as expected:

{"prompt":"[SYSTEM_PROMPT]System message 1[/SYSTEM_PROMPT][SYSTEM_PROMPT]System message 2[/SYSTEM_PROMPT]I am your assistant.</s>[INST]Hello![/INST]How can I help you?</s>[INST]Tell me a story.[/INST]"}

... and with Qwen2.5 (ChatML), the prompt is likewise built as expected, with all parts present and in the correct order:

{"prompt":"<|im_start|>system\nSystem message 1<|im_end|>\n<|im_start|>system\nSystem message 2<|im_end|>\n<|im_start|>assistant\nI am your assistant.<|im_end|>\n<|im_start|>user\nHello!<|im_end|>\n<|im_start|>assistant\nHow can I help you?<|im_end|>\n<|im_start|>user\nTell me a story.<|im_end|>\n<|im_start|>assistant\n"}
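The Mistral V7 and ChatML outputs above follow the same pattern: each message becomes its own segment, in order, so nothing is merged or dropped. The sketch below reproduces those two outputs exactly from per-role wrappers read off the results quoted in this issue. The wrapper tables and the name `render_per_message` are assumptions that cover only these three roles and this request, not the full templates.

```python
# Per-message rendering: one wrapper per role, messages emitted in order.
# Assumption: the wrapper strings are read off the outputs quoted in this
# issue and cover only system/user/assistant, not every template rule.
MESSAGES = [
    {"role": "system", "content": "System message 1"},
    {"role": "system", "content": "System message 2"},
    {"role": "assistant", "content": "I am your assistant."},
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "How can I help you?"},
    {"role": "user", "content": "Tell me a story."},
]

CHATML = {
    "system": "<|im_start|>system\n{content}<|im_end|>\n",
    "user": "<|im_start|>user\n{content}<|im_end|>\n",
    "assistant": "<|im_start|>assistant\n{content}<|im_end|>\n",
    "generation_prompt": "<|im_start|>assistant\n",
}

MISTRAL_V7 = {
    "system": "[SYSTEM_PROMPT]{content}[/SYSTEM_PROMPT]",
    "user": "[INST]{content}[/INST]",
    "assistant": "{content}</s>",
    "generation_prompt": "",
}


def render_per_message(messages, template):
    # Each message maps to exactly one segment, so order and count are preserved.
    parts = [template[m["role"]].format(content=m["content"]) for m in messages]
    return "".join(parts) + template["generation_prompt"]
```

A format with a native system role can render every message in place; Gemma's format, lacking one, has to fold system text into another turn, which is where the ordering and dropping described above come from.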
