Description
> Please include information about your system, the steps to reproduce the bug, and the version of llama.cpp that you are using. If possible, please provide a minimal code example that reproduces the bug.
After commit 2002bc9, Mistral-7B-Instruct-v0.2 (https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2/commit/b70aa86578567ba3301b21c8a27bea4e8f6d6d61) produces longer outputs than it did before that commit.
That merge commit spans many individual commits, so I ran git bisect over the pre-merge commits (https://github.com/ggerganov/llama.cpp/commits/87a4a105b2fafb291610c1e28f97b8ba07c6f2d7); a sketch of the procedure is shown below. (Please do not remove the individual pre-merge commits from the history; they are what makes bisecting possible.)
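Roughly, the bisect looked like this (a sketch rather than the exact commands: `<last-known-good-commit>` is a placeholder, and the rebuild step assumes the Make-based build):

```sh
git bisect start
git bisect bad 87a4a105b2fafb291610c1e28f97b8ba07c6f2d7   # tip of the pre-merge commits
git bisect good <last-known-good-commit>                  # placeholder
# At each commit that git bisect checks out:
make clean && make server
# relaunch ./server (see the command further below), re-run the curl
# request, and mark the commit accordingly:
git bisect good   # or: git bisect bad
```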
After that, I found the commit bfb121f, which triggers the following behavior:
% curl --request POST --url http://localhost:8080/completion --header "Content-Type: application/json" --data '{"prompt": "Question: Is 1 + 1 = 2 correct? Answer yes or no only.\nAnswer:", "n_predict": 32, "seed": 0, "temperature": 0.0}'
{"content":" Yes\n\nQuestion: What is the smallest common multiple of 12 and 36?\nAnswer: 72\n\nQuestion:","generation_settings":{"dynatemp_exponent":1.0,"dynatemp_range":0.0,"frequency_penalty":0.0,"grammar":"","ignore_eos":false,"logit_bias":[],"min_keep":0,"min_p":0.05000000074505806,"mirostat":0,"mirostat_eta":0.10000000149011612,"mirostat_tau":5.0,"model":"./Mistral-7B-Instruct-v0.2/snapshots/b70aa86578567ba3301b21c8a27bea4e8f6d6d61/ggml-model-q8_0.gguf","n_ctx":8192,"n_keep":0,"n_predict":-1,"n_probs":0,"penalize_nl":true,"penalty_prompt_tokens":[],"presence_penalty":0.0,"repeat_last_n":64,"repeat_penalty":1.100000023841858,"samplers":["top_k","tfs_z","typical_p","top_p","min_p","temperature"],"seed":0,"stop":[],"stream":false,"temperature":0.0,"tfs_z":1.0,"top_k":40,"top_p":0.949999988079071,"typical_p":1.0,"use_penalty_prompt_tokens":false},"id_slot":0,"model":"./Mistral-7B-Instruct-v0.2/snapshots/b70aa86578567ba3301b21c8a27bea4e8f6d6d61/ggml-model-q8_0.gguf","prompt":"Question: Is 1 + 1 = 2 correct? Answer yes or no only.\nAnswer:","stop":true,"stopped_eos":false,"stopped_limit":true,"stopped_word":false,"stopping_word":"","timings":{"predicted_ms":2719.092,"predicted_n":32,"predicted_per_second":11.768634529467924,"predicted_per_token_ms":84.971625,"prompt_ms":404.269,"prompt_n":24,"prompt_per_second":59.36641196826865,"prompt_per_token_ms":16.844541666666668},"tokens_cached":55,"tokens_evaluated":24,"tokens_predicted":32,"truncated":false}
However, the immediately preceding commit aef02b1 does not trigger this mysterious behavior:
% curl --request POST --url http://localhost:8080/completion --header "Content-Type: application/json" --data '{"prompt": "Question: Is 1 + 1 = 2 correct? Answer yes or no only.\nAnswer:", "n_predict": 32, "seed": 0, "temperature": 0.0}'
{"content":" Yes","generation_settings":{"dynatemp_exponent":1.0,"dynatemp_range":0.0,"frequency_penalty":0.0,"grammar":"","ignore_eos":false,"logit_bias":[],"min_keep":0,"min_p":0.05000000074505806,"mirostat":0,"mirostat_eta":0.10000000149011612,"mirostat_tau":5.0,"model":"./Mistral-7B-Instruct-v0.2/snapshots/b70aa86578567ba3301b21c8a27bea4e8f6d6d61/ggml-model-q8_0.gguf","n_ctx":8192,"n_keep":0,"n_predict":-1,"n_probs":0,"penalize_nl":true,"penalty_prompt_tokens":[],"presence_penalty":0.0,"repeat_last_n":64,"repeat_penalty":1.100000023841858,"samplers":["top_k","tfs_z","typical_p","top_p","min_p","temperature"],"seed":0,"stop":[],"stream":false,"temperature":0.0,"tfs_z":1.0,"top_k":40,"top_p":0.949999988079071,"typical_p":1.0,"use_penalty_prompt_tokens":false},"id_slot":0,"model":"./Mistral-7B-Instruct-v0.2/snapshots/b70aa86578567ba3301b21c8a27bea4e8f6d6d61/ggml-model-q8_0.gguf","prompt":"Question: Is 1 + 1 = 2 correct? Answer yes or no only.\nAnswer:","stop":true,"stopped_eos":true,"stopped_limit":false,"stopped_word":false,"stopping_word":"","timings":{"predicted_ms":88.863,"predicted_n":2,"predicted_per_second":22.5065550341537,"predicted_per_token_ms":44.4315,"prompt_ms":403.348,"prompt_n":24,"prompt_per_second":59.50196852345864,"prompt_per_token_ms":16.806166666666666},"tokens_cached":25,"tokens_evaluated":24,"tokens_predicted":2,"truncated":false}
I launched the server in both cases as follows:
./server -m ./Mistral-7B-Instruct-v0.2/snapshots/b70aa86578567ba3301b21c8a27bea4e8f6d6d61/ggml-model-q8_0.gguf -c 8192
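The whole check can also be scripted end to end, roughly like this (a sketch; it assumes the model path above and that this build exposes the /health endpoint):

```sh
#!/bin/sh
# Sketch: start the server, wait until it answers, send the request, clean up.
./server -m ./Mistral-7B-Instruct-v0.2/snapshots/b70aa86578567ba3301b21c8a27bea4e8f6d6d61/ggml-model-q8_0.gguf -c 8192 &
server_pid=$!
until curl -sf http://localhost:8080/health >/dev/null; do sleep 1; done
curl -s --request POST --url http://localhost:8080/completion \
  --header "Content-Type: application/json" \
  --data '{"prompt": "Question: Is 1 + 1 = 2 correct? Answer yes or no only.\nAnswer:", "n_predict": 32, "seed": 0, "temperature": 0.0}'
kill "$server_pid"
```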
What does bfb121f change that explains this difference?
> If the bug concerns the server, please try to reproduce it first using the server test scenario framework.
Yes, this is related to the server. But before I reproduce it with that framework, could you tell me whether commit bfb121f is actually buggy, or whether this change in behavior is expected?