Description
> Please include information about your system, the steps to reproduce the bug, and the version of llama.cpp that you are using. If possible, please provide a minimal code example that reproduces the bug.
After commit 2002bc9, Mistral-7B-Instruct-v0.2 (https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2/commit/b70aa86578567ba3301b21c8a27bea4e8f6d6d61) produces longer outputs than it did before that commit.
That merge commit spans many individual commits, so I ran git bisect over the pre-merge commits (https://github.com/ggerganov/llama.cpp/commits/87a4a105b2fafb291610c1e28f97b8ba07c6f2d7); a sketch of the procedure is shown below. (Please do not remove the individual pre-merge commits from the history; they are what makes bisecting possible.)
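Roughly, the bisect looked like this (a sketch rather than the exact commands: `<last-known-good-commit>` is a placeholder, and the rebuild step assumes the Make-based build):

```sh
git bisect start
git bisect bad 87a4a105b2fafb291610c1e28f97b8ba07c6f2d7   # tip of the pre-merge commits
git bisect good <last-known-good-commit>                  # placeholder
# At each commit that git bisect checks out:
make clean && make server
# relaunch ./server (see the command further below), re-run the curl
# request, and mark the commit accordingly:
git bisect good   # or: git bisect bad
```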
After that, I found the commit bfb121f, which triggers the following behavior:
% curl --request POST --url http://localhost:8080/completion --header "Content-Type: application/json" --data '{"prompt": "Question: Is 1 + 1 = 2 correct? Answer yes or no only.\nAnswer:", "n_predict": 32, "seed": 0, "temperature": 0.0}'
{"content":" Yes\n\nQuestion: What is the smallest common multiple of 12 and 36?\nAnswer: 72\n\nQuestion:","generation_settings":{"dynatemp_exponent":1.0,"dynatemp_range":0.0,"frequency_penalty":0.0,"grammar":"","ignore_eos":false,"logit_bias":[],"min_keep":0,"min_p":0.05000000074505806,"mirostat":0,"mirostat_eta":0.10000000149011612,"mirostat_tau":5.0,"model":"./Mistral-7B-Instruct-v0.2/snapshots/b70aa86578567ba3301b21c8a27bea4e8f6d6d61/ggml-model-q8_0.gguf","n_ctx":8192,"n_keep":0,"n_predict":-1,"n_probs":0,"penalize_nl":true,"penalty_prompt_tokens":[],"presence_penalty":0.0,"repeat_last_n":64,"repeat_penalty":1.100000023841858,"samplers":["top_k","tfs_z","typical_p","top_p","min_p","temperature"],"seed":0,"stop":[],"stream":false,"temperature":0.0,"tfs_z":1.0,"top_k":40,"top_p":0.949999988079071,"typical_p":1.0,"use_penalty_prompt_tokens":false},"id_slot":0,"model":"./Mistral-7B-Instruct-v0.2/snapshots/b70aa86578567ba3301b21c8a27bea4e8f6d6d61/ggml-model-q8_0.gguf","prompt":"Question: Is 1 + 1 = 2 correct? Answer yes or no only.\nAnswer:","stop":true,"stopped_eos":false,"stopped_limit":true,"stopped_word":false,"stopping_word":"","timings":{"predicted_ms":2719.092,"predicted_n":32,"predicted_per_second":11.768634529467924,"predicted_per_token_ms":84.971625,"prompt_ms":404.269,"prompt_n":24,"prompt_per_second":59.36641196826865,"prompt_per_token_ms":16.844541666666668},"tokens_cached":55,"tokens_evaluated":24,"tokens_predicted":32,"truncated":false}
However, the immediately preceding commit aef02b1 does not trigger this mysterious behavior:
% curl --request POST --url http://localhost:8080/completion --header "Content-Type: application/json" --data '{"prompt": "Question: Is 1 + 1 = 2 correct? Answer yes or no only.\nAnswer:", "n_predict": 32, "seed": 0, "temperature": 0.0}'
{"content":" Yes","generation_settings":{"dynatemp_exponent":1.0,"dynatemp_range":0.0,"frequency_penalty":0.0,"grammar":"","ignore_eos":false,"logit_bias":[],"min_keep":0,"min_p":0.05000000074505806,"mirostat":0,"mirostat_eta":0.10000000149011612,"mirostat_tau":5.0,"model":"./Mistral-7B-Instruct-v0.2/snapshots/b70aa86578567ba3301b21c8a27bea4e8f6d6d61/ggml-model-q8_0.gguf","n_ctx":8192,"n_keep":0,"n_predict":-1,"n_probs":0,"penalize_nl":true,"penalty_prompt_tokens":[],"presence_penalty":0.0,"repeat_last_n":64,"repeat_penalty":1.100000023841858,"samplers":["top_k","tfs_z","typical_p","top_p","min_p","temperature"],"seed":0,"stop":[],"stream":false,"temperature":0.0,"tfs_z":1.0,"top_k":40,"top_p":0.949999988079071,"typical_p":1.0,"use_penalty_prompt_tokens":false},"id_slot":0,"model":"./Mistral-7B-Instruct-v0.2/snapshots/b70aa86578567ba3301b21c8a27bea4e8f6d6d61/ggml-model-q8_0.gguf","prompt":"Question: Is 1 + 1 = 2 correct? Answer yes or no only.\nAnswer:","stop":true,"stopped_eos":true,"stopped_limit":false,"stopped_word":false,"stopping_word":"","timings":{"predicted_ms":88.863,"predicted_n":2,"predicted_per_second":22.5065550341537,"predicted_per_token_ms":44.4315,"prompt_ms":403.348,"prompt_n":24,"prompt_per_second":59.50196852345864,"prompt_per_token_ms":16.806166666666666},"tokens_cached":25,"tokens_evaluated":24,"tokens_predicted":2,"truncated":false}
I launched the server in both cases as follows:
./server -m ./Mistral-7B-Instruct-v0.2/snapshots/b70aa86578567ba3301b21c8a27bea4e8f6d6d61/ggml-model-q8_0.gguf -c 8192
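The whole check can also be scripted end to end, roughly like this (a sketch; it assumes the model path above and that this build exposes the /health endpoint):

```sh
#!/bin/sh
# Sketch: start the server, wait until it answers, send the request, clean up.
./server -m ./Mistral-7B-Instruct-v0.2/snapshots/b70aa86578567ba3301b21c8a27bea4e8f6d6d61/ggml-model-q8_0.gguf -c 8192 &
server_pid=$!
until curl -sf http://localhost:8080/health >/dev/null; do sleep 1; done
curl -s --request POST --url http://localhost:8080/completion \
  --header "Content-Type: application/json" \
  --data '{"prompt": "Question: Is 1 + 1 = 2 correct? Answer yes or no only.\nAnswer:", "n_predict": 32, "seed": 0, "temperature": 0.0}'
kill "$server_pid"
```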
What does bfb121f change that explains this difference?
> If the bug concerns the server, please try to reproduce it first using the server test scenario framework.
Yes, this is related to the server. But before I reproduce it with that framework, could you tell me whether commit bfb121f is actually buggy, or whether this change in behavior is expected?