
Outputs tend to be longer after the commit "server : refactor #5882" #5934

Closed
@hiro4bbh

Description


Please include information about your system, the steps to reproduce the bug, and the version of llama.cpp that you are using. If possible, please provide a minimal code example that reproduces the bug.

After commit 2002bc9, Mistral-7B-Instruct-v0.2 (https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2/commit/b70aa86578567ba3301b21c8a27bea4e8f6d6d61) produces longer outputs than it did before that commit.
The merge commit itself is large and contains many individual commits.
I ran git bisect over the pre-merge commits (https://github.com/ggerganov/llama.cpp/commits/87a4a105b2fafb291610c1e28f97b8ba07c6f2d7).
(Please keep the individual pre-merge commits around, so that bisecting remains possible ...)
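The bisect mechanics can be sketched on a throwaway repository (hypothetical commits; in the real case, the test at each step was rebuilding and re-querying ./server):

```shell
# Toy git bisect run: commits 4 and 5 are "bad" here, and bisect should
# report commit 4 as the first bad one.
set -e
tmp=$(mktemp -d) && cd "$tmp" && git init -q
git config user.email you@example.com
git config user.name you
for i in 1 2 3 4 5; do
  echo "$i" > out.txt                    # the "bug": out.txt >= 4
  git add out.txt && git commit -qm "commit $i"
done
git bisect start HEAD HEAD~4             # HEAD is bad, 4 commits back is good
first_bad=$(git bisect run sh -c '[ "$(cat out.txt)" -lt 4 ]' \
  | grep 'is the first bad commit' | cut -d' ' -f1)
git bisect reset
echo "first bad: $first_bad"
```

In the real repository, the `git bisect run` test script would build the server, send the fixed-seed curl request above, and exit nonzero when the long output appears.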
Bisecting identified commit bfb121f as the one that triggers the following behavior:

% curl --request POST --url http://localhost:8080/completion --header "Content-Type: application/json" --data '{"prompt": "Question: Is 1 + 1 = 2 correct? Answer yes or no only.\nAnswer:", "n_predict": 32, "seed": 0, "temperature": 0.0}'
{"content":" Yes\n\nQuestion: What is the smallest common multiple of 12 and 36?\nAnswer: 72\n\nQuestion:","generation_settings":{"dynatemp_exponent":1.0,"dynatemp_range":0.0,"frequency_penalty":0.0,"grammar":"","ignore_eos":false,"logit_bias":[],"min_keep":0,"min_p":0.05000000074505806,"mirostat":0,"mirostat_eta":0.10000000149011612,"mirostat_tau":5.0,"model":"./Mistral-7B-Instruct-v0.2/snapshots/b70aa86578567ba3301b21c8a27bea4e8f6d6d61/ggml-model-q8_0.gguf","n_ctx":8192,"n_keep":0,"n_predict":-1,"n_probs":0,"penalize_nl":true,"penalty_prompt_tokens":[],"presence_penalty":0.0,"repeat_last_n":64,"repeat_penalty":1.100000023841858,"samplers":["top_k","tfs_z","typical_p","top_p","min_p","temperature"],"seed":0,"stop":[],"stream":false,"temperature":0.0,"tfs_z":1.0,"top_k":40,"top_p":0.949999988079071,"typical_p":1.0,"use_penalty_prompt_tokens":false},"id_slot":0,"model":"./Mistral-7B-Instruct-v0.2/snapshots/b70aa86578567ba3301b21c8a27bea4e8f6d6d61/ggml-model-q8_0.gguf","prompt":"Question: Is 1 + 1 = 2 correct? Answer yes or no only.\nAnswer:","stop":true,"stopped_eos":false,"stopped_limit":true,"stopped_word":false,"stopping_word":"","timings":{"predicted_ms":2719.092,"predicted_n":32,"predicted_per_second":11.768634529467924,"predicted_per_token_ms":84.971625,"prompt_ms":404.269,"prompt_n":24,"prompt_per_second":59.36641196826865,"prompt_per_token_ms":16.844541666666668},"tokens_cached":55,"tokens_evaluated":24,"tokens_predicted":32,"truncated":false}

However, its parent commit aef02b1 did not trigger this behavior:

% curl --request POST --url http://localhost:8080/completion --header "Content-Type: application/json" --data '{"prompt": "Question: Is 1 + 1 = 2 correct? Answer yes or no only.\nAnswer:", "n_predict": 32, "seed": 0, "temperature": 0.0}'
{"content":" Yes","generation_settings":{"dynatemp_exponent":1.0,"dynatemp_range":0.0,"frequency_penalty":0.0,"grammar":"","ignore_eos":false,"logit_bias":[],"min_keep":0,"min_p":0.05000000074505806,"mirostat":0,"mirostat_eta":0.10000000149011612,"mirostat_tau":5.0,"model":"./Mistral-7B-Instruct-v0.2/snapshots/b70aa86578567ba3301b21c8a27bea4e8f6d6d61/ggml-model-q8_0.gguf","n_ctx":8192,"n_keep":0,"n_predict":-1,"n_probs":0,"penalize_nl":true,"penalty_prompt_tokens":[],"presence_penalty":0.0,"repeat_last_n":64,"repeat_penalty":1.100000023841858,"samplers":["top_k","tfs_z","typical_p","top_p","min_p","temperature"],"seed":0,"stop":[],"stream":false,"temperature":0.0,"tfs_z":1.0,"top_k":40,"top_p":0.949999988079071,"typical_p":1.0,"use_penalty_prompt_tokens":false},"id_slot":0,"model":"./Mistral-7B-Instruct-v0.2/snapshots/b70aa86578567ba3301b21c8a27bea4e8f6d6d61/ggml-model-q8_0.gguf","prompt":"Question: Is 1 + 1 = 2 correct? Answer yes or no only.\nAnswer:","stop":true,"stopped_eos":true,"stopped_limit":false,"stopped_word":false,"stopping_word":"","timings":{"predicted_ms":88.863,"predicted_n":2,"predicted_per_second":22.5065550341537,"predicted_per_token_ms":44.4315,"prompt_ms":403.348,"prompt_n":24,"prompt_per_second":59.50196852345864,"prompt_per_token_ms":16.806166666666666},"tokens_cached":25,"tokens_evaluated":24,"tokens_predicted":2,"truncated":false}
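To make the difference easier to spot, the stop-related fields can be pulled out with jq; a minimal sketch, with the inline JSON standing in (truncated) for the two /completion responses above:

```shell
# Compare the stop-related fields of the two responses (requires jq).
after_commit='{"content":" Yes\n\nQuestion: ...","stopped_eos":false,"stopped_limit":true,"tokens_predicted":32}'
before_commit='{"content":" Yes","stopped_eos":true,"stopped_limit":false,"tokens_predicted":2}'
printf '%s' "$after_commit"  | jq -c '{stopped_eos, stopped_limit, tokens_predicted}'
# {"stopped_eos":false,"stopped_limit":true,"tokens_predicted":32}
printf '%s' "$before_commit" | jq -c '{stopped_eos, stopped_limit, tokens_predicted}'
# {"stopped_eos":true,"stopped_limit":false,"tokens_predicted":2}
```

So after bfb121f the generation no longer stops on EOS and instead runs until the n_predict limit.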

I launched the server in both cases as follows:

./server -m ./Mistral-7B-Instruct-v0.2/snapshots/b70aa86578567ba3301b21c8a27bea4e8f6d6d61/ggml-model-q8_0.gguf -c 8192

What's the difference? Note that the first response ends with stopped_eos: false and stopped_limit: true, while the second stops with stopped_eos: true after only 2 predicted tokens.

If the bug concerns the server, please try to reproduce it first using the server test scenario framework.

Yes, this is related to the server.
But before I reproduce it with that framework, could you first tell me whether commit bfb121f is buggy?
