Description
Expected Behavior
This is a follow-on to issue #4287. The README for server says that generation tags, e.g. mirostat_tau, are honored by the OAI-compatible v1/chat/completions interface. That doesn't seem to be happening with either cache_prompt or slot_id.
Current Behavior
The cache is not used by follow-on requests that include previous prompts and generated text. Instead, the server re-evaluates the entire prompt. See the verbose log output appended below.
Environment and Context
- Physical (or virtual) hardware you are using:
Model Name: Mac mini
Model Identifier: Macmini9,1
Model Number: Z12P000KGLL/A
Chip: Apple M1
Total Number of Cores: 8 (4 performance and 4 efficiency)
Memory: 16 GB
System Firmware Version: 10151.41.12
OS Loader Version: 10151.41.12
- Operating System: macOS Sonoma
$ python3 --version
Python 3.10.9
$ make --version
GNU Make 3.81
$ g++ --version
Apple clang version 15.0.0 (clang-1500.0.40.1)
Failure Information (for bugs)
To reproduce:
- Start the server with a suitable GGUF model (I've tested with zephyr-7B and mistral-7B-open-orca), e.g.
  ./server -m $MODELS/zephyr-7b-beta.Q4_K_M.gguf -c 8192 -t 8 --host 192.168.86.31 -v >> /tmp/llog2.txt
- Send a request to the v1/chat/completions endpoint with "cache_prompt": true and "slot_id": 1 as JSON fields in the request.
- Concatenate the generated text with the first prompt. Add a little new text.
- Send a new request with the concatenated text as the prompt.
- Observe that the server evaluates the entire prompt instead of only the new text.
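The steps above can be sketched as a small Python script. This is only an illustration, not the client that produced the logs below: the helper names (build_payload, chat, reproduce) are made up for this example, the base URL assumes the host from the command above and port 8080 as shown in the log, and the prompt text is modeled loosely on the logged request.

```python
import json
import urllib.request

# Assumption: host taken from the --host flag above, port 8080 from the server log.
BASE_URL = "http://192.168.86.31:8080"


def build_payload(story: str, slot_id: int = 1) -> dict:
    """Build a chat request that asks for prompt caching on a fixed slot."""
    content = "Continue the story that starts below:\n" + story
    return {
        "model": "url",
        "messages": [{"role": "user", "content": content}],
        "temperature": 1.4,
        "n": 1,
        "max_tokens": 50,
        # The two fields this report is about.
        "cache_prompt": True,
        "slot_id": slot_id,
    }


def chat(payload: dict, base_url: str = BASE_URL) -> str:
    """POST the payload to v1/chat/completions and return the generated text."""
    request = urllib.request.Request(
        base_url + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


def reproduce(base_url: str = BASE_URL) -> None:
    """Send a first prompt, then the same prompt extended with the reply."""
    story = "Once upon a time, \n"
    first_reply = chat(build_payload(story), base_url)
    # Second prompt = first prompt + generated text + a little new text.
    story = story + first_reply + " a neighboring kingdom.\n\n"
    chat(build_payload(story), base_url)
```

Calling reproduce() against a running server and then inspecting the verbose log should show whether the second "prompt ingested" entry reuses any of the first prompt.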
Failure Logs
{"timestamp":1701723744,"level":"INFO","function":"main","line":2658,"message":"build info","build":1595,"commit":"880f579"}
{"timestamp":1701723744,"level":"INFO","function":"main","line":2665,"message":"system info","n_threads":8,"n_threads_batch":-1,"total_threads":8,"system_info":"AVX = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | SSSE3 = 0 | VSX = 0 | "}
{"timestamp":1701723744,"level":"INFO","function":"main","line":3043,"message":"HTTP server listening","hostname":"192.168.86.31","port":8080}
{"timestamp":1701723787,"level":"VERBOSE","function":"update_slots","line":1783,"message":"prompt ingested","n_past":0,"cached":"","to_eval":" <|im_start|>user\n\"Continue the story that starts below:\\nOnce upon a time, \\n\"<|im_end|>\n<|im_start|>assistant\n"}
{"timestamp":1701723788,"level":"VERBOSE","function":"process_token","line":1087,"message":"next token","token":262,"token_text":"in","has_next_token":true,"n_remain":-1,"num_tokens_predicted":0,"stopped_eos":false,"stopped_word":false,"stopped_limit":false,"stopping_word":""}
{"timestamp":1701723788,"level":"VERBOSE","function":"process_token","line":1087,"message":"next token","token":264,"token_text":" a","has_next_token":true,"n_remain":-1,"num_tokens_predicted":1,"stopped_eos":false,"stopped_word":false,"stopped_limit":false,"stopping_word":""}
{"timestamp":1701723788,"level":"VERBOSE","function":"process_token","line":1087,"message":"next token","token":2082,"token_text":" far","has_next_token":true,"n_remain":-1,"num_tokens_predicted":2,"stopped_eos":false,"stopped_word":false,"stopped_limit":false,"stopping_word":""}
{"timestamp":1701723788,"level":"VERBOSE","function":"process_token","line":1087,"message":"next token","token":28733,"token_text":"-","has_next_token":true,"n_remain":47,"num_tokens_predicted":3,"stopped_eos":false,"stopped_word":false,"stopped_limit":false,"stopping_word":""}
{"timestamp":1701723788,"level":"VERBOSE","function":"process_token","line":1087,"message":"next token","token":1769,"token_text":"off","has_next_token":true,"n_remain":46,"num_tokens_predicted":4,"stopped_eos":false,"stopped_word":false,"stopped_limit":false,"stopping_word":""}
<--- SNIP --->
{"timestamp":1701723791,"level":"VERBOSE","function":"process_token","line":1087,"message":"next token","token":264,"token_text":" a","has_next_token":true,"n_remain":2,"num_tokens_predicted":48,"stopped_eos":false,"stopped_word":false,"stopped_limit":false,"stopping_word":""}
{"timestamp":1701723791,"level":"VERBOSE","function":"process_token","line":1087,"message":"next token","token":20180,"token_text":" prince","has_next_token":true,"n_remain":1,"num_tokens_predicted":49,"stopped_eos":false,"stopped_word":false,"stopped_limit":false,"stopping_word":""}
{"timestamp":1701723791,"level":"VERBOSE","function":"process_token","line":1087,"message":"next token","token":477,"token_text":" from","has_next_token":false,"n_remain":0,"num_tokens_predicted":50,"stopped_eos":false,"stopped_word":false,"stopped_limit":true,"stopping_word":""}
{"timestamp":1701723791,"level":"INFO","function":"log_server_request","line":2608,"message":"request","remote_addr":"192.168.86.145","remote_port":49447,"status":200,"method":"POST","path":"/v1/chat/completions","params":{}}
{"timestamp":1701723791,"level":"VERBOSE","function":"log_server_request","line":2613,"message":"request","request":"{\"model\":\"url\",\"messages\":[{\"role\":\"user\",\"content\":\"\\\"Continue the story that starts below:\\\\nOnce upon a time, \\\\n\\\"\"}],\"temperature\":1.4,\"n\":1,\"max_tokens\":50,\"cache_prompt\":true,\"slot_id\":1}","response":"{\"__verbose\":{\"content\":\"in a far-off kingdom, there was a young princess named Isabella. She was kind, intelligent, and had a heart full of love for her people. However, her father, the king, had arranged for her to marry a prince from\",\"generation_settings\":{\"frequency_penalty\":0.0,\"grammar\":\"\",\"ignore_eos\":false,\"logit_bias\":[],\"min_p\":0.05000000074505806,\"mirostat\":0,\"mirostat_eta\":0.0,\"mirostat_tau\":0.0,\"model\":\"/Users/mellis/Library/ApplicationSupport/faraday/models/zephyr-7b-beta.Q4_K_M.gguf\",\"n_ctx\":8192,\"n_keep\":0,\"n_predict\":50,\"n_probs\":0,\"penalize_nl\":false,\"presence_penalty\":0.0,\"repeat_last_n\":0,\"repeat_penalty\":1.100000023841858,\"seed\":0,\"stop\":[\"<|im_end|>\"],\"stream\":false,\"temp\":1.399999976158142,\"tfs_z\":0.0,\"top_k\":40,\"top_p\":0.949999988079071,\"typical_p\":0.0},\"model\":\"gpt-3.5-turbo-0613\",\"oaicompat_token_ctr\":50,\"prompt\":\"<|im_start|>user\\n\\\"Continue the story that starts below:\\\\nOnce upon a time, \\\\n\\\"<|im_end|>\\n<|im_start|>assistant\\n\",\"slot_id\":0,\"stop\":true,\"stopped_eos\":false,\"stopped_limit\":true,\"stopped_word\":false,\"stopping_word\":\"\",\"timings\":{\"predicted_ms\":3684.106,\"predicted_n\":50,\"predicted_per_second\":13.571813623169366,\"predicted_per_token_ms\":73.68212,\"prompt_ms\":817.203,\"prompt_n\":46,\"prompt_per_second\":56.28956330312053,\"prompt_per_token_ms\":17.765282608695653},\"tokens_cached\":96,\"tokens_evaluated\":46,\"tokens_predicted\":50,\"truncated\":false},\"choices\":[{\"finish_reason\":\"stop\",\"index\":0,\"message\":{\"content\":\"in a far-off kingdom, there was a 
young princess named Isabella. She was kind, intelligent, and had a heart full of love for her people. However, her father, the king, had arranged for her to marry a prince from\",\"role\":\"assistant\"}}],\"created\":1701723791,\"id\":\"chatcmpl-Zl1BZGt8wJ2cJZqtGVhqmlD29AXRqMke\",\"model\":\"gpt-3.5-turbo-0613\",\"object\":\"chat.completion\",\"usage\":{\"completion_tokens\":50,\"prompt_tokens\":46,\"total_tokens\":96}}"}
<--- SECOND REQUEST --->
{"timestamp":1701723858,"level":"VERBOSE","function":"update_slots","line":1783,"message":"prompt ingested","n_past":0,"cached":"","to_eval":" <|im_start|>user\n\"Continue the story that starts below:\\nOnce upon a time, \\nin a far-off kingdom, there was a young princess named Isabella. She was kind, intelligent, and had a heart full of love for her people. However, her father, the king, had arranged for her to marry a prince from a neighboring kingdom.\\n\\n\"<|im_end|>\n<|im_start|>assistant\n"}
{"timestamp":1701723860,"level":"VERBOSE","function":"process_token","line":1087,"message":"next token","token":2301,"token_text":"Is","has_next_token":true,"n_remain":0,"num_tokens_predicted":0,"stopped_eos":false,"stopped_word":false,"stopped_limit":false,"stopping_word":""}
{"timestamp":1701723860,"level":"VERBOSE","function":"process_token","line":1087,"message":"next token","token":375,"token_text":"ab","has_next_token":true,"n_remain":0,"num_tokens_predicted":1,"stopped_eos":false,"stopped_word":false,"stopped_limit":false,"stopping_word":""}
{"timestamp":1701723860,"level":"VERBOSE","function":"process_token","line":1087,"message":"next token","token":1994,"token_text":"ella","has_next_token":true,"n_remain":0,"num_tokens_predicted":2,"stopped_eos":false,"stopped_word":false,"stopped_limit":false,"stopping_word":""}
{"timestamp":1701723860,"level":"VERBOSE","function":"process_token","line":1087,"message":"next token","token":403,"token_text":" was","has_next_token":true,"n_remain":47,"num_tokens_predicted":3,"stopped_eos":false,"stopped_word":false,"stopped_limit":false,"stopping_word":""}
<--- SNIP --->
{"timestamp":1701723863,"level":"VERBOSE","function":"process_token","line":1087,"message":"next token","token":1885,"token_text":" conf","has_next_token":true,"n_remain":4,"num_tokens_predicted":46,"stopped_eos":false,"stopped_word":false,"stopped_limit":false,"stopping_word":""}
{"timestamp":1701723863,"level":"VERBOSE","function":"process_token","line":1087,"message":"next token","token":1932,"token_text":"ided","has_next_token":true,"n_remain":3,"num_tokens_predicted":47,"stopped_eos":false,"stopped_word":false,"stopped_limit":false,"stopping_word":""}
{"timestamp":1701723863,"level":"VERBOSE","function":"process_token","line":1087,"message":"next token","token":297,"token_text":" in","has_next_token":true,"n_remain":2,"num_tokens_predicted":48,"stopped_eos":false,"stopped_word":false,"stopped_limit":false,"stopping_word":""}
{"timestamp":1701723863,"level":"VERBOSE","function":"process_token","line":1087,"message":"next token","token":559,"token_text":" her","has_next_token":true,"n_remain":1,"num_tokens_predicted":49,"stopped_eos":false,"stopped_word":false,"stopped_limit":false,"stopping_word":""}
{"timestamp":1701723864,"level":"VERBOSE","function":"process_token","line":1087,"message":"next token","token":16424,"token_text":" closest","has_next_token":false,"n_remain":0,"num_tokens_predicted":50,"stopped_eos":false,"stopped_word":false,"stopped_limit":true,"stopping_word":""}
{"timestamp":1701723864,"level":"INFO","function":"log_server_request","line":2608,"message":"request","remote_addr":"192.168.86.145","remote_port":49465,"status":200,"method":"POST","path":"/v1/chat/completions","params":{}}
{"timestamp":1701723864,"level":"VERBOSE","function":"log_server_request","line":2613,"message":"request","request":"{\"model\":\"url\",\"messages\":[{\"role\":\"user\",\"content\":\"\\\"Continue the story that starts below:\\\\nOnce upon a time, \\\\nin a far-off kingdom, there was a young princess named Isabella. She was kind, intelligent, and had a heart full of love for her people. However, her father, the king, had arranged for her to marry a prince from a neighboring kingdom.\\\\n\\\\n\\\"\"}],\"temperature\":1.4,\"n\":1,\"max_tokens\":50,\"cache_prompt\":true,\"slot_id\":1}","response":"{\"__verbose\":{\"content\":\"Isabella was not opposed to the idea of marriage, but she had her heart set on a different suitor. His name was Edward, a commoner who had captured her heart with his kindness and bravery. She had confided in her closest\",\"generation_settings\":{\"frequency_penalty\":0.0,\"grammar\":\"\",\"ignore_eos\":false,\"logit_bias\":[],\"min_p\":0.05000000074505806,\"mirostat\":0,\"mirostat_eta\":0.0,\"mirostat_tau\":0.0,\"model\":\"/Users/mellis/Library/ApplicationSupport/faraday/models/zephyr-7b-beta.Q4_K_M.gguf\",\"n_ctx\":8192,\"n_keep\":0,\"n_predict\":50,\"n_probs\":0,\"penalize_nl\":false,\"presence_penalty\":0.0,\"repeat_last_n\":0,\"repeat_penalty\":1.100000023841858,\"seed\":0,\"stop\":[\"<|im_end|>\"],\"stream\":false,\"temp\":1.399999976158142,\"tfs_z\":0.0,\"top_k\":40,\"top_p\":0.949999988079071,\"typical_p\":0.0},\"model\":\"gpt-3.5-turbo-0613\",\"oaicompat_token_ctr\":50,\"prompt\":\"<|im_start|>user\\n\\\"Continue the story that starts below:\\\\nOnce upon a time, \\\\nin a far-off kingdom, there was a young princess named Isabella. She was kind, intelligent, and had a heart full of love for her people. 
However, her father, the king, had arranged for her to marry a prince from a neighboring kingdom.\\\\n\\\\n\\\"<|im_end|>\\n<|im_start|>assistant\\n\",\"slot_id\":0,\"stop\":true,\"stopped_eos\":false,\"stopped_limit\":true,\"stopped_word\":false,\"stopping_word\":\"\",\"timings\":{\"predicted_ms\":3714.829,\"predicted_n\":50,\"predicted_per_second\":13.459569740626016,\"predicted_per_token_ms\":74.29658,\"prompt_ms\":1406.272,\"prompt_n\":105,\"prompt_per_second\":74.66549856642243,\"prompt_per_token_ms\":13.393066666666666},\"tokens_cached\":155,\"tokens_evaluated\":105,\"tokens_predicted\":50,\"truncated\":false},\"choices\":[{\"finish_reason\":\"stop\",\"index\":0,\"message\":{\"content\":\"Isabella was not opposed to the idea of marriage, but she had her heart set on a different suitor. His name was Edward, a commoner who had captured her heart with his kindness and bravery. She had confided in her closest\",\"role\":\"assistant\"}}],\"created\":1701723864,\"id\":\"chatcmpl-UgvXb8MPIqGdEiAbnIJR7rfFlm8XX09W\",\"model\":\"gpt-3.5-turbo-0613\",\"object\":\"chat.completion\",\"usage\":{\"completion_tokens\":50,\"prompt_tokens\":105,\"total_tokens\":155}}"}
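In both requests above, the "prompt ingested" entry reports "n_past":0 and an empty "cached" string, and the second response reports "tokens_evaluated":105, equal to its full "prompt_tokens". The response bodies also show "slot_id":0 even though the requests set it to 1. A rough way to pull the first of these observations out of a longer log is sketched below; the function name is hypothetical, and it assumes one JSON object per line with the field names seen in this log, skipping anything that does not parse (such as the SNIP markers or wrapped lines).

```python
import json


def prompt_ingest_events(lines):
    """Yield (n_past, cached_chars, to_eval_chars) for each 'prompt ingested' entry."""
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            # Not a complete JSON log line (marker, wrapped fragment, blank line).
            continue
        if not isinstance(entry, dict):
            continue
        if entry.get("message") == "prompt ingested":
            yield (
                entry.get("n_past"),
                len(entry.get("cached", "")),
                len(entry.get("to_eval", "")),
            )


# Example use, assuming the log path from the server command in this report:
# with open("/tmp/llog2.txt", encoding="utf-8") as log:
#     for n_past, cached, to_eval in prompt_ingest_events(log):
#         print(f"n_past={n_past} cached_chars={cached} to_eval_chars={to_eval}")
```

If the cache were being used, the second entry would be expected to show a non-zero n_past and a non-empty "cached" value.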