v1/chat/completions endpoint does not honor cache_prompt #4329

Closed
@Michael-F-Ellis

Expected Behavior

This is a follow-on to issue #4287. The README for server says that generation tags, e.g. mirostat_tau, are honored by the OAI compatible v1/chat/completions interface. That doesn't seem to be happening with either cache_prompt or slot_id.

Current Behavior

The cache is not used by follow-on requests whose prompt includes the previous prompt and the generated text. Instead, the server re-evaluates the entire prompt: the verbose log output appended below shows "n_past":0 and an empty "cached" field for the second request.

Environment and Context

  • Physical (or virtual) hardware you are using, e.g. for Linux:
    Model Name: Mac mini
    Model Identifier: Macmini9,1
    Model Number: Z12P000KGLL/A
    Chip: Apple M1
    Total Number of Cores: 8 (4 performance and 4 efficiency)
    Memory: 16 GB
    System Firmware Version: 10151.41.12
    OS Loader Version: 10151.41.12

  • Operating System: macOS Sonoma

$ python3 --version
Python 3.10.9
$ make --version
GNU Make 3.81
$ g++ --version
Apple clang version 15.0.0 (clang-1500.0.40.1)

Failure Information (for bugs)

To reproduce:

  1. Start the server with a suitable GGUF model (I've tested with zephyr-7B, and mistral-7B-open-orca)
    e.g. ./server -m $MODELS/zephyr-7b-beta.Q4_K_M.gguf -c 8192 -t 8 --host 192.168.86.31 -v >> /tmp/llog2.txt
  2. Send a request to the v1/chat/completions endpoint with "cache_prompt": true and "slot_id": 1 as JSON fields in the request.
  3. Concatenate the generated text with the first prompt. Add a little new text.
  4. Send a new request with the concatenated text as the prompt.
  5. Observe that the server evaluates the entire prompt instead of only the new text.
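
The steps above can be sketched as a small Python client. This is a minimal sketch, not the exact client used for this report: the request fields, host, and port are copied from the logged requests below, while the helper names (build_payload, post_chat, follow_on_prompt, reproduce) are hypothetical and the prompt handling is simplified to plain string concatenation.

```python
import json
import urllib.request

# Host and port as they appear in the server log of this report; adjust as needed.
SERVER_URL = "http://192.168.86.31:8080/v1/chat/completions"


def build_payload(prompt_text):
    """Build a request body with the same fields as the logged requests."""
    return {
        "model": "url",
        "messages": [{"role": "user", "content": prompt_text}],
        "temperature": 1.4,
        "n": 1,
        "max_tokens": 50,
        "cache_prompt": True,  # field under test
        "slot_id": 1,          # field under test
    }


def post_chat(payload, url=SERVER_URL):
    """POST the payload as JSON and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def follow_on_prompt(first_prompt, generated_text, new_text):
    """Steps 3 and 4: previous prompt + generated text + a little new text."""
    return first_prompt + generated_text + new_text


def reproduce(url=SERVER_URL):
    """Send the two requests; requires a running server, so it is not called here."""
    first_prompt = "Continue the story that starts below:\nOnce upon a time, \n"
    first = post_chat(build_payload(first_prompt), url)
    generated = first["choices"][0]["message"]["content"]
    second_prompt = follow_on_prompt(first_prompt, generated, " a neighboring kingdom.\n\n")
    return first, post_chat(build_payload(second_prompt), url)
```

Calling reproduce() against a running server and then inspecting the verbose server log corresponds to step 5.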

Failure Logs

{"timestamp":1701723744,"level":"INFO","function":"main","line":2658,"message":"build info","build":1595,"commit":"880f579"}
{"timestamp":1701723744,"level":"INFO","function":"main","line":2665,"message":"system info","n_threads":8,"n_threads_batch":-1,"total_threads":8,"system_info":"AVX = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | SSSE3 = 0 | VSX = 0 | "}
{"timestamp":1701723744,"level":"INFO","function":"main","line":3043,"message":"HTTP server listening","hostname":"192.168.86.31","port":8080}
{"timestamp":1701723787,"level":"VERBOSE","function":"update_slots","line":1783,"message":"prompt ingested","n_past":0,"cached":"","to_eval":" <|im_start|>user\n\"Continue the story that starts below:\\nOnce upon a time, \\n\"<|im_end|>\n<|im_start|>assistant\n"}
{"timestamp":1701723788,"level":"VERBOSE","function":"process_token","line":1087,"message":"next token","token":262,"token_text":"in","has_next_token":true,"n_remain":-1,"num_tokens_predicted":0,"stopped_eos":false,"stopped_word":false,"stopped_limit":false,"stopping_word":""}
{"timestamp":1701723788,"level":"VERBOSE","function":"process_token","line":1087,"message":"next token","token":264,"token_text":" a","has_next_token":true,"n_remain":-1,"num_tokens_predicted":1,"stopped_eos":false,"stopped_word":false,"stopped_limit":false,"stopping_word":""}
{"timestamp":1701723788,"level":"VERBOSE","function":"process_token","line":1087,"message":"next token","token":2082,"token_text":" far","has_next_token":true,"n_remain":-1,"num_tokens_predicted":2,"stopped_eos":false,"stopped_word":false,"stopped_limit":false,"stopping_word":""}
{"timestamp":1701723788,"level":"VERBOSE","function":"process_token","line":1087,"message":"next token","token":28733,"token_text":"-","has_next_token":true,"n_remain":47,"num_tokens_predicted":3,"stopped_eos":false,"stopped_word":false,"stopped_limit":false,"stopping_word":""}
{"timestamp":1701723788,"level":"VERBOSE","function":"process_token","line":1087,"message":"next token","token":1769,"token_text":"off","has_next_token":true,"n_remain":46,"num_tokens_predicted":4,"stopped_eos":false,"stopped_word":false,"stopped_limit":false,"stopping_word":""}
<--- SNIP --->
{"timestamp":1701723791,"level":"VERBOSE","function":"process_token","line":1087,"message":"next token","token":264,"token_text":" a","has_next_token":true,"n_remain":2,"num_tokens_predicted":48,"stopped_eos":false,"stopped_word":false,"stopped_limit":false,"stopping_word":""}
{"timestamp":1701723791,"level":"VERBOSE","function":"process_token","line":1087,"message":"next token","token":20180,"token_text":" prince","has_next_token":true,"n_remain":1,"num_tokens_predicted":49,"stopped_eos":false,"stopped_word":false,"stopped_limit":false,"stopping_word":""}
{"timestamp":1701723791,"level":"VERBOSE","function":"process_token","line":1087,"message":"next token","token":477,"token_text":" from","has_next_token":false,"n_remain":0,"num_tokens_predicted":50,"stopped_eos":false,"stopped_word":false,"stopped_limit":true,"stopping_word":""}
{"timestamp":1701723791,"level":"INFO","function":"log_server_request","line":2608,"message":"request","remote_addr":"192.168.86.145","remote_port":49447,"status":200,"method":"POST","path":"/v1/chat/completions","params":{}}
{"timestamp":1701723791,"level":"VERBOSE","function":"log_server_request","line":2613,"message":"request","request":"{\"model\":\"url\",\"messages\":[{\"role\":\"user\",\"content\":\"\\\"Continue the story that starts below:\\\\nOnce upon a time, \\\\n\\\"\"}],\"temperature\":1.4,\"n\":1,\"max_tokens\":50,\"cache_prompt\":true,\"slot_id\":1}","response":"{\"__verbose\":{\"content\":\"in a far-off kingdom, there was a young princess named Isabella. She was kind, intelligent, and had a heart full of love for her people. However, her father, the king, had arranged for her to marry a prince from\",\"generation_settings\":{\"frequency_penalty\":0.0,\"grammar\":\"\",\"ignore_eos\":false,\"logit_bias\":[],\"min_p\":0.05000000074505806,\"mirostat\":0,\"mirostat_eta\":0.0,\"mirostat_tau\":0.0,\"model\":\"/Users/mellis/Library/ApplicationSupport/faraday/models/zephyr-7b-beta.Q4_K_M.gguf\",\"n_ctx\":8192,\"n_keep\":0,\"n_predict\":50,\"n_probs\":0,\"penalize_nl\":false,\"presence_penalty\":0.0,\"repeat_last_n\":0,\"repeat_penalty\":1.100000023841858,\"seed\":0,\"stop\":[\"<|im_end|>\"],\"stream\":false,\"temp\":1.399999976158142,\"tfs_z\":0.0,\"top_k\":40,\"top_p\":0.949999988079071,\"typical_p\":0.0},\"model\":\"gpt-3.5-turbo-0613\",\"oaicompat_token_ctr\":50,\"prompt\":\"<|im_start|>user\\n\\\"Continue the story that starts below:\\\\nOnce upon a time, \\\\n\\\"<|im_end|>\\n<|im_start|>assistant\\n\",\"slot_id\":0,\"stop\":true,\"stopped_eos\":false,\"stopped_limit\":true,\"stopped_word\":false,\"stopping_word\":\"\",\"timings\":{\"predicted_ms\":3684.106,\"predicted_n\":50,\"predicted_per_second\":13.571813623169366,\"predicted_per_token_ms\":73.68212,\"prompt_ms\":817.203,\"prompt_n\":46,\"prompt_per_second\":56.28956330312053,\"prompt_per_token_ms\":17.765282608695653},\"tokens_cached\":96,\"tokens_evaluated\":46,\"tokens_predicted\":50,\"truncated\":false},\"choices\":[{\"finish_reason\":\"stop\",\"index\":0,\"message\":{\"content\":\"in a far-off kingdom, there was a 
young princess named Isabella. She was kind, intelligent, and had a heart full of love for her people. However, her father, the king, had arranged for her to marry a prince from\",\"role\":\"assistant\"}}],\"created\":1701723791,\"id\":\"chatcmpl-Zl1BZGt8wJ2cJZqtGVhqmlD29AXRqMke\",\"model\":\"gpt-3.5-turbo-0613\",\"object\":\"chat.completion\",\"usage\":{\"completion_tokens\":50,\"prompt_tokens\":46,\"total_tokens\":96}}"}

<--- SECOND REQUEST --->
{"timestamp":1701723858,"level":"VERBOSE","function":"update_slots","line":1783,"message":"prompt ingested","n_past":0,"cached":"","to_eval":" <|im_start|>user\n\"Continue the story that starts below:\\nOnce upon a time, \\nin a far-off kingdom, there was a young princess named Isabella. She was kind, intelligent, and had a heart full of love for her people. However, her father, the king, had arranged for her to marry a prince from a neighboring kingdom.\\n\\n\"<|im_end|>\n<|im_start|>assistant\n"}
{"timestamp":1701723860,"level":"VERBOSE","function":"process_token","line":1087,"message":"next token","token":2301,"token_text":"Is","has_next_token":true,"n_remain":0,"num_tokens_predicted":0,"stopped_eos":false,"stopped_word":false,"stopped_limit":false,"stopping_word":""}
{"timestamp":1701723860,"level":"VERBOSE","function":"process_token","line":1087,"message":"next token","token":375,"token_text":"ab","has_next_token":true,"n_remain":0,"num_tokens_predicted":1,"stopped_eos":false,"stopped_word":false,"stopped_limit":false,"stopping_word":""}
{"timestamp":1701723860,"level":"VERBOSE","function":"process_token","line":1087,"message":"next token","token":1994,"token_text":"ella","has_next_token":true,"n_remain":0,"num_tokens_predicted":2,"stopped_eos":false,"stopped_word":false,"stopped_limit":false,"stopping_word":""}
{"timestamp":1701723860,"level":"VERBOSE","function":"process_token","line":1087,"message":"next token","token":403,"token_text":" was","has_next_token":true,"n_remain":47,"num_tokens_predicted":3,"stopped_eos":false,"stopped_word":false,"stopped_limit":false,"stopping_word":""}
<--- SNIP --->
{"timestamp":1701723863,"level":"VERBOSE","function":"process_token","line":1087,"message":"next token","token":1885,"token_text":" conf","has_next_token":true,"n_remain":4,"num_tokens_predicted":46,"stopped_eos":false,"stopped_word":false,"stopped_limit":false,"stopping_word":""}
{"timestamp":1701723863,"level":"VERBOSE","function":"process_token","line":1087,"message":"next token","token":1932,"token_text":"ided","has_next_token":true,"n_remain":3,"num_tokens_predicted":47,"stopped_eos":false,"stopped_word":false,"stopped_limit":false,"stopping_word":""}
{"timestamp":1701723863,"level":"VERBOSE","function":"process_token","line":1087,"message":"next token","token":297,"token_text":" in","has_next_token":true,"n_remain":2,"num_tokens_predicted":48,"stopped_eos":false,"stopped_word":false,"stopped_limit":false,"stopping_word":""}
{"timestamp":1701723863,"level":"VERBOSE","function":"process_token","line":1087,"message":"next token","token":559,"token_text":" her","has_next_token":true,"n_remain":1,"num_tokens_predicted":49,"stopped_eos":false,"stopped_word":false,"stopped_limit":false,"stopping_word":""}
{"timestamp":1701723864,"level":"VERBOSE","function":"process_token","line":1087,"message":"next token","token":16424,"token_text":" closest","has_next_token":false,"n_remain":0,"num_tokens_predicted":50,"stopped_eos":false,"stopped_word":false,"stopped_limit":true,"stopping_word":""}
{"timestamp":1701723864,"level":"INFO","function":"log_server_request","line":2608,"message":"request","remote_addr":"192.168.86.145","remote_port":49465,"status":200,"method":"POST","path":"/v1/chat/completions","params":{}}
{"timestamp":1701723864,"level":"VERBOSE","function":"log_server_request","line":2613,"message":"request","request":"{\"model\":\"url\",\"messages\":[{\"role\":\"user\",\"content\":\"\\\"Continue the story that starts below:\\\\nOnce upon a time, \\\\nin a far-off kingdom, there was a young princess named Isabella. She was kind, intelligent, and had a heart full of love for her people. However, her father, the king, had arranged for her to marry a prince from a neighboring kingdom.\\\\n\\\\n\\\"\"}],\"temperature\":1.4,\"n\":1,\"max_tokens\":50,\"cache_prompt\":true,\"slot_id\":1}","response":"{\"__verbose\":{\"content\":\"Isabella was not opposed to the idea of marriage, but she had her heart set on a different suitor. His name was Edward, a commoner who had captured her heart with his kindness and bravery. She had confided in her closest\",\"generation_settings\":{\"frequency_penalty\":0.0,\"grammar\":\"\",\"ignore_eos\":false,\"logit_bias\":[],\"min_p\":0.05000000074505806,\"mirostat\":0,\"mirostat_eta\":0.0,\"mirostat_tau\":0.0,\"model\":\"/Users/mellis/Library/ApplicationSupport/faraday/models/zephyr-7b-beta.Q4_K_M.gguf\",\"n_ctx\":8192,\"n_keep\":0,\"n_predict\":50,\"n_probs\":0,\"penalize_nl\":false,\"presence_penalty\":0.0,\"repeat_last_n\":0,\"repeat_penalty\":1.100000023841858,\"seed\":0,\"stop\":[\"<|im_end|>\"],\"stream\":false,\"temp\":1.399999976158142,\"tfs_z\":0.0,\"top_k\":40,\"top_p\":0.949999988079071,\"typical_p\":0.0},\"model\":\"gpt-3.5-turbo-0613\",\"oaicompat_token_ctr\":50,\"prompt\":\"<|im_start|>user\\n\\\"Continue the story that starts below:\\\\nOnce upon a time, \\\\nin a far-off kingdom, there was a young princess named Isabella. She was kind, intelligent, and had a heart full of love for her people. 
However, her father, the king, had arranged for her to marry a prince from a neighboring kingdom.\\\\n\\\\n\\\"<|im_end|>\\n<|im_start|>assistant\\n\",\"slot_id\":0,\"stop\":true,\"stopped_eos\":false,\"stopped_limit\":true,\"stopped_word\":false,\"stopping_word\":\"\",\"timings\":{\"predicted_ms\":3714.829,\"predicted_n\":50,\"predicted_per_second\":13.459569740626016,\"predicted_per_token_ms\":74.29658,\"prompt_ms\":1406.272,\"prompt_n\":105,\"prompt_per_second\":74.66549856642243,\"prompt_per_token_ms\":13.393066666666666},\"tokens_cached\":155,\"tokens_evaluated\":105,\"tokens_predicted\":50,\"truncated\":false},\"choices\":[{\"finish_reason\":\"stop\",\"index\":0,\"message\":{\"content\":\"Isabella was not opposed to the idea of marriage, but she had her heart set on a different suitor. His name was Edward, a commoner who had captured her heart with his kindness and bravery. She had confided in her closest\",\"role\":\"assistant\"}}],\"created\":1701723864,\"id\":\"chatcmpl-UgvXb8MPIqGdEiAbnIJR7rfFlm8XX09W\",\"model\":\"gpt-3.5-turbo-0613\",\"object\":\"chat.completion\",\"usage\":{\"completion_tokens\":50,\"prompt_tokens\":105,\"total_tokens\":155}}"}
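
To make the symptom easier to spot in a long verbose log, the "prompt ingested" entries can be filtered out and their n_past and cached fields compared. The following is a rough sketch that assumes one JSON object per log line, as in the output above; the function name prompt_ingest_events is hypothetical, and the sample lines are abbreviated copies of the log entries in this report.

```python
import json


def prompt_ingest_events(log_lines):
    """Return (timestamp, n_past, length of cached text) for each 'prompt ingested' entry."""
    events = []
    for line in log_lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip markers such as "<--- SNIP --->"
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip wrapped or truncated log lines
        if entry.get("message") == "prompt ingested":
            events.append((entry["timestamp"], entry["n_past"], len(entry["cached"])))
    return events


# Abbreviated copies of the log lines above ("to_eval" shortened for readability).
SAMPLE_LOG = [
    '{"timestamp":1701723787,"level":"VERBOSE","function":"update_slots","line":1783,'
    '"message":"prompt ingested","n_past":0,"cached":"","to_eval":"(abbreviated)"}',
    "<--- SNIP --->",
    '{"timestamp":1701723791,"level":"INFO","function":"log_server_request","line":2608,'
    '"message":"request","status":200}',
    '{"timestamp":1701723858,"level":"VERBOSE","function":"update_slots","line":1783,'
    '"message":"prompt ingested","n_past":0,"cached":"","to_eval":"(abbreviated)"}',
]

events = prompt_ingest_events(SAMPLE_LOG)
print(events)  # [(1701723787, 0, 0), (1701723858, 0, 0)]
```

Both entries report n_past of 0 and an empty cached string, which matches the behavior described in this report: nothing from the first request is reused by the second.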
