What happened?
When using examples/server's "New UI", parts of the chat history seem to be re-evaluated (skipping the KV cache?) on each new message from the user. This does not happen with llama-cli, nor with examples/server in the old UI mode with default settings/prompt.
This seems to be a common failure mode for third-party frontends to llama.cpp; perhaps there is an issue in the API layer that makes this problem difficult for frontends to solve? #7185
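For reference, the /completion endpoint does expose an opt-in prompt cache via the cache_prompt request field. Below is a minimal Python sketch of how a frontend might use it; the endpoint and field names are from the server README, but the address (localhost:8080) and the exact prompt layout are just assumptions for illustration:

```python
# Minimal sketch of a frontend opting into the server's prompt cache,
# assuming a default examples/server build listening on localhost:8080.
import json
import urllib.request

def complete(prompt: str) -> dict:
    req = urllib.request.Request(
        "http://localhost:8080/completion",
        data=json.dumps({
            "prompt": prompt,
            "n_predict": 32,
            "cache_prompt": True,  # reuse the KV cache for the shared prefix
        }).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Two requests sharing a prefix: if the cache is working, the second request
# should only evaluate the tokens appended after the first prompt.
first = complete("User: Hello\nAssistant:")
history = "User: Hello\nAssistant:" + first["content"]
second = complete(history + "\nUser: And a follow-up?\nAssistant:")
print(second["timings"])  # prompt token count should stay small here
```

If the new UI hits the endpoint without cache_prompt, or rebuilds the prompt in a way that breaks the prefix match, the server would re-evaluate the history exactly as in the log below.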
Name and Version
version: 3151 (f8ec887)
built with cc (Debian 13.2.0-25) 13.2.0 for x86_64-linux-gnu
What operating system are you seeing the problem on?
Linux
Relevant log output
Note the n_prompt_tokens_processed values: after the first message, every new user message triggers prompt processing of dozens of tokens, not just the newly appended ones.
INFO [ print_timings] prompt eval time = 189.41 ms / 1 tokens ( 189.41 ms per token, 5.28 tokens per second) | tid="140556433274816" timestamp=1718408696 id_slot=0 id_task=3534 t_prompt_processing=189.405 n_prompt_tokens_processed=1 t_token=189.405 n_tokens_second=5.2796916660067055
INFO [ print_timings] prompt eval time = 2473.22 ms / 40 tokens ( 61.83 ms per token, 16.17 tokens per second) | tid="140556433274816" timestamp=1718408717 id_slot=0 id_task=3564 t_prompt_processing=2473.219 n_prompt_tokens_processed=40 t_token=61.830475 n_tokens_second=16.173254370114414
INFO [ print_timings] prompt eval time = 5231.45 ms / 83 tokens ( 63.03 ms per token, 15.87 tokens per second) | tid="140556433274816" timestamp=1718408745 id_slot=0 id_task=3632 t_prompt_processing=5231.451 n_prompt_tokens_processed=83 t_token=63.02953012048193 n_tokens_second=15.865579167232953
INFO [ print_timings] prompt eval time = 6692.69 ms / 105 tokens ( 63.74 ms per token, 15.69 tokens per second) | tid="140556433274816" timestamp=1718408774 id_slot=0 id_task=3721 t_prompt_processing=6692.691 n_prompt_tokens_processed=105 t_token=63.739914285714285 n_tokens_second=15.688756585355577
INFO [ print_timings] prompt eval time = 5536.72 ms / 90 tokens ( 61.52 ms per token, 16.26 tokens per second) | tid="140556433274816" timestamp=1718408815 id_slot=0 id_task=3797 t_prompt_processing=5536.721 n_prompt_tokens_processed=90 t_token=61.519122222222215 n_tokens_second=16.255108393578077
INFO [ print_timings] prompt eval time = 6353.86 ms / 106 tokens ( 59.94 ms per token, 16.68 tokens per second) | tid="140556433274816" timestamp=1718408885 id_slot=0 id_task=3885 t_prompt_processing=6353.859 n_prompt_tokens_processed=106 t_token=59.942066037735856 n_tokens_second=16.68277498760989
INFO [ print_timings] prompt eval time = 8704.61 ms / 134 tokens ( 64.96 ms per token, 15.39 tokens per second) | tid="140556433274816" timestamp=1718408926 id_slot=0 id_task=4002 t_prompt_processing=8704.613 n_prompt_tokens_processed=134 t_token=64.95979850746268 n_tokens_second=15.3941364193905
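For anyone reproducing this, a quick way to see the pattern is to pull n_prompt_tokens_processed out of the server log. A small sketch (my own helper, not part of llama.cpp):

```python
# Feed the server log on stdin; prints the prompt-token count per request.
import re
import sys

counts = [int(m) for m in
          re.findall(r"n_prompt_tokens_processed=(\d+)", sys.stdin.read())]
print(counts)  # for the log above: [1, 40, 83, 105, 90, 106, 134]
```

If only the newest message were being evaluated, these counts would stay roughly the size of one message; dozens of tokens per turn suggests parts of the history are being re-processed.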