
examples/server: "New UI" chat becomes slower with each subsequent message #7944

Closed
@khimaros

Description


What happened?

When using examples/server's "New UI", parts of the chat history seem to be re-evaluated (bypassing the KV cache?) on each new message from the user. This does not happen with llama-cli, or with examples/server in the old UI mode using the default settings/prompt.

This seems to be a common failure mode for third-party frontends to llama.cpp; perhaps there is an issue in the API layer that makes this problem difficult for frontends to solve? See #7185. A sketch of the expected client behavior follows below.
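For reference, here is a minimal sketch of how a frontend can ask the server to reuse the KV cache, assuming the /completion endpoint's cache_prompt parameter behaves as described in examples/server; the URL, prompt text, and n_predict value are placeholders:

```python
import json
import urllib.request

# Assumption: llama.cpp's examples/server is running locally on the default port.
SERVER_URL = "http://127.0.0.1:8080/completion"

def complete(prompt: str, cache_prompt: bool = True) -> dict:
    """Send a completion request; cache_prompt=True asks the server to keep the
    evaluated prompt in the KV cache so that a later request sharing the same
    prefix only needs to process the new suffix."""
    payload = {
        "prompt": prompt,
        "n_predict": 128,            # placeholder value
        "cache_prompt": cache_prompt,
    }
    req = urllib.request.Request(
        SERVER_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# Growing the prompt by appending turns should only re-evaluate the new tokens
# when the cached prefix is honored; the "timings" field in the response shows
# how many prompt tokens were actually processed.
history = "User: hello\nAssistant:"
result = complete(history)
print(result.get("timings", {}))
```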

Name and Version

version: 3151 (f8ec887)
built with cc (Debian 13.2.0-25) 13.2.0 for x86_64-linux-gnu

What operating system are you seeing the problem on?

Linux

Relevant log output

INFO [           print_timings] prompt eval time     =     189.41 ms /     1 tokens (  189.41 ms per token,     5.28 tokens per second) | tid="140556433274816" timestamp=1718408696 id_slot=0 id_task=3534 t_prompt_processing=189.405 n_prompt_tokens_processed=1 t_token=189.405 n_tokens_second=5.2796916660067055

INFO [           print_timings] prompt eval time     =    2473.22 ms /    40 tokens (   61.83 ms per token,    16.17 tokens per second) | tid="140556433274816" timestamp=1718408717 id_slot=0 id_task=3564 t_prompt_processing=2473.219 n_prompt_tokens_processed=40 t_token=61.830475 n_tokens_second=16.173254370114414

INFO [           print_timings] prompt eval time     =    5231.45 ms /    83 tokens (   63.03 ms per token,    15.87 tokens per second) | tid="140556433274816" timestamp=1718408745 id_slot=0 id_task=3632 t_prompt_processing=5231.451 n_prompt_tokens_processed=83 t_token=63.02953012048193 n_tokens_second=15.865579167232953

INFO [           print_timings] prompt eval time     =    6692.69 ms /   105 tokens (   63.74 ms per token,    15.69 tokens per second) | tid="140556433274816" timestamp=1718408774 id_slot=0 id_task=3721 t_prompt_processing=6692.691 n_prompt_tokens_processed=105 t_token=63.739914285714285 n_tokens_second=15.688756585355577

INFO [           print_timings] prompt eval time     =    5536.72 ms /    90 tokens (   61.52 ms per token,    16.26 tokens per second) | tid="140556433274816" timestamp=1718408815 id_slot=0 id_task=3797 t_prompt_processing=5536.721 n_prompt_tokens_processed=90 t_token=61.519122222222215 n_tokens_second=16.255108393578077

INFO [           print_timings] prompt eval time     =    6353.86 ms /   106 tokens (   59.94 ms per token,    16.68 tokens per second) | tid="140556433274816" timestamp=1718408885 id_slot=0 id_task=3885 t_prompt_processing=6353.859 n_prompt_tokens_processed=106 t_token=59.942066037735856 n_tokens_second=16.68277498760989

INFO [           print_timings] prompt eval time     =    8704.61 ms /   134 tokens (   64.96 ms per token,    15.39 tokens per second) | tid="140556433274816" timestamp=1718408926 id_slot=0 id_task=4002 t_prompt_processing=8704.613 n_prompt_tokens_processed=134 t_token=64.95979850746268 n_tokens_second=15.3941364193905
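The growth is easy to quantify from these lines; here is a small sketch (assuming the print_timings log format pasted above) that pulls n_prompt_tokens_processed out of each line and prints the running total of re-evaluated prompt tokens, where the log file path is a placeholder:

```python
import re

# Assumption: log lines look like the print_timings output pasted above.
PATTERN = re.compile(r"n_prompt_tokens_processed=(\d+)")

def summarize(log_text: str) -> None:
    """Print prompt tokens processed per request and the cumulative total,
    to show whether earlier turns are being re-evaluated on each message."""
    total = 0
    for i, match in enumerate(PATTERN.finditer(log_text), start=1):
        n = int(match.group(1))
        total += n
        print(f"request {i}: {n:4d} prompt tokens processed (cumulative {total})")

with open("server.log") as f:  # path is a placeholder
    summarize(f.read())
```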


Labels

bug-unconfirmed, medium severity, stale
