server: coherent log output for KV cache full #6637

phymbert · 2024-04-12T11:13:23Z

Motivation

Some log messages use the llama logging method instead of the server's own.

Changes

Use the server's dedicated logging method when the KV cache is full.

Some log calls made during slot context shifting still need to be updated, but that can be done later.
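
As a rough illustration of the idea (not the actual diff), the sketch below routes a "KV cache full" message through one structured, server-style formatter instead of an ad hoc printf-style call. The names `format_server_log`, `log_kv_cache_full`, and `log_fields` are hypothetical and are not llama.cpp's API; the field names and message wording are assumptions too.

```cpp
#include <cstdio>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a server-side structured logger; not llama.cpp's actual API.
using log_fields = std::vector<std::pair<std::string, int>>;

// Build one log line in a single uniform format: level, function, message, then extra fields.
// Assumption: inputs contain no characters that need JSON escaping.
std::string format_server_log(const std::string & level,
                              const std::string & function,
                              const std::string & message,
                              const log_fields & extra) {
    std::string line = "{\"level\":\"" + level + "\",\"function\":\"" + function +
                       "\",\"msg\":\"" + message + "\"";
    for (const auto & kv : extra) {
        line += ",\"" + kv.first + "\":" + std::to_string(kv.second);
    }
    line += "}";
    return line;
}

// Report a full KV cache through the server-style formatter rather than a bare printf call.
// The function name "update_slots" and the fields are illustrative only.
std::string log_kv_cache_full(int n_batch, int ret) {
    const std::string line = format_server_log(
        "ERR", "update_slots",
        "failed to decode the batch: KV cache is full, try increasing it via --ctx-size",
        {{"n_batch", n_batch}, {"ret", ret}});
    std::fprintf(stderr, "%s\n", line.c_str());
    return line;
}
```

Calling `log_kv_cache_full(512, 1)` writes a single line to stderr in the same shape as every other server log entry, which is the consistency this change is after.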

Concerns

One might also want server logs to be written to llama.log, but that can be done later.

References

Please upgrade the KV cache size yes using --ctx-size #6617 (comment)

server: coherent log output for KV cache full

e4e9bc0

phymbert requested review from ggerganov and slaren April 12, 2024 11:13

phymbert added the server/webui label Apr 12, 2024

ggerganov approved these changes Apr 12, 2024

ggerganov merged commit 24ee66e into master Apr 12, 2024

phymbert deleted the hp/server/coherent-logs branch April 12, 2024 11:55

tybalex pushed a commit to rubra-ai/tools.cpp that referenced this pull request Apr 17, 2024

server : coherent log output for KV cache full (ggml-org#6637)

2d8f549
