Server concurrency, streaming, and ssl #1871
Replies: 3 comments 8 replies
-
The server has been updated, but it does not solve all your issues. It can really only handle one user/application effectively. Streaming responses have been added. SSL will probably never be added; if you really need it, you can put some kind of reverse proxy in front of the server.
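For the reverse-proxy route, here is a minimal Go sketch. It assumes the llama.cpp server listens on http://127.0.0.1:8080; cert.pem and key.pem are placeholder names for your own certificate files. The proxy terminates TLS and forwards requests to the plain-HTTP server. Setting FlushInterval to -1 makes the proxy flush each write to the client immediately, so streamed responses are not held in a buffer. nginx or Caddy would do the same job; this is just one self-contained way to do it.

```go
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
)

// newStreamingProxy builds a reverse proxy that forwards every request to the
// backend URL. FlushInterval = -1 flushes each write to the client right away,
// so streamed tokens reach the client as soon as the backend sends them.
func newStreamingProxy(backend string) (*httputil.ReverseProxy, error) {
	target, err := url.Parse(backend)
	if err != nil {
		return nil, err
	}
	proxy := httputil.NewSingleHostReverseProxy(target)
	proxy.FlushInterval = -1
	return proxy, nil
}

func main() {
	// Assumed backend address; change it to wherever the server actually listens.
	proxy, err := newStreamingProxy("http://127.0.0.1:8080")
	if err != nil {
		log.Fatal(err)
	}
	// Placeholder certificate/key paths; the proxy serves HTTPS on :8443.
	log.Fatal(http.ListenAndServeTLS(":8443", "cert.pem", "key.pem", proxy))
}
```

Clients then talk HTTPS to port 8443, and the llama.cpp server itself never needs to know about TLS.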
-
Would vLLM be helpful here, by any chance? There is an issue for it:
#1955
On Sat, 24 Jun 2023 at 11:36 PM, Henri Vasserman wrote:
> Batching would be more efficient, yes, but it's also hugely complex. I'm just thinking about how to coordinate multiple clients generating at the same time while keeping acceptable latency.
-
Non-C++ dev here. If there's a chance of a performance downside, IMO I'd rather have a blazingly fast 'single-threaded' app than a slow 'multi-threaded' one. "Inference at the edge" hints at 2 things:
What about ggml's super-performant save/resume session functionality (i.e. memory & context)? This would be great functionality for even the lowest-spec hardware (say, a Raspberry Pi). Higher-spec hardware could keep n sessions in RAM instead of on disk.
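A rough Go sketch of the "n sessions in RAM" idea, assuming a session can be represented by its token history. A real implementation would also need to save and restore the model's KV cache/state, which is omitted here. sessionState, sessionCache, Save and Resume are hypothetical names, not an existing API. Sessions live in a bounded LRU cache, so when it is full the least recently resumed session is evicted (and could be written to disk instead of discarded).

```go
package main

import (
	"container/list"
	"fmt"
)

// sessionState is a hypothetical snapshot of one conversation: the tokens
// evaluated so far. A real one would also carry the KV cache.
type sessionState struct {
	id     string
	tokens []int
}

// sessionCache keeps at most capacity sessions in RAM, evicting the least
// recently used one when a new session needs room.
type sessionCache struct {
	capacity int
	order    *list.List               // front = most recently used
	byID     map[string]*list.Element // id -> element holding *sessionState
}

func newSessionCache(capacity int) *sessionCache {
	if capacity < 1 {
		capacity = 1 // an empty cache could never hold a session
	}
	return &sessionCache{capacity: capacity, order: list.New(), byID: map[string]*list.Element{}}
}

// Save stores or refreshes a session and returns the ID evicted to make room
// ("" if nothing was evicted).
func (c *sessionCache) Save(s *sessionState) string {
	if e, ok := c.byID[s.id]; ok {
		e.Value = s
		c.order.MoveToFront(e)
		return ""
	}
	evicted := ""
	if c.order.Len() >= c.capacity {
		oldest := c.order.Back()
		old := oldest.Value.(*sessionState)
		c.order.Remove(oldest)
		delete(c.byID, old.id)
		evicted = old.id
	}
	c.byID[s.id] = c.order.PushFront(s)
	return evicted
}

// Resume returns a saved session and marks it as most recently used.
func (c *sessionCache) Resume(id string) (*sessionState, bool) {
	e, ok := c.byID[id]
	if !ok {
		return nil, false
	}
	c.order.MoveToFront(e)
	return e.Value.(*sessionState), true
}

func main() {
	c := newSessionCache(2)
	c.Save(&sessionState{id: "alice", tokens: []int{1, 2}})
	c.Save(&sessionState{id: "bob", tokens: []int{3}})
	c.Resume("alice")                               // alice becomes most recent
	fmt.Println(c.Save(&sessionState{id: "carol"})) // evicts bob
}
```

On a Raspberry Pi the capacity could be 1 with evicted sessions spilled to disk; bigger machines would just raise the capacity.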
-
It would be amazing if the llama.cpp server had some features to make it suitable for more than a single user in a test environment, e.g.:
As an aside, it's difficult to confirm, but it seems like the n_keep option, when set to 0, still keeps tokens from the previous prompt. Also, the help text lists --keep as a command-line switch, but it's not recognized when used (I wonder if that's part of the issue). Thanks for reading!