Serve multiple models with [server] #906
Comments
Might also help to serve another model specifically for embeddings, rather than relying on whatever my latest 7B obsession does.
I would like this too!
Thank you!! And wow, #736 is a huge idea.
I've been working on a system that could potentially solve some of the limitations in the current server implementation. Specifically, I'm developing a solution that allows for:

- rapid switching between different models
- concurrent use of multiple models

This approach could significantly enhance performance and flexibility, especially for applications requiring rapid switching between different models or concurrent use of multiple models. I'd be happy to share more details or contribute to implementing this feature if there's interest. The system uses ZeroMQ for efficient inter-process communication and asyncio for non-blocking operations, allowing for high concurrency and scalability. Let me know if you'd like to discuss this further or if you have any questions about the proposed implementation.
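Since the comment above describes the design only in prose, here is a minimal sketch of what a ZeroMQ + asyncio broker along those lines might look like. This is not the commenter's actual code: the endpoints, model aliases, and JSON message format are all invented for illustration, and it assumes pyzmq is installed with one already-running worker process per model.

```python
# Minimal multi-model broker sketch (assumes pyzmq; endpoints and
# message format are illustrative, not part of any existing API).
import asyncio
import json

import zmq
import zmq.asyncio

# Model alias -> IPC endpoint of a dedicated, already-running worker.
WORKERS = {
    "code-model": "ipc:///tmp/worker-code",
    "bakllava": "ipc:///tmp/worker-bakllava",
}

async def forward(ctx: zmq.asyncio.Context, payload: dict) -> dict:
    """Send one request to the worker that owns the requested model."""
    sock = ctx.socket(zmq.REQ)
    sock.connect(WORKERS[payload["model"]])
    try:
        await sock.send_json(payload)
        return await sock.recv_json()
    finally:
        sock.close()

async def handle(frontend, ctx, identity, payload):
    reply = await forward(ctx, payload)
    await frontend.send_multipart([identity, b"", json.dumps(reply).encode()])

async def main() -> None:
    ctx = zmq.asyncio.Context()
    # ROUTER front end: the HTTP layer would send JSON requests here.
    frontend = ctx.socket(zmq.ROUTER)
    frontend.bind("ipc:///tmp/broker")
    while True:
        # A REQ client's message arrives as [identity, empty delimiter, body].
        identity, _, raw = await frontend.recv_multipart()
        # Dispatch each request as its own task so a slow model does
        # not block requests destined for a different worker.
        asyncio.create_task(handle(frontend, ctx, identity, json.loads(raw)))

if __name__ == "__main__":
    asyncio.run(main())
```

The per-request REQ socket keeps the sketch simple; a real implementation would presumably pool connections and stream tokens rather than exchanging single JSON blobs.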
lcp[server] has been excellent. And I can host two models by running a second instance.
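(For reference, the two-instance workaround looks roughly like this, using the llama-cpp-python server's `--model` and `--port` flags; the paths and ports below are placeholders, and a multimodal model like bakllava needs extra flags not shown here.)

```bash
# One server process per model, on different ports (paths/ports are examples).
python3 -m llama_cpp.server --model ./models/code-model.gguf --port 8001
python3 -m llama_cpp.server --model ./models/bakllava.gguf --port 8002
```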
I'd like to be able to serve multiple models with a single instance of the OpenAI-compatible server and switch between them based on an alias-able `model` in the query payload. My use case is to serve a code model and bakllava at the same time.

I am going to see about attempting to PR an Nginx configuration example that would reverse proxy to two instances based on the `model` in the POST body, but first-order support would be great. If no one picks this up, I might attempt a PR of first-order support early '24.
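In the meantime, here is a rough, untested sketch of what such a body-routing Nginx config could look like, using the njs module to inspect the JSON body. The ports, location names, and matching rule are all placeholders:

```nginx
# nginx.conf -- untested sketch; requires the njs module.
# Two llama-cpp-python instances are assumed on ports 8001/8002.
load_module modules/ngx_http_js_module.so;  # module path varies by distro

http {
    js_import router from /etc/nginx/router.js;

    server {
        listen 8000;

        location /v1/ {
            js_content router.route;  # pick an upstream from the POST body
        }

        location @code     { proxy_pass http://127.0.0.1:8001; }
        location @bakllava { proxy_pass http://127.0.0.1:8002; }
    }
}
```

with the handler reading the `model` field out of the request body:

```javascript
// /etc/nginx/router.js -- routes on the "model" field of the JSON body.
function route(r) {
    let model = "";
    try {
        model = JSON.parse(r.requestText || "{}").model || "";
    } catch (e) {
        // non-JSON bodies fall through to the default upstream
    }
    r.internalRedirect(model.includes("bakllava") ? "@bakllava" : "@code");
}
export default { route };
```

As far as I know, `internalRedirect` preserves the original request body, so the chosen named location can proxy it on unchanged; large bodies may need `client_body_buffer_size` tuning since `r.requestText` is only available for bodies that weren't spilled to a temp file.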