vLLM backend for Cortex #1890
A "Python engine" actually has a larger scope than the current llama.cpp engine:
Therefore, having a Python engine means we have to solve the 2 following problems
1. Dependency management: uv is the new cool kid in town. Not only is it reportedly more robust than pip+venv, it is fast and provides many convenience methods, with two interesting "modes" of use (a sketch of driving uv from a host process follows this list).
2. How to expose the Python engine.
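
Since uv is a CLI tool rather than an importable library, dependency management would mean shelling out to it. A minimal sketch of what that could look like, assuming `uv` is on PATH (the paths and function name below are hypothetical; the same commands could be invoked from C++ via the platform's process APIs):

```python
import subprocess
from pathlib import Path

# Hypothetical per-engine layout; Cortex would choose its own paths.
VENV_DIR = Path("engines/python/vllm/.venv")

def ensure_engine_env(packages: list[str]) -> None:
    """Create an isolated env for one engine and install its deps via uv.

    uv does not expose a supported Python API, so we shell out to the
    CLI. Assumes the `uv` binary is on PATH (use `Scripts` instead of
    `bin` on Windows).
    """
    if not VENV_DIR.exists():
        subprocess.run(["uv", "venv", str(VENV_DIR)], check=True)
    subprocess.run(
        ["uv", "pip", "install",
         "--python", str(VENV_DIR / "bin" / "python"),
         *packages],
        check=True,
    )

ensure_engine_env(["vllm"])
```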
For 2a, I foresee a potential problem: how will we configure/propagate the function signature / API contract from Python to C++?

Case study: the current Python engine.
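
As a hypothetical illustration of the 2a concern (the method names below are invented, not the current engine's actual interface), the contract in question is roughly of this shape, and the C++ host must either know these signatures ahead of time or discover them at runtime:

```python
from abc import ABC, abstractmethod
from typing import Any, Iterator

class PythonEngine(ABC):
    """Hypothetical engine contract, for illustration only."""

    @abstractmethod
    def load_model(self, model_path: str, **options: Any) -> None: ...

    @abstractmethod
    def unload_model(self) -> None: ...

    @abstractmethod
    def infer(self, prompt: str, **params: Any) -> Iterator[str]:
        """Stream generated text chunks back to the caller."""
```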
Extension point: using uv, apart from serving a model, we can also use it to run Python CLI programs, e.g. https://github.com/janhq/robobench
Managing dependencies and running scripts with uv is a great idea. We'll need to figure out whether this can happen from within C++ (they explicitly don't expose classes in Python, so uv can't be imported as a normal library, but we could double-check this). For the "How to expose the Python engine" question, we could create a tight integration between the two languages by exposing Python objects and tools as C++ ones, and vice versa, using packages like pybind11 and scikit-build-core. This could potentially address the 2a concern about how to propagate function signatures. As @gau-nernst mentioned, running Python processes as separate HTTP servers and having the C++ side forward requests to them is another option. I have some thoughts on a potential action plan to test this.
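
A rough sketch of the separate-HTTP-server approach, assuming the per-engine venv from the uv sketch above (the model id is a placeholder; newer vLLM releases also expose this entrypoint as the `vllm serve` command):

```python
import subprocess
import time
import urllib.request

# Python interpreter from the per-engine venv (hypothetical path).
PYTHON = "engines/python/vllm/.venv/bin/python"

# Launch vLLM's OpenAI-compatible server as a separate process.
server = subprocess.Popen([
    PYTHON, "-m", "vllm.entrypoints.openai.api_server",
    "--model", "Qwen/Qwen2.5-0.5B-Instruct",  # placeholder model id
    "--port", "8000",
])

# Poll the health endpoint until the server is ready; after that the
# C++ side can simply forward HTTP requests to it.
for _ in range(120):
    try:
        urllib.request.urlopen("http://127.0.0.1:8000/health", timeout=1)
        break
    except OSError:
        time.sleep(1)
```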
Note: I will update this as I give it more thought.
Is it possible to have sandboxed environments? Marimo seems to have done it with https://github.com/marimo-team/marimo/releases/tag/0.8.4
It's just having a separate virtual env to avoid package conflicts (the current Python PR is already doing this). It's not a sandbox in the security sense (the Python script shouldn't be allowed to access host files, the internet, etc.).
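
For reference, uv can already provide this kind of environment isolation on demand: running a script with `uv run` builds a throwaway environment from inline script metadata (PEP 723), which appears to be the mechanism Marimo's sandbox mode builds on. A minimal sketch:

```python
# /// script
# dependencies = ["requests"]
# ///
# Saved as check.py and launched with `uv run check.py`, this makes uv
# build a throwaway environment containing `requests` without touching
# the host's site-packages: isolation against dependency conflicts,
# though still not a security sandbox (the script can reach the
# filesystem and network as usual).
import requests

print(requests.get("https://pypi.org/simple/", timeout=5).status_code)
```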
As discussed previously, to limit the scope of this feature, we won't expose the Python engine directly, but only specific applications. For the current milestone, we aim to add vLLM as an alternative backend for Cortex.
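
Since vLLM exposes an OpenAI-compatible API, the surface Cortex has to proxy is small. For example, once the server process is up (see the spawn sketch above), a chat completion request looks like this (the model id is a placeholder):

```python
import json
import urllib.request

req = urllib.request.Request(
    "http://127.0.0.1:8000/v1/chat/completions",
    data=json.dumps({
        "model": "Qwen/Qwen2.5-0.5B-Instruct",
        "messages": [{"role": "user", "content": "Hello!"}],
    }).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    body = json.load(resp)
print(body["choices"][0]["message"]["content"])
```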
At the moment, the process by which a user can build a custom Python engine to deploy a model via Cortex is not straightforward in code or clear in the docs. The plan is to drive this through the model's model.yml config.

Goals
Tasks
Obstacle
Out of scope
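
To make the model.yml-driven plan above concrete, here is a hypothetical sketch of how a vLLM-backed model could be declared and dispatched. The schema below is invented for illustration; defining the real model.yml fields is exactly what this issue is meant to settle.

```python
import yaml  # pip install pyyaml

# Invented model.yml shape for a vLLM-backed model, illustration only.
EXAMPLE = """\
model: Qwen/Qwen2.5-0.5B-Instruct
engine: vllm
port: 8000
extra_args: ["--max-model-len", "8192"]
"""

config = yaml.safe_load(EXAMPLE)
if config["engine"] == "vllm":
    # Hand off to the uv/spawn logic sketched in the comments above.
    print("launch vLLM:", config["model"], config.get("extra_args", []))
```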