[Feature] Add OCI model loader for loading models from OCI registries #26160
base: main
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a limited subset of checks runs automatically, and you can ask your reviewers to trigger select CI tests on top of that. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. If you have any questions, please reach out to us on Slack at https://slack.vllm.ai. 🚀
Force-pushed from 0a9d58d to e5b2bb5.
Code Review
This pull request introduces a new OciModelLoader to enable loading models from OCI registries, which is a valuable feature for model distribution. The implementation correctly sets up the new loader and integrates it with the existing infrastructure. However, the core networking logic in OciModelLoader is critically flawed as it contains hardcoded values specific to Docker Hub for both authentication and URL construction. This prevents it from working with other OCI registries like GHCR, which is a key part of the feature's goal. Additionally, there is significant code duplication in the download methods, which impacts maintainability. My review focuses on these critical and high-severity issues that must be addressed for the feature to be robust and compliant with the OCI specification.
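For context on making auth registry-agnostic: OCI registries advertise their token endpoint in the WWW-Authenticate challenge returned by an unauthenticated GET /v2/ request, so nothing needs to be hardcoded for Docker Hub. A minimal sketch of that flow with plain requests (the helper name and error handling are illustrative, not this PR's code):

```python
from typing import Optional

import requests


def fetch_registry_token(registry: str, repository: str) -> Optional[str]:
    """Obtain an anonymous pull token using the challenge advertised by the
    registry, instead of hardcoding Docker Hub's auth endpoint. Works for
    any registry implementing the Docker/OCI token auth scheme."""
    resp = requests.get(f"https://{registry}/v2/", timeout=30)
    if resp.status_code != 401:
        return None  # registry does not require token auth

    # Challenge looks like: Bearer realm="https://...",service="...",scope="..."
    challenge = resp.headers.get("WWW-Authenticate", "")
    params = {}
    for part in challenge.removeprefix("Bearer ").split(","):
        key, _, value = part.strip().partition("=")
        params[key] = value.strip('"')

    token_resp = requests.get(
        params["realm"],
        params={
            "service": params.get("service", ""),
            "scope": f"repository:{repository}:pull",
        },
        timeout=30,
    )
    token_resp.raise_for_status()
    return token_resp.json()["token"]
```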
youkaichao left a comment:
cc @22quinn: can the model loader plugin support this? I think we can maybe start with a plugin first.
```python
def _pull_oci_manifest(
    self, model_ref: str, cache_dir: str
) -> tuple[dict, list[dict], Optional[dict]]:
```
Hmmm, it seems that most of the functions in the OCI loader implement registry interaction with requests in a fairly ad-hoc way. Do we have packages that provide APIs for convenient registry interaction?
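For reference, a "by hand" manifest fetch against the OCI distribution spec's GET /v2/&lt;name&gt;/manifests/&lt;reference&gt; endpoint looks roughly like the sketch below; it reuses the token helper sketched earlier, and none of it is the PR's actual implementation:

```python
from typing import Optional

import requests

# Media types accepted when resolving a manifest (OCI image/index plus the
# Docker v2 schema for older registries).
_MANIFEST_ACCEPT = ", ".join([
    "application/vnd.oci.image.manifest.v1+json",
    "application/vnd.oci.image.index.v1+json",
    "application/vnd.docker.distribution.manifest.v2+json",
])


def fetch_manifest(
    registry: str, repository: str, reference: str, token: Optional[str] = None
) -> dict:
    """Resolve a manifest for `reference` (a tag or a sha256 digest)."""
    headers = {"Accept": _MANIFEST_ACCEPT}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    resp = requests.get(
        f"https://{registry}/v2/{repository}/manifests/{reference}",
        headers=headers,
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()
```

The Accept header matters here: without listing these media types, some registries fall back to older manifest schemas or reject the request.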
@youkaichao @Isotr0py @22quinn thank you very much for your feedback!
Documentation preview: https://vllm--26160.org.readthedocs.build/en/26160/
If I understand correctly, you would also have to install the plugin separately.

Are there ways we can implement this directly in vLLM? I mean, should we lock users in to only using Hugging Face to pull models by default? OCI registries are much more flexible: the infrastructure already exists all over the world, and you aren't locked in to any specific provider. OCI registries are more community-friendly.

@mudler WDYT?
It would really be great to have another way of specifying models. Hugging Face is a de-facto standard, but it really locks the whole community in to a specific vendor. OCI images have the benefit that you can just host your own registry and everything works. Of course, model authors will have their own preferences, but it's nice to give the community options for self-hosting and redistributing models. Even better if we come up with a standard: everyone benefits if all projects implement the same directives. Maybe @ericcurtin this is good food for thought: it might be better to have a dedicated organization or project on GitHub that offers the same way to access OCI models from various languages (via libraries/SDKs). That would help lower the entry barrier for individual projects to implement this and reuse as much code as possible.
I agree we should not lock in to Hugging Face; we need to support OCI-based images or artifacts.
From a user perspective it'd be great to have this available without having to install an additional plugin.
I think the blocker in this PR is the complexity of the OCI repo interactions implemented with pure requests.
I think the lowest common denominator for all is https://github.com/google/go-containerregistry. @rhatdan please correct me if I'm wrong. But IMO it's key that we build this Go support directly into vLLM, so we can use private/public OCI registries or Hugging Face.
I think we should include and shell out to this; it's available for all Linux distros and macOS, on all the CPU arches, and it gets the job done: oras
oras doesn't use go-containerregistry so it's a thumbs down from me |
This is not an important consideration. Installing another Python package is trivial, and it's a pattern we already use for other alternative model loaders (e.g. Run:ai Model Streamer and CoreWeave Tensorizer).
@DarkLight1337 @Isotr0py @youkaichao @22quinn @mudler @rhatdan @ieaves We have to be careful about HuggingFace engineers rejecting this functionality; there is a clear conflict of interest here. I think many people would happily use existing OCI infrastructure for transporting models.
First, I don't appreciate being singled out in this way. I am a vLLM maintainer first, and I do what is best for vLLM. Second, I'm not against adding OCI support to vLLM.
Applies to any HuggingFace employee rejecting this feature FWIW |
Nobody is rejecting this feature.
```python
    return metadata


def is_oci_model_with_tag(model: str) -> bool:
```
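The function body isn't captured in this excerpt; a plausible check for the [registry/]repository[:tag|@digest] syntax described in the PR description could look like the following sketch (the regex is illustrative and deliberately loose, not the PR's actual logic):

```python
import re

# Loose pattern for [registry/]repository[:tag|@digest]; deliberately much
# simpler than the full OCI reference grammar.
_OCI_REF_RE = re.compile(
    r"^(?:[a-z0-9.-]+(?::\d+)?/)?"                  # optional registry[:port]/
    r"[a-z0-9._/-]+"                                # repository path
    r"(?::[A-Za-z0-9._-]+|@sha256:[0-9a-f]{64})$"   # :tag or @sha256:digest
)


def is_oci_model_with_tag(model: str) -> bool:
    """Return True if `model` looks like an OCI reference with a tag or digest."""
    return _OCI_REF_RE.match(model) is not None
```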
huggingface safetensors are the most common format for model checkpoints right now. using oci to store model checkpoints is new, but we definitely want to collaborate to make it more popular! we use docker a lot, vLLM ❤️ docker!
as the oci format is new, I would expect this part of the code to change quite a lot in the near future, and I think it's not appropriate to put all the code directly inside vLLM.
how about this:
- the oci load format should be implemented as a model loader plugin, which users need to explicitly install.
- from the vLLM side, we can give explicit error messages to guide users to install the appropriate plugins to run oci model checkpoints.
concrete items:
when we get vllm serve username/model:tag, we try to see whether the model is in oci format. if it is, but the required packages are not installed, we throw an error message telling users to install the plugin package.
@ilopezluna @ericcurtin thoughts?
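A rough sketch of that flow, assuming a detection helper like the is_oci_model_with_tag sketch above and a placeholder plugin package name (neither the package name nor the message text comes from vLLM):

```python
import importlib.util

# is_oci_model_with_tag: see the sketch above; "vllm_oci_loader" is a
# placeholder package name, not a real distribution.


def ensure_oci_loader_available(model: str) -> None:
    """Fail early with actionable guidance when `model` is an OCI reference
    but the OCI loader plugin is not installed."""
    if not is_oci_model_with_tag(model):
        return  # not an OCI-style reference; nothing to do
    if importlib.util.find_spec("vllm_oci_loader") is None:
        raise ValueError(
            f"'{model}' looks like an OCI registry reference, but no OCI "
            "model loader plugin is installed. Install the plugin "
            "(for example: pip install vllm-oci-loader) and retry."
        )
```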
Thanks for your suggestion, @youkaichao.
Requiring users to explicitly install it basically makes adoption much harder. I understand that the reason for this extra step is that you think adding the pull logic isn’t vLLM’s responsibility — which is fair — but to me, it doesn’t quite justify the additional friction.
I’d prefer an approach where we include the necessary code to handle the pulling (which is quite limited), so users can automatically benefit from OCI registries.
Based on usage, we can later improve it, remove it, or integrate it into the plugin system. But introducing this friction from the very beginning isn’t the best way to roll out a new mechanism for distributing models, in my opinion.
I think the friction of adding another package to a requirements.txt is being overstated here. This is what we do for other model loaders such as CoreWeave's Tensorizer and Run:ai's model streamer with no issues.
Having said that, it could eventually become an external plugin/package that is a core requirement (i.e. it is a separate package but it is always installed with vLLM).
As you say, including the pull logic in vLLM is not vLLM's responsibility. For example, there is no pull logic for Hugging Face Hub in vLLM; we use the huggingface-hub client library. Ideally, we would also use an external client library to access OCI registries.
Purpose
Add OCI (Open Container Initiative) Registry support to vLLM, enabling models to be loaded directly from container registries like Docker Hub, GitHub Container Registry (ghcr.io), and other OCI-compliant registries.
This contribution is from the Docker Model Runner team. We have developed support for packaging models in Safetensors format as OCI artifacts, enabling efficient distribution and deployment of large language models through existing container registry infrastructure.
Key features:
- Model references in the form [registry/]repository[:tag|@digest]
- Safetensors model weights packaged as an OCI layer (application/vnd.docker.ai.safetensors)
- A vLLM configuration layer (application/vnd.docker.ai.vllm.config.tar)
- Pull logic implemented with the requests library
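To make the media types above concrete, a manifest for such an artifact might look roughly like the following, written as a Python dict; the manifest/config entries, digests, and sizes are placeholders rather than anything taken from this PR, and only the two layer media types come from the feature list above:

```python
# Illustrative only: not an actual manifest produced by Docker Model Runner.
example_manifest = {
    "schemaVersion": 2,
    "mediaType": "application/vnd.oci.image.manifest.v1+json",
    "config": {
        "mediaType": "application/vnd.oci.empty.v1+json",  # placeholder
        "digest": "sha256:<config-digest>",
        "size": 2,
    },
    "layers": [
        {
            "mediaType": "application/vnd.docker.ai.safetensors",
            "digest": "sha256:<weights-digest>",
            "size": 123456789,
        },
        {
            "mediaType": "application/vnd.docker.ai.vllm.config.tar",
            "digest": "sha256:<config-tar-digest>",
            "size": 4096,
        },
    ],
}
```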
Test Plan

Unit Tests
Run the test suite for OCI loader:
Tests cover:
Manual Testing
Test with a real OCI model (example):
E2E
Start vLLM server with OCI model:
Test inference via OpenAI-compatible API:
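The exact command used isn't preserved here; a representative request consistent with the response below (the model name is taken from that response, while the server address and prompt are assumptions) would be:

```python
import requests

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",  # default vLLM server address assumed
    json={
        "model": "aistaging/smollm2-vllm",
        "messages": [{"role": "user", "content": "Hello!"}],
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])
```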
Response:
{ "id": "chatcmpl-7b11ff33d05d474fa1a654651bccacd0", "object": "chat.completion", "created": 1759481936, "model": "aistaging/smollm2-vllm", "choices": [{ "index": 0, "message": { "role": "assistant", "content": "Hello! What brings you to our chat today?", "refusal": null }, "finish_reason": "stop" }], "usage": { "prompt_tokens": 32, "total_tokens": 43, "completion_tokens": 11 } }Test Result
Unit tests: All tests pass ✅
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.