Conversation

@ilopezluna ilopezluna commented Oct 3, 2025

Purpose

Add OCI (Open Container Initiative) Registry support to vLLM, enabling models to be loaded directly from container registries like Docker Hub, GitHub Container Registry (ghcr.io), and other OCI-compliant registries.

This contribution is from the Docker Model Runner team. We have developed support for packaging models in Safetensors format as OCI artifacts, enabling efficient distribution and deployment of large language models through existing container registry infrastructure.

Key features:

  • Load models directly from OCI registries using the standard OCI reference format ([registry/]repository[:tag|@digest]); a reference-normalization sketch follows this list
  • Support for models packaged with Safetensors layers (application/vnd.docker.ai.safetensors)
  • Support for configuration files packaged as tar layers (application/vnd.docker.ai.vllm.config.tar)
  • Automatic layer caching to avoid redundant downloads
  • Seamless integration with existing vLLM model loading infrastructure
  • No additional dependencies required beyond the standard requests library
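
For illustration, a minimal sketch of how reference normalization and cache-key derivation might work; the helper names and defaults below are hypothetical, not the actual code in this PR:

import hashlib

# Hedged sketch only: the real loader's helpers may differ.
DEFAULT_REGISTRY = "registry-1.docker.io"  # assumption: Docker Hub as the default registry
DEFAULT_TAG = "latest"

def normalize_oci_reference(ref: str) -> tuple[str, str, str]:
    """Split [registry/]repository[:tag|@digest] into (registry, repository, version)."""
    if "@" in ref:
        name, version = ref.split("@", 1)
    elif ":" in ref.rsplit("/", 1)[-1]:
        name, version = ref.rsplit(":", 1)
    else:
        name, version = ref, DEFAULT_TAG
    parts = name.split("/")
    if len(parts) > 1 and ("." in parts[0] or ":" in parts[0] or parts[0] == "localhost"):
        registry, repository = parts[0], "/".join(parts[1:])
    else:
        registry = DEFAULT_REGISTRY
        # Single names fall back to Docker Hub's library/ namespace.
        repository = name if len(parts) > 1 else f"library/{name}"
    return registry, repository, version

def cache_dir_for(ref: str) -> str:
    """Derive a stable cache directory name from the normalized reference."""
    registry, repository, version = normalize_oci_reference(ref)
    return hashlib.sha256(f"{registry}/{repository}@{version}".encode()).hexdigest()[:16]

print(normalize_oci_reference("aistaging/smollm2-vllm"))
# ('registry-1.docker.io', 'aistaging/smollm2-vllm', 'latest')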

Test Plan

Unit Tests

Run the test suite for OCI loader:

pytest tests/model_executor/test_oci_loader.py -v

Tests cover:

  • OCI reference normalization (with/without registry, single name, etc.)
  • Cache directory generation
  • Media type constant validation

Manual Testing

Test with a real OCI model (example):

from vllm import LLM, SamplingParams

# Load model from Docker Hub
llm = LLM(
    model="aistaging/smollm2-vllm",
    load_format="oci"
)

# Generate text
prompts = ["Hello, my name is"]
sampling_params = SamplingParams(temperature=0.8, max_tokens=50)
outputs = llm.generate(prompts, sampling_params)
print(outputs[0].outputs[0].text)

E2E

Start vLLM server with OCI model:

vllm serve aistaging/smollm2-vllm --load-format oci --max-num-batched-tokens 8192

Test inference via OpenAI-compatible API:

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "aistaging/smollm2-vllm",
    "messages": [
      {"role": "user", "content": "hello!"}
    ],
    "max_tokens": 100
  }'

Response:

{
  "id": "chatcmpl-7b11ff33d05d474fa1a654651bccacd0",
  "object": "chat.completion",
  "created": 1759481936,
  "model": "aistaging/smollm2-vllm",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "Hello! What brings you to our chat today?",
      "refusal": null
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 32,
    "total_tokens": 43,
    "completion_tokens": 11
  }
}

Test Result

Unit tests: All tests pass ✅

tests/model_executor/test_oci_loader.py::TestOciModelLoader::test_normalize_oci_reference_with_full_reference PASSED
tests/model_executor/test_oci_loader.py::TestOciModelLoader::test_normalize_oci_reference_without_registry PASSED
tests/model_executor/test_oci_loader.py::TestOciModelLoader::test_normalize_oci_reference_single_name PASSED
tests/model_executor/test_oci_loader.py::TestOciModelLoader::test_get_cache_dir PASSED
tests/model_executor/test_oci_loader.py::TestOciModelLoader::test_media_type_constants PASSED

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@github-actions

github-actions bot commented Oct 3, 2025

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run fastcheck CI, which runs a small and essential subset of CI tests to quickly catch errors.

You can ask your reviewers to trigger select CI tests on top of fastcheck CI.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

🚀

@mergify mergify bot added the documentation (Improvements or additions to documentation) label Oct 3, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a new OciModelLoader to enable loading models from OCI registries, which is a valuable feature for model distribution. The implementation correctly sets up the new loader and integrates it with the existing infrastructure. However, the core networking logic in OciModelLoader is critically flawed as it contains hardcoded values specific to Docker Hub for both authentication and URL construction. This prevents it from working with other OCI registries like GHCR, which is a key part of the feature's goal. Additionally, there is significant code duplication in the download methods, which impacts maintainability. My review focuses on these critical and high-severity issues that must be addressed for the feature to be robust and compliant with the OCI specification.
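
To illustrate the registry-agnostic alternative, here is a minimal sketch that follows the WWW-Authenticate challenge per the OCI distribution spec instead of hardcoding Docker Hub's token endpoint; the helper name and requests usage are illustrative, not the PR's code:

import requests
from typing import Optional

def get_registry_token(registry: str, repository: str) -> Optional[str]:
    """Illustrative token flow: probe /v2/ and follow the auth challenge
    (works for Docker Hub, ghcr.io, and other compliant registries)."""
    resp = requests.get(f"https://{registry}/v2/")
    if resp.status_code != 401:
        return None  # registry does not require a token
    challenge = resp.headers.get("WWW-Authenticate", "")
    params = {}
    for part in challenge.removeprefix("Bearer ").split(","):
        if "=" in part:
            key, value = part.split("=", 1)
            params[key.strip()] = value.strip().strip('"')
    token_resp = requests.get(
        params["realm"],
        params={"service": params.get("service", ""), "scope": f"repository:{repository}:pull"},
    )
    token_resp.raise_for_status()
    return token_resp.json()["token"]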

Member

@youkaichao youkaichao left a comment


cc @22quinn can the model loader plugin support this? I think we could maybe start with a plugin first.

@ilopezluna ilopezluna requested a review from Isotr0py October 6, 2025 15:22
Comment on lines +221 to +223
def _pull_oci_manifest(
    self, model_ref: str, cache_dir: str
) -> tuple[dict, list[dict], Optional[dict]]:
Member

Hmmm, it seems that most of the functions in the OCI loader implement the registry interaction from scratch with requests, in a quite hand-rolled way. Do we have packages that provide APIs for convenient registry interaction?

@22quinn
Collaborator

22quinn commented Oct 8, 2025

cc @22quinn can the model loader plugin support this? I think we could maybe start with a plugin first.

Yes, this looks very doable with an out-of-tree plugin: #21067

@ilopezluna
Author

@youkaichao @Isotr0py @22quinn thank you very much for your feedback!
If we implement this as an out-of-tree plugin, would OCI pulling just work when someone installs vLLM with pip install vllm?
The goal is for pulling models from OCI registries to be as simple as pulling from Hugging Face: install, specify the repo, and it just works.

@mergify

mergify bot commented Oct 8, 2025

Documentation preview: https://vllm--26160.org.readthedocs.build/en/26160/

@hmellor
Member

hmellor commented Oct 8, 2025

If I understand correctly, you would also have to install the plugin separately

@ericcurtin
Contributor

ericcurtin commented Oct 8, 2025

If I understand correctly, you would also have to install the plugin separately

Are there ways we can implement this directly in vLLM? Should we lock users in to only using Hugging Face to pull models by default? OCI registries are much more flexible: the infrastructure already exists all over the world, and you aren't locked in to any specific provider. OCI registries are more community-friendly.

@ericcurtin
Contributor

@mudler WDYT?

@mudler

mudler commented Oct 8, 2025

It would really be great to have another way of specifying models. Hugging Face is a de facto standard, but it locks the whole community in to a specific vendor. OCI images have the benefit that you can just host your own registry and everything works. Of course, model authors will have their own preferences, but it's nice to give the community options for self-hosting models and redistribution.

Even better if we come up with a standard: everyone benefits if all projects implement the same conventions. Maybe @ericcurtin this is good food for thought: it might be better to have a dedicated organization or project on GitHub offering a common way to access OCI models from various languages (via libraries/SDKs). That would lower the entry barrier for individual projects to implement this and reuse as much code as possible.

@rhatdan

rhatdan commented Oct 14, 2025

I agree we should not lock in to Hugging Face; we need to support OCI-based images or artifacts.

@ieaves

ieaves commented Oct 15, 2025

From a user perspective it'd be great to have this available without having to install an additional plugin.

@Isotr0py
Member

Isotr0py commented Oct 15, 2025

I think the blocker in this PR is the complexity of the OCI repo interactions with pure requests. We can simplify this PR a lot by using oras's client and adding it as an optional requirement.
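
For example, something along these lines (a sketch assuming oras-py's documented OrasClient.pull API; the target reference and output directory are illustrative):

import oras.client

# Assumption: oras-py exposes OrasClient.pull(target=..., outdir=...) as in its docs.
client = oras.client.OrasClient()
files = client.pull(
    target="registry-1.docker.io/aistaging/smollm2-vllm:latest",
    outdir="/tmp/oci-model-cache",
)
print(files)  # local paths of the pulled artifact layers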

@ericcurtin
Contributor

ericcurtin commented Oct 15, 2025

I think the blocker in this PR is the complexity of the OCI repo interactions with pure requests. We can simplify this PR a lot by using oras's client and adding it as an optional requirement.

I think the lowest common denominator for all is:

https://github.com/google/go-containerregistry

@rhatdan please correct me if I'm wrong.

But I think it's key that we build this Go tooling directly into vLLM, so we can use private/public OCI registries or Hugging Face.

@ericcurtin
Contributor

I think we should include and shell out to this; it's available for all Linux distros, macOS, and all the CPU arches, and it gets the job done:

          curl -LO https://github.com/google/go-containerregistry/releases/latest/download/go-containerregistry_Linux_x86_64.tar.gz
          tar -xzvf go-containerregistry_Linux_x86_64.tar.gz crane
          sudo mv crane /usr/local/bin/
          crane version
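
As a rough sketch of what shelling out to crane could look like from Python, using the safetensors media type from this PR's description (the reference and cache path are illustrative, error handling omitted):

import json
import os
import subprocess

ref = "aistaging/smollm2-vllm:latest"  # illustrative reference
cache = "/tmp/oci-cache"               # illustrative cache location
os.makedirs(cache, exist_ok=True)

# Fetch the manifest and download safetensors layers by digest.
manifest = json.loads(
    subprocess.run(["crane", "manifest", ref],
                   capture_output=True, text=True, check=True).stdout
)
for layer in manifest.get("layers", []):
    if layer["mediaType"] == "application/vnd.docker.ai.safetensors":
        digest = layer["digest"]
        blob_ref = f"{ref.split(':')[0]}@{digest}"
        with open(os.path.join(cache, digest.replace(":", "_")), "wb") as f:
            subprocess.run(["crane", "blob", blob_ref], stdout=f, check=True)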

@ericcurtin
Contributor

I think the blocker in this PR is the complexity of the OCI repo interactions with pure requests. We can simplify this PR a lot by using oras's client and adding it as an optional requirement.

oras doesn't use go-containerregistry, so it's a thumbs-down from me.

@hmellor
Member

hmellor commented Oct 20, 2025

From a user perspective it'd be great to have this available without having to install an additional plugin.

This is not an important consideration. Installing another Python package is trivial and a pattern we already use for other alternative model loaders (i.e. Run:ai Model Streamer & CoreWeave Tensorizer).

@ericcurtin
Contributor

ericcurtin commented Oct 20, 2025

From a user perspective it'd be great to have this available without having to install an additional plugin.

This is not an important consideration. Installing another Python package is trivial and a pattern we already use for other alternative model loaders (i.e. Run:ai Model Streamer & CoreWeave Tensorizer).

@DarkLight1337 @Isotr0py @youkaichao @22quinn @mudler @rhatdan @ieaves

We have to be careful about HuggingFace engineers rejecting this functionality; there is a clear conflict of interest here. I think many people would happily use existing OCI infrastructure for transporting models.

@hmellor
Member

hmellor commented Oct 20, 2025

First, I don't appreciate being singled out in this way. I am a vLLM maintainer first; I do what is best for vLLM.

Second, I'm not against adding OCI support to vLLM.

@ericcurtin
Contributor

First, I don't appreciate being singled out in this way. I am a vLLM maintainer first; I do what is best for vLLM.

Second, I'm not against adding OCI support to vLLM.

Applies to any HuggingFace employee rejecting this feature FWIW

@hmellor
Member

hmellor commented Oct 20, 2025

Nobody is rejecting this feature.

    return metadata


def is_oci_model_with_tag(model: str) -> bool:
Member

Hugging Face safetensors are the most common format for model checkpoints right now. Using OCI to store model checkpoints is new, but we definitely want to collaborate to make it more popular! We use Docker a lot; vLLM ❤️ Docker!

As the OCI format is new, I would expect this part of the code to change quite a lot in the near future, and I think it's not appropriate to put all the code directly inside vLLM.

How about this:

  1. The oci load format should be implemented as a model loader plugin; users need to explicitly install it.
  2. From the vLLM side, we can give explicit error messages guiding users to install the appropriate plugins to run OCI model checkpoints.

Concrete items:

When we get vllm serve username/model:tag, we try to detect whether the model is in OCI format. If it is, but the required packages are not installed, we throw an error message telling users to install the plugin package.
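
A hedged sketch of what that check could look like (the plugin package name and helper are placeholders, not an agreed API):

import importlib.util

OCI_PLUGIN_PACKAGE = "vllm-oci-loader"  # placeholder package name
OCI_PLUGIN_MODULE = "vllm_oci_loader"   # placeholder import name

def maybe_require_oci_plugin(model: str, load_format: str) -> None:
    """Fail early with an actionable message if the model looks like an OCI
    reference but the (hypothetical) loader plugin is not installed."""
    looks_like_oci = load_format == "oci" or "@sha256:" in model
    if looks_like_oci and importlib.util.find_spec(OCI_PLUGIN_MODULE) is None:
        raise RuntimeError(
            f"Model '{model}' appears to be an OCI artifact, but the OCI model loader "
            f"plugin is not installed. Install it with: pip install {OCI_PLUGIN_PACKAGE}"
        )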

@ilopezluna @ericcurtin thoughts?

Author

Thanks for your suggestion, @youkaichao.

Requiring users to explicitly install it basically makes adoption much harder. I understand that the reason for this extra step is that you think adding the pull logic isn’t vLLM’s responsibility — which is fair — but to me, it doesn’t quite justify the additional friction.

I’d prefer an approach where we include the necessary code to handle the pulling (which is quite limited), so users can automatically benefit from OCI registries.

Based on usage, we can later improve it, remove it, or integrate it into the plugin system. But introducing this friction from the very beginning isn’t the best way to roll out a new mechanism for distributing models, in my opinion.

Member

I think the friction of adding another package to a requirements.txt is being overstated here. This is what we do for other model loaders such as CoreWeave's Tensorizer and Run:ai's model streamer with no issues.

Having said that, it could eventually become an external plugin/package that is a core requirement (i.e. it is a separate package but it is always installed with vLLM).

As you say, the pull logic is not vLLM's responsibility. For example, there is no pull logic for Hugging Face Hub in vLLM; we use the huggingface-hub client library. Ideally, we would also use an external client library to access OCI registries.
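
For comparison, the Hugging Face path today is a single client-library call, and the OCI path would ideally look similar (the repo id below is illustrative):

from huggingface_hub import snapshot_download

# vLLM delegates Hub downloads to the client library rather than re-implementing them;
# an OCI client library would ideally give the same one-call experience.
local_dir = snapshot_download(repo_id="org/model")  # illustrative repo id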
