[Feature] Add OCI model loader for loading models from OCI registries #26160
base: main
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a limited subset of checks runs automatically, and you can ask your reviewers to trigger select CI tests on top of that. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. If you have any questions, please reach out to us on Slack at https://slack.vllm.ai. 🚀
Force-pushed from 0a9d58d to e5b2bb5.
Code Review
This pull request introduces a new OciModelLoader to enable loading models from OCI registries, which is a valuable feature for model distribution. The implementation correctly sets up the new loader and integrates it with the existing infrastructure. However, the core networking logic in OciModelLoader is critically flawed as it contains hardcoded values specific to Docker Hub for both authentication and URL construction. This prevents it from working with other OCI registries like GHCR, which is a key part of the feature's goal. Additionally, there is significant code duplication in the download methods, which impacts maintainability. My review focuses on these critical and high-severity issues that must be addressed for the feature to be robust and compliant with the OCI specification.
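For context on making auth registry-agnostic: OCI registries advertise their token endpoint in the WWW-Authenticate challenge returned by an unauthenticated GET /v2/ request, so nothing needs to be hardcoded for Docker Hub. A minimal sketch of that flow with plain requests (the helper name and error handling are illustrative, not this PR's code):

```python
from typing import Optional

import requests


def fetch_registry_token(registry: str, repository: str) -> Optional[str]:
    """Obtain an anonymous pull token using the challenge advertised by the
    registry, instead of hardcoding Docker Hub's auth endpoint. Works for
    any registry implementing the Docker/OCI token auth scheme."""
    resp = requests.get(f"https://{registry}/v2/", timeout=30)
    if resp.status_code != 401:
        return None  # registry does not require token auth

    # Challenge looks like: Bearer realm="https://...",service="...",scope="..."
    challenge = resp.headers.get("WWW-Authenticate", "")
    params = {}
    for part in challenge.removeprefix("Bearer ").split(","):
        key, _, value = part.strip().partition("=")
        params[key] = value.strip('"')

    token_resp = requests.get(
        params["realm"],
        params={
            "service": params.get("service", ""),
            "scope": f"repository:{repository}:pull",
        },
        timeout=30,
    )
    token_resp.raise_for_status()
    return token_resp.json()["token"]
```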
youkaichao left a comment:
cc @22quinn: can the model loader plugin support this? I think we can maybe start with a plugin first.
```python
def _pull_oci_manifest(
    self, model_ref: str, cache_dir: str
) -> tuple[dict, list[dict], Optional[dict]]:
```
Hmmm, it seems that most of the functions in the OCI loader implement registry interaction with requests in a fairly ad-hoc way. Do we have packages that provide APIs for convenient registry interaction?
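For reference, a "by hand" manifest fetch against the OCI distribution spec's GET /v2/&lt;name&gt;/manifests/&lt;reference&gt; endpoint looks roughly like the sketch below; it reuses the token helper sketched earlier, and none of it is the PR's actual implementation:

```python
from typing import Optional

import requests

# Media types accepted when resolving a manifest (OCI image/index plus the
# Docker v2 schema for older registries).
_MANIFEST_ACCEPT = ", ".join([
    "application/vnd.oci.image.manifest.v1+json",
    "application/vnd.oci.image.index.v1+json",
    "application/vnd.docker.distribution.manifest.v2+json",
])


def fetch_manifest(
    registry: str, repository: str, reference: str, token: Optional[str] = None
) -> dict:
    """Resolve a manifest for `reference` (a tag or a sha256 digest)."""
    headers = {"Accept": _MANIFEST_ACCEPT}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    resp = requests.get(
        f"https://{registry}/v2/{repository}/manifests/{reference}",
        headers=headers,
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()
```

The Accept header matters here: without listing these media types, some registries fall back to older manifest schemas or reject the request.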
@youkaichao @Isotr0py @22quinn thank you very much for your feedback!
Documentation preview: https://vllm--26160.org.readthedocs.build/en/26160/
If I understand correctly, you would also have to install the plugin separately.

Are there ways we can implement this directly in vLLM? I mean, should we lock users in to only using Hugging Face to pull models by default? OCI registries are much more flexible: the infrastructure already exists all over the world, and you aren't locked in to any specific provider. OCI registries are more community-friendly.

@mudler WDYT?
It would really be great to have another way of specifying models. Hugging Face is a de-facto standard, but it really locks the whole community in to a specific vendor. OCI images have the benefit that you can just host your own registry and everything works. Of course, model authors will have their own preferences, but it's nice to give the community options for self-hosting and redistributing models. Even better if we come up with a standard: everyone benefits if all projects implement the same directives. Maybe @ericcurtin this is good food for thought: it might be better to have a dedicated organization or project on GitHub that offers the same way to access OCI models from various languages (via libraries/SDKs). That would help lower the entry barrier for individual projects to implement this and reuse as much code as possible.
I agree we should not lock in to Hugging Face; we need to support OCI-based images or artifacts.
From a user perspective it'd be great to have this available without having to install an additional plugin.
I think the blocker in this PR is the complexity of the OCI repo interactions implemented with pure requests.
I think the lowest common denominator for all is https://github.com/google/go-containerregistry. @rhatdan please correct me if I'm wrong. But IMO it's key that we build this Go support directly into vLLM, so we can use private/public OCI registries or Hugging Face.
I think we should include and shell out to this; it's available for all Linux distros and macOS, on all the CPU arches, and it gets the job done: oras
oras doesn't use go-containerregistry so it's a thumbs down from me |
This is not an important consideration. Installing another Python package is trivial, and it's a pattern we already use for other alternative model loaders (e.g. Run:ai Model Streamer and CoreWeave Tensorizer).
@DarkLight1337 @Isotr0py @youkaichao @22quinn @mudler @rhatdan @ieaves We have to be careful about HuggingFace engineers rejecting this functionality; there is a clear conflict of interest here. I think many people would happily use existing OCI infrastructure for transporting models.
First, I don't appreciate being singled out in this way. I am a vLLM maintainer first, and I do what is best for vLLM. Second, I'm not against adding OCI support to vLLM.
Applies to any HuggingFace employee rejecting this feature FWIW |
Nobody is rejecting this feature.
```python
    return metadata


def is_oci_model_with_tag(model: str) -> bool:
```
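The function body isn't captured in this excerpt; a plausible check for the [registry/]repository[:tag|@digest] syntax described in the PR description could look like the following sketch (the regex is illustrative and deliberately loose, not the PR's actual logic):

```python
import re

# Loose pattern for [registry/]repository[:tag|@digest]; deliberately much
# simpler than the full OCI reference grammar.
_OCI_REF_RE = re.compile(
    r"^(?:[a-z0-9.-]+(?::\d+)?/)?"                  # optional registry[:port]/
    r"[a-z0-9._/-]+"                                # repository path
    r"(?::[A-Za-z0-9._-]+|@sha256:[0-9a-f]{64})$"   # :tag or @sha256:digest
)


def is_oci_model_with_tag(model: str) -> bool:
    """Return True if `model` looks like an OCI reference with a tag or digest."""
    return _OCI_REF_RE.match(model) is not None
```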
huggingface safetensors are the most common format for model checkpoints right now. using oci to store model checkpoints is new, but we definitely want to collaborate to make it more popular! we use docker a lot, vLLM ❤️ docker!
as the oci format is new, I would expect this part of the code to change quite a lot in the near future, and I think it's not appropriate to put all the code directly inside vLLM.
how about this:
- the oci load format should be implemented as a model loader plugin, which users need to explicitly install.
- from the vLLM side, we can give explicit error messages to guide users to install the appropriate plugins to run oci model checkpoints.
concrete items:
when we get vllm serve username/model:tag, we try to see whether the model is in oci format. if it is, but the required packages are not installed, we throw an error message telling users to install the plugin package.
@ilopezluna @ericcurtin thoughts?
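A rough sketch of that flow, assuming a detection helper like the is_oci_model_with_tag sketch above and a placeholder plugin package name (neither the package name nor the message text comes from vLLM):

```python
import importlib.util

# is_oci_model_with_tag: see the sketch above; "vllm_oci_loader" is a
# placeholder package name, not a real distribution.


def ensure_oci_loader_available(model: str) -> None:
    """Fail early with actionable guidance when `model` is an OCI reference
    but the OCI loader plugin is not installed."""
    if not is_oci_model_with_tag(model):
        return  # not an OCI-style reference; nothing to do
    if importlib.util.find_spec("vllm_oci_loader") is None:
        raise ValueError(
            f"'{model}' looks like an OCI registry reference, but no OCI "
            "model loader plugin is installed. Install the plugin "
            "(for example: pip install vllm-oci-loader) and retry."
        )
```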
Thanks for your suggestion, @youkaichao.
Requiring users to explicitly install it basically makes adoption much harder. I understand that the reason for this extra step is that you think adding the pull logic isn’t vLLM’s responsibility — which is fair — but to me, it doesn’t quite justify the additional friction.
I’d prefer an approach where we include the necessary code to handle the pulling (which is quite limited), so users can automatically benefit from OCI registries.
Based on usage, we can later improve it, remove it, or integrate it into the plugin system. But introducing this friction from the very beginning isn’t the best way to roll out a new mechanism for distributing models, in my opinion.
I think the friction of adding another package to a requirements.txt is being overstated here. This is what we do for other model loaders such as CoreWeave's Tensorizer and Run:ai's model streamer with no issues.
Having said that, it could eventually become an external plugin/package that is a core requirement (i.e. it is a separate package but it is always installed with vLLM).
As you say, including the pull logic in vLLM is not vLLM's responsibility. For example, there is no pull logic for Hugging Face Hub in vLLM; we use the huggingface-hub client library. Ideally, we would also use an external client library to access OCI registries.
Purpose
Add OCI (Open Container Initiative) Registry support to vLLM, enabling models to be loaded directly from container registries like Docker Hub, GitHub Container Registry (ghcr.io), and other OCI-compliant registries.
This contribution is from the Docker Model Runner team. We have developed support for packaging models in Safetensors format as OCI artifacts, enabling efficient distribution and deployment of large language models through existing container registry infrastructure.
Key features:
- Model references in the form [registry/]repository[:tag|@digest]
- Safetensors model weights packaged as an OCI layer (application/vnd.docker.ai.safetensors)
- A vLLM configuration layer (application/vnd.docker.ai.vllm.config.tar)
- Pull logic implemented with the requests library
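To make the media types above concrete, a manifest for such an artifact might look roughly like the following, written as a Python dict; the manifest/config entries, digests, and sizes are placeholders rather than anything taken from this PR, and only the two layer media types come from the feature list above:

```python
# Illustrative only: not an actual manifest produced by Docker Model Runner.
example_manifest = {
    "schemaVersion": 2,
    "mediaType": "application/vnd.oci.image.manifest.v1+json",
    "config": {
        "mediaType": "application/vnd.oci.empty.v1+json",  # placeholder
        "digest": "sha256:<config-digest>",
        "size": 2,
    },
    "layers": [
        {
            "mediaType": "application/vnd.docker.ai.safetensors",
            "digest": "sha256:<weights-digest>",
            "size": 123456789,
        },
        {
            "mediaType": "application/vnd.docker.ai.vllm.config.tar",
            "digest": "sha256:<config-tar-digest>",
            "size": 4096,
        },
    ],
}
```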
Test Plan

Unit Tests
Run the test suite for OCI loader:
Tests cover:
Manual Testing
Test with a real OCI model (example):
E2E
Start vLLM server with OCI model:
Test inference via OpenAI-compatible API:
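The exact command used isn't preserved here; a representative request consistent with the response below (the model name is taken from that response, while the server address and prompt are assumptions) would be:

```python
import requests

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",  # default vLLM server address assumed
    json={
        "model": "aistaging/smollm2-vllm",
        "messages": [{"role": "user", "content": "Hello!"}],
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])
```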
Response:
{ "id": "chatcmpl-7b11ff33d05d474fa1a654651bccacd0", "object": "chat.completion", "created": 1759481936, "model": "aistaging/smollm2-vllm", "choices": [{ "index": 0, "message": { "role": "assistant", "content": "Hello! What brings you to our chat today?", "refusal": null }, "finish_reason": "stop" }], "usage": { "prompt_tokens": 32, "total_tokens": 43, "completion_tokens": 11 } }Test Result
Unit tests: All tests pass ✅
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.