Skip to content

Conversation

@chaunceyjiang
Copy link
Collaborator

@chaunceyjiang chaunceyjiang commented Apr 3, 2025

Add smolvlm support

FIX #15541

@github-actions
Copy link

github-actions bot commented Apr 3, 2025

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblock them. If you do not have permission to unblock, ping simon-mo or khluu to add you in our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

@mergify mergify bot added documentation Improvements or additions to documentation frontend v1 labels Apr 3, 2025
@chaunceyjiang chaunceyjiang force-pushed the smolvlm branch 3 times, most recently from bc8df55 to 88f116b Compare April 3, 2025 15:38
@chaunceyjiang
Copy link
Collaborator Author

test

vllm serve HuggingFaceTB/SmolVLM2-2.2B-Instruct --limit-mm-per-prompt image=4

multi-image

# python examples/online_serving/openai_chat_completion_client_for_multimodal.py --chat-type multi-image 
INFO 04-03 13:37:43 [__init__.py:239] Automatically detected platform cuda.
Chat completion output:  In the center of this image, the majestic lion commands attention. Its fur, a rich, full-grown orange, is bathed in the warm glow of the sun, reflecting off the tall grass it stands in. The lion's black and brown mane, full and well-groomed, cascades over

text-only

#  python examples/online_serving/openai_chat_completion_client_for_multimodal.py --chat-type text-only
INFO 04-03 15:41:36 [__init__.py:239] Automatically detected platform cuda.
Chat completion output:  The capital of France is Paris. It is the country's largest city, cultural center, and a global hub for fashion, gastronomy, and art. Paris is located in northern France on the Seine River and is renowned globally for its beautiful architecture, historical landmarks, and iconic landmarks such as the Eiffel Tower, Notre

single-image

# python examples/online_serving/openai_chat_completion_client_for_multimodal.py   
INFO 04-03 13:30:35 [__init__.py:239] Automatically detected platform cuda.
Chat completion output from image url:  This image shows a vibrant scene of a wooden boardwalk path that extends through a lush, green grassy field. The boardwalk, made of wooden planks, is prominently displayed in the foreground, inviting viewers to imagine themselves walking along its length. The grassy field underfoot is a vibrant green, dotted with occasional wildflowers that add
Chat completion output from base64 encoded image:  This image captures the serene beauty of a wooden boardwalk that cuts through a verdant, sloping field adorned with vibrant green grass. The boardwalk, constructed from wooden planks, gently curves from the bottom left to the center of the image, inviting viewers to step into this pastoral scene. The grass, lush and ver

@chaunceyjiang chaunceyjiang marked this pull request as ready for review April 4, 2025 04:33
Signed-off-by: chaunceyjiang <[email protected]>
Signed-off-by: chaunceyjiang <[email protected]>
Signed-off-by: chaunceyjiang <[email protected]>
Signed-off-by: chaunceyjiang <[email protected]>
Signed-off-by: chaunceyjiang <[email protected]>
Signed-off-by: chaunceyjiang <[email protected]>
Signed-off-by: chaunceyjiang <[email protected]>
Signed-off-by: chaunceyjiang <[email protected]>
@mergify mergify bot added the multi-modality Related to multi-modality (#4194) label Apr 8, 2025
Signed-off-by: chaunceyjiang <[email protected]>
Signed-off-by: chaunceyjiang <[email protected]>
Copy link
Member

@DarkLight1337 DarkLight1337 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise looks good, assuming that you have run the example scripts already

@chaunceyjiang
Copy link
Collaborator Author

test
image
image

auto-merge was automatically disabled April 8, 2025 09:48

Head branch was pushed to by a user without write access

Signed-off-by: chaunceyjiang <[email protected]>
Signed-off-by: chaunceyjiang <[email protected]>
@chaunceyjiang
Copy link
Collaborator Author

test

python examples/offline_inference/vision_language.py --model-type smolvlm
...
INFO 04-08 10:08:51 [kv_cache_utils.py:577] GPU KV cache size: 353,952 tokens
INFO 04-08 10:08:51 [kv_cache_utils.py:580] Maximum concurrency for 8,192 tokens per request: 43.21x
DEBUG 04-08 10:08:52 [core_client.py:421] Waiting for 1 core engine proc(s) to start: {0}
INFO 04-08 10:08:55 [core.py:162] init engine (profile, create kv cache, warmup model) took 5.39 seconds
DEBUG 04-08 10:08:55 [core.py:410] EngineCore waiting for work.
INFO 04-08 10:08:55 [core_client.py:435] Core engine process 0 ready.
DEBUG 04-08 10:08:55 [decorators.py:109] Inferred dynamic dimensions for forward method of <class 'vllm.model_executor.models.llama.LlamaModel'>: ['input_ids', 'positions', 'intermediate_tensors', 'inputs_embeds']
DEBUG 04-08 10:08:58 [core.py:416] EngineCore loop active - local unfinished: True, finished: False.
Processed prompts:  75%|██████████████████████████████████████████              | 3/4 [00:01<00:00,  2.47it/s, est. speed input: 2553.39 toks/s, output: 148.69 toks/s]DEBUG 04-08 10:08:59 [core.py:410] EngineCore waiting for work.
Processed prompts: 100%|████████████████████████████████████████████████████████| 4/4 [00:01<00:00,  3.03it/s, est. speed input: 3327.80 toks/s, output: 193.79 toks/s]
 The image captures the iconic Tokyo Tower, a renowned landmark in Japan, standing tall against the backdrop of a clear blue sky. The tower, painted in a pristine white, is adorned with a lattice structure that adds an element of architectural interest. The perspective of the image is particularly striking, as it is taken from a low
 The image captures a breathtaking view of the Tokyo Skytree, the tallest structure in Japan, standing tall against the backdrop of a clear blue sky. The Skytree, a modern marvel of engineering, is adorned with a white lattice structure that adds a unique aesthetic appeal to its towering presence. The perspective of the image is particularly
 The image captures a breathtaking view of the Tokyo Tower, a renowned landmark in Japan. The tower, painted in a pristine white, stands tall against the backdrop of a clear blue sky. Its unique lattice structure adds an architectural marvel to the scene.

In the foreground, cherry blossoms are in full bloom, their delicate
 The image captures a breathtaking view of the Tokyo Skytree, the tallest structure in Japan, standing majestically against a backdrop of a clear blue sky. The Skytree, a modern marvel of architecture, is adorned with a white dome at its center, surrounded by a lattice of steel beams that add a touch of industrial
DEBUG 04-08 10:08:59 [core.py:382] EngineCore interrupted.

Signed-off-by: chaunceyjiang <[email protected]>
Signed-off-by: chaunceyjiang <[email protected]>
Signed-off-by: chaunceyjiang <[email protected]>
@chaunceyjiang
Copy link
Collaborator Author

test

python examples/offline_inference/vision_language.py --model-type smolvlm
...
DEBUG 04-08 11:18:46 [core.py:410] EngineCore waiting for work.
Processed prompts: 100%|████████████████████████████████████████████████████████| 4/4 [00:01<00:00,  3.14it/s, est. speed input: 3449.77 toks/s, output: 200.89 toks/s]
 The image captures a breathtaking view of the Tokyo Skytree, the tallest structure in Japan, standing tall against the backdrop of a clear blue sky. The Skytree, a modern marvel of engineering, is a towering structure with a unique spiral design. It's a testament to human ingenuity and the beauty of architecture.


 The image captures a breathtaking view of the Tokyo Skytree, the tallest structure in Japan, standing majestically against the backdrop of a clear blue sky. The Skytree, a modern marvel of architecture, is adorned with a lattice structure that adds a unique charm to its appearance. The perspective of the image is from below
 The image captures a breathtaking view of the Tokyo Skytree, the tallest structure in Japan, standing tall against the backdrop of a clear blue sky. The Skytree, a white tower with a unique spiral design, is prominently featured in the center of the image. It's surrounded by a sea of pink cherry blossoms, their
 The image captures a breathtaking view of the Tokyo Tower, a renowned landmark in Japan. The tower, painted in white, stands tall against the backdrop of a clear blue sky. It's surrounded by a sea of pink cherry blossoms, their delicate petals adding a touch of softness to the scene. The perspective of the image

@DarkLight1337 DarkLight1337 enabled auto-merge (squash) April 8, 2025 11:22
@chaunceyjiang
Copy link
Collaborator Author

ImportError: Package `num2words` is required to run SmolVLM processor. Install it with `pip install num2words`.
--
  | [2025-04-08T12:21:46Z] FAILED models/multimodal/processing/test_smolvlm.py::test_processor_override[True-1-mm_processor_kwargs1-845-HuggingFaceTB/SmolVLM2-2.2B-Instruct] - ImportError: Package `num2words` is required to run SmolVLM processor. Install it with `pip install num2words`.

Hi, @DarkLight1337 Should the num2words dependency be added to common.txt?

Signed-off-by: chaunceyjiang <[email protected]>
auto-merge was automatically disabled April 8, 2025 14:34

Head branch was pushed to by a user without write access

Signed-off-by: chaunceyjiang <[email protected]>
@mergify mergify bot added the ci/build label Apr 8, 2025
@DarkLight1337
Copy link
Member

ImportError: Package `num2words` is required to run SmolVLM processor. Install it with `pip install num2words`.
--
  | [2025-04-08T12:21:46Z] FAILED models/multimodal/processing/test_smolvlm.py::test_processor_override[True-1-mm_processor_kwargs1-845-HuggingFaceTB/SmolVLM2-2.2B-Instruct] - ImportError: Package `num2words` is required to run SmolVLM processor. Install it with `pip install num2words`.

Hi, @DarkLight1337 Should the num2words dependency be added to common.txt?

Let's just add it to the test requirements, not common

Signed-off-by: chaunceyjiang <[email protected]>
@chaunceyjiang
Copy link
Collaborator Author

@DarkLight1337 The e2e test failures seem unrelated to my code.
The tests I added have already passed.

@vllm-bot vllm-bot merged commit 102bf96 into vllm-project:main Apr 9, 2025
64 of 67 checks passed
@chaunceyjiang chaunceyjiang deleted the smolvlm branch April 9, 2025 02:14
yangw-dev pushed a commit to yangw-dev/vllm that referenced this pull request Apr 21, 2025
Signed-off-by: chaunceyjiang <[email protected]>
Signed-off-by: Yang Wang <[email protected]>
jikunshang pushed a commit to jikunshang/vllm that referenced this pull request Apr 29, 2025
lk-chen pushed a commit to lk-chen/vllm that referenced this pull request Apr 29, 2025
RichardoMrMu pushed a commit to RichardoMrMu/vllm that referenced this pull request May 12, 2025
Signed-off-by: chaunceyjiang <[email protected]>
Signed-off-by: Mu Huai <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ci/build documentation Improvements or additions to documentation frontend multi-modality Related to multi-modality (#4194) ready ONLY add when PR is ready to merge/full CI is needed v1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[New Model]: HuggingFaceTB/SmolVLM2-2.2B-Instruct

3 participants