What happened + What you expected to happen
There is a discrepancy in how the chat template is resolved for vision language models between vLLM's OpenAI server and Ray Serve LLM's APIs.
The issue appears to be rooted in ad-hoc preprocessing that the OpenAI server performs on prompts before passing them to the engine, which seems to be missing from the LLM engine request submission path in Ray Serve LLM.
Versions / Dependencies
N/A
Reproduction script
Compare the following vLLM command with the corresponding Serve LLM deployment:
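The exact vLLM command is not included here; the following is only a plausible reconstruction that mirrors the engine_kwargs in the Serve LLM config below (model, dtype, context length, and tensor parallelism are assumptions carried over from that config):

# Hypothetical reconstruction, not the command from the original report
vllm serve OpenGVLab/InternVL2_5-1B-MPO \
    --trust-remote-code \
    --dtype half \
    --max-model-len 32768 \
    --tensor-parallel-size 1

The corresponding Serve LLM deployment: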
from ray import serve
from ray.serve.llm import LLMConfig, build_openai_app

llm_config = LLMConfig(
    model_loading_config=dict(
        model_id="OpenGVLab/InternVL2_5-1B-MPO",
    ),
    deployment_config=dict(
        autoscaling_config=dict(
            min_replicas=1, max_replicas=2,
        )
    ),
    runtime_env={
        "env_vars": {
            "VLLM_USE_V1": "1"
        }
    },
    # You can customize the engine arguments (e.g. vLLM engine kwargs)
    engine_kwargs=dict(
        tensor_parallel_size=1,
        max_model_len=32768,
        dtype="half",
        trust_remote_code=True,
    ),
)

app = build_openai_app({"llm_configs": [llm_config]})
serve.run(app, blocking=True)
Query the deployment with the OpenAI client:

from openai import OpenAI

client = OpenAI(
    # Replace the URL with your deployment's address
    base_url="http://localhost:8000/v1",
    api_key="NOT A REAL KEY",
)

chat_response = client.chat.completions.create(
    model="OpenGVLab/InternVL2_5-1B-MPO",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://modelscope.oss-cn-beijing.aliyuncs.com/resource/qwen.png"
                    },
                },
                {"type": "text", "text": "What is the text in the illustrate?"},
            ],
        },
    ],
    max_tokens=10,
)

if hasattr(chat_response, "choices"):
    print(chat_response.choices[0].message.content)
For example, for this particular model, here is the diff in the conversation that tokenizer.apply_chat_template() gets applied to:
on vllm:
conversation=[{'role': 'system', 'content': 'You are a helpful assistant.'}, {'role': 'user', 'content': '<image>\nWhat is the text in the illustrate?'}]
The image content is replaced with an <image> tag. This substitution happens in the OpenAI server logic before the conversation is passed into the tokenizer's chat template.
on serve llm:
[Message(role='system', content='You are a helpful assistant.'), Message(role='user', content=[Image(field='image_url', image_url={'url': 'https://modelscope.oss-cn-beijing.aliyuncs.com/resource/qwen.png'}), Text(field='text', type='text', text='What is the text in the illustrate?')])]
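For illustration only, here is a minimal Python sketch of the kind of flattening the OpenAI server performs before calling tokenizer.apply_chat_template(). This is not vLLM's actual implementation; the function name and the hard-coded "<image>" placeholder are assumptions for this example (the real placeholder is model-specific):

def flatten_multimodal_content(messages, image_placeholder="<image>"):
    # Hypothetical helper: collapse OpenAI-style multimodal content parts
    # into a single string per message, replacing each image_url part with
    # a placeholder tag, matching the vLLM conversation shown above.
    flattened = []
    for message in messages:
        content = message["content"]
        if isinstance(content, list):
            parts = []
            for part in content:
                if part.get("type") == "image_url":
                    parts.append(image_placeholder)
                elif part.get("type") == "text":
                    parts.append(part["text"])
            content = "\n".join(parts)
        flattened.append({"role": message["role"], "content": content})
    return flattened

Ray Serve LLM appears to skip a step like this and forwards the structured Message/Image/Text objects directly, so the chat template sees a different conversation.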
Issue Severity
High: It blocks me from completing my task.