[Serve.llm] The LLM serve apis don't work on some VLMs like OpenGVLab/InternVL2_5-1B-MPO #52594

Open
kouroshHakha opened this issue Apr 25, 2025 · 0 comments
Assignees
Labels
bug Something that is supposed to be working; but isn't P0 Issues that should be fixed in short order


What happened + What you expected to happen

The chat template for vision-language models is resolved differently depending on whether a request goes through vLLM's OpenAI-compatible server or through the LLM APIs in Ray Serve LLM.

The root cause is preprocessing that the OpenAI server performs on prompts before handing them to the engine; this step appears to be missing from the engine request-submission path in Ray Serve LLM.

Versions / Dependencies

N/A

Reproduction script

Compare the following vllm cmd with the corresponding serve llm deployment:

vLLM command

vllm serve OpenGVLab/InternVL2_5-1B-MPO --dtype half --max-model-len 32768 --tensor-parallel-size 1 --port 8000

Serve LLM code

from ray import serve
from ray.serve.llm import LLMConfig, build_openai_app

llm_config = LLMConfig(
    model_loading_config=dict(
        model_id="OpenGVLab/InternVL2_5-1B-MPO",
    ),
    deployment_config=dict(
        autoscaling_config=dict(
            min_replicas=1, max_replicas=2,
        )
    ),
    runtime_env={
        "env_vars": {
            "VLLM_USE_V1": "1"
        }
    },
    # You can customize the engine arguments (e.g. vLLM engine kwargs)
    engine_kwargs=dict(
        tensor_parallel_size=1,
        max_model_len=32768,
        dtype="half",
        trust_remote_code=True,
    ),
)

app = build_openai_app({"llm_configs": [llm_config]})
serve.run(app, blocking=False)  # non-blocking so the client code below can run
from openai import OpenAI

client = OpenAI(
    # Replace the URL
    base_url="http://localhost:8000/v1",
    api_key="NOT A REAL KEY",
)


chat_response = client.chat.completions.create(
    model="OpenGVLab/InternVL2_5-1B-MPO",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://modelscope.oss-cn-beijing.aliyuncs.com/resource/qwen.png"
                    },
                },
                {"type": "text", "text": "What is the text in the illustrate?"},
            ],
        },
    ],
    max_tokens=10,
)
if hasattr(chat_response, "choices"):
    print(chat_response.choices[0].message.content)

For example, for this particular model, here is the diff in the conversation that tokenizer.apply_chat_template() is applied to:

on vllm:

conversation=[{'role': 'system', 'content': 'You are a helpful assistant.'}, {'role': 'user', 'content': '<image>\nWhat is the text in the illustrate?'}]

The image is replaced with an <image> placeholder. This substitution happens in the OpenAI server logic before the conversation is passed to the tokenizer's chat template.

on serve llm:

[Message(role='system', content='You are a helpful assistant.'), Message(role='user', content=[Image(field='image_url', image_url={'url': 'https://modelscope.oss-cn-beijing.aliyuncs.com/resource/qwen.png'}), Text(field='text', type='text', text='What is the text in the illustrate?')])]
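The preprocessing step that the OpenAI server path performs (and that the Serve LLM path apparently skips) can be sketched roughly as follows. This is a hypothetical illustration, not vLLM's actual code: the function name `flatten_multimodal_content` and the hard-coded `<image>` placeholder are assumptions for this model; the real placeholder token is model-specific.

```python
# Hypothetical sketch of the flattening the OpenAI server applies before
# calling tokenizer.apply_chat_template(): OpenAI-style content-part lists
# are collapsed into plain strings, with each image part replaced by a
# model-specific placeholder (here assumed to be "<image>").
def flatten_multimodal_content(messages, image_token="<image>"):
    conversation = []
    for msg in messages:
        content = msg["content"]
        if isinstance(content, list):
            placeholders, texts = [], []
            for part in content:
                if part["type"] == "image_url":
                    placeholders.append(image_token)
                elif part["type"] == "text":
                    texts.append(part["text"])
            # Image placeholders come first, followed by the text parts.
            content = "\n".join(placeholders + texts)
        conversation.append({"role": msg["role"], "content": content})
    return conversation
```

Applied to the request in the reproduction script above, this yields the vLLM-style conversation, i.e. the user content becomes "<image>\nWhat is the text in the illustrate?", matching what apply_chat_template() receives on the OpenAI server path.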

Issue Severity

High: It blocks me from completing my task.

@kouroshHakha kouroshHakha added bug Something that is supposed to be working; but isn't P0 Issues that should be fixed in short order labels Apr 25, 2025
@kouroshHakha kouroshHakha self-assigned this Apr 25, 2025