Closed · Labels: installation (Installation problems)
Description
How you are installing vllm
Hi, I followed the sample code from PR #8029 to deploy Qwen-VL-Chat with the vLLM Docker image. Deployment was successful, but I keep getting out-of-vocabulary (OOV) errors no matter what inputs I test. My environment is Ubuntu 20.04 LTS with two 2080 Ti 22G GPUs. Docker deployment worked fine for Qwen2-VL-7B and Qwen2.5:32b, so it should not be a configuration issue.
How I deployed:
```shell
sudo docker run --runtime nvidia --gpus '"device=0,1"' --ipc=host \
  -p 18434:8000 \
  -v hf_cache:/root/.cache/huggingface \
  -d \
  -e HF_ENDPOINT=https://hf-mirror.com \
  -e HF_HUB_ENABLE_HF_TRANSFER=0 \
  --name Qwen-VL-Chat \
  vllm/vllm-openai:latest \
  --model Qwen/Qwen-VL-Chat \
  --tokenizer Qwen/Qwen-VL-Chat \
  --tensor-parallel-size 2 \
  --trust-remote-code \
  --chat-template examples/template_chatml.jinja \
  --dtype='half'
```
Error msg:
```
Error in API call: 400 {"object":"error","message":"Token id 151859 is out of vocabulary","type":"BadRequestError","param":null,"code":400}
```
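For context on what the server is rejecting: the OpenAI-compatible server refuses any prompt token id that falls outside the tokenizer's vocabulary range, and 151859 appears to be one of Qwen-VL's image-placeholder special tokens (that mapping is my assumption, not confirmed from the source). Here is a minimal sketch of such a bounds check; the vocabulary size below is hypothetical, not Qwen-VL-Chat's real value:

```python
# Hypothetical sketch of the bounds check a server might apply to prompt
# token ids. VOCAB_SIZE is illustrative, not the real Qwen-VL-Chat value.
VOCAB_SIZE = 151_851  # hypothetical: size reported by the tokenizer

def validate_token_ids(token_ids, vocab_size):
    """Raise ValueError for any id outside [0, vocab_size)."""
    for tid in token_ids:
        if tid < 0 or tid >= vocab_size:
            raise ValueError(f"Token id {tid} is out of vocabulary")

validate_token_ids([1, 42, 151_850], VOCAB_SIZE)  # in range: passes silently

try:
    validate_token_ids([151_859], VOCAB_SIZE)  # placeholder id above vocab size
except ValueError as e:
    print(e)  # Token id 151859 is out of vocabulary
```

If the image-placeholder ids really do sit above the vocabulary size the server checks against, any request containing an image would fail this way regardless of the input.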
Test code:
```python
import requests
import base64
import time


# Function to encode the image to base64
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")


def main():
    # Path to your image
    image_path = "test2.jpg"
    base64_image = encode_image(image_path)

    # API configuration
    api_base = "http://192.168.50.18:18434/v1/chat/completions"
    model_name = "Qwen/Qwen-VL-Chat"

    # Input prompt
    user_prompt_text = "What's inside the image?"

    # Prepare the payload
    payload_template = {
        "model": model_name,
        "messages": [
            {
                "role": "user",
                "content": [
                    # {"type": "image_url", "image_url": {"url": "https://i.imgur.com/T3S0cvu.jpeg"}},
                    {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"}},
                    {"type": "text", "text": user_prompt_text},
                ],
            }
        ],
        "max_tokens": 300,
    }

    for i in range(1, 2):
        print(f"===== API called {i} times =====")
        startTime = time.time()
        response = requests.post(api_base, json=payload_template)
        if response.status_code != 200:
            print("Error in API call:", response.status_code, response.text)
        else:
            completion = response.json()["choices"][0]["message"]["content"]
            tokens = response.json()["usage"]["prompt_tokens"]
            print("Model Response:", completion)
            print("tokens:", tokens)
        print("time used: {:.2f} seconds".format(time.time() - startTime))
        print()


if __name__ == "__main__":
    main()
```
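One way to narrow this down is to send the same request without the image part: if a text-only payload succeeds while the multimodal one returns the 400, the OOV token is being introduced by the image placeholder expansion rather than by the request format. A small sketch of that probe; the `build_payload` helper is my own, not part of any vLLM client API:

```python
import json


def build_payload(model, text, image_b64=None, max_tokens=300):
    """Build an OpenAI-style chat payload; attach the image part only if given."""
    content = []
    if image_b64 is not None:
        content.append({
            "type": "image_url",
            "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"},
        })
    content.append({"type": "text", "text": text})
    return {
        "model": model,
        "messages": [{"role": "user", "content": content}],
        "max_tokens": max_tokens,
    }


# Text-only probe payload: POST this to the same endpoint with requests.post(...)
print(json.dumps(build_payload("Qwen/Qwen-VL-Chat", "Say hello."), indent=2))
```

If the text-only probe returns 200, retry with the base64 image attached to confirm the error only appears on multimodal inputs.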
I searched the web and could not find any similar case, so I'm posting here in hope of help.
Much appreciated!
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.