Conversation

@Jun-Howie (Collaborator) commented Sep 29, 2025

What's New

  1. Model Support

    • Added support for the Qwen3-VL Instruct and Thinking models.
  2. Quantized Models

    • Provided additional FP8 and AWQ quantized models.
    • Fully tested and supported.
  3. Tool Calling

    • Added support for tool_call, consistent with the Qwen3 series (a request sketch follows this list).
  4. Frontend UI

    • Introduced the --enable-expert-parallel parameter.
    • Aligned with vllm/core.py: with expert parallelism enabled, the model's routed experts are divided evenly, so each GPU is allocated the same number of experts.
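For reference, a minimal tool-calling request sketch against a vLLM OpenAI-compatible deployment. The get_weather tool and prompt are made-up examples, and depending on the vLLM version, automatic tool choice may additionally require launching with --enable-auto-tool-choice and --tool-call-parser hermes, which are not shown in the serve commands below:

# Hypothetical tool-calling request; the tools schema follows the
# standard OpenAI chat-completions format that vLLM accepts.
curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "./Qwen3-VL-235B-A22B-Instruct-AWQ",
        "messages": [{"role": "user", "content": "What is the weather in Beijing?"}],
        "tools": [{
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get the current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"]
                }
            }
        }]
    }'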

Dependencies (vLLM Backend)

pip install git+https://github.com/huggingface/transformers
pip install qwen-vl-utils==0.0.14

# Stable release:
# pip install 'vllm>0.10.2'
# If the stable release does not work, install the nightly build:
uv pip install -U vllm \
    --torch-backend=auto \
    --extra-index-url https://wheels.vllm.ai/nightly
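
A quick sanity check after installing (a minimal sketch: at the time of this PR, Qwen3-VL support lives on the transformers main branch, so the transformers version should report a dev build):

# Print the installed versions to confirm both packages import cleanly
python -c "import transformers, vllm; print(transformers.__version__, vllm.__version__)"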

Usage Examples

2 × A100-80G (AWQ)

vllm serve \
    ./Qwen3-VL-235B-A22B-Instruct-AWQ \
    --enable-expert-parallel \
    --max-model-len 32768 \
    --tensor-parallel-size 2
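
Once the server is up, a minimal vision request sketch (the image URL and prompt are placeholders; the payload uses the standard OpenAI-compatible image_url content format, and the model field assumes the default served model name, i.e. the path passed to vllm serve):

curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "./Qwen3-VL-235B-A22B-Instruct-AWQ",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": "https://example.com/demo.jpg"}},
                {"type": "text", "text": "Describe this image."}
            ]
        }],
        "max_tokens": 512
    }'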

4 × A100-80G (FP8)

vllm serve \
    ./Qwen3-VL-235B-A22B-Instruct-FP8 \
    --enable-expert-parallel \
    --max-model-len 32768 \
    --tensor-parallel-size 4
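
Either deployment can be verified the same way; the vLLM OpenAI-compatible server listens on port 8000 by default and exposes a model listing endpoint:

# Confirm the server is up and check the served model name
curl http://localhost:8000/v1/models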

Test Results

[test result screenshots]

@XprobeBot XprobeBot added the gpu label Sep 29, 2025
@XprobeBot XprobeBot added this to the v1.x milestone Sep 29, 2025
@qinxuye qinxuye changed the title from "Support Qwen3-VL" to "FEAT: [model] Support Qwen3-VL" Sep 29, 2025
@qinxuye (Contributor) left a comment:
LGTM

@qinxuye qinxuye merged commit bc3b42c into xorbitsai:main Sep 30, 2025
4 of 13 checks passed
@Jun-Howie Jun-Howie deleted the Qwen3-VL branch October 9, 2025 09:33