[Feature]: Qwen3 Omni Support #25809

@VivekMalipatel

Description

🚀 The feature, motivation and pitch

I am encountering the following error when deploying Qwen3 Omni on an H200:

[service]: INFO 09-27 12:03:27 [__init__.py:216] Automatically detected platform cuda.
[service]: (APIServer pid=20) INFO 09-27 12:03:35 [api_server.py:1839] vLLM API server version 0.11.0rc2.dev33+gc216119d6
[service]: (APIServer pid=20) INFO 09-27 12:03:35 [utils.py:233] non-default args: {'model_tag': 'Qwen/Qwen3-Omni-30B-A3B-Instruct', 'host': '0.0.0.0', 'port': 8001, 'api_key': ['test-key'], 'model': 'Qwen/Qwen3-Omni-30B-A3B-Instruct', 'trust_remote_code': True, 'enforce_eager': True, 'gpu_memory_utilization': 0.85, 'enable_prefix_caching': True}
[service]: (APIServer pid=20) The argument trust_remote_code is to be used with Auto classes. It has no effect here and is ignored.
[service]: (APIServer pid=20) Traceback (most recent call last):
[service]: (APIServer pid=20)   File "/usr/local/bin/vllm", line 10, in <module>
[service]: (APIServer pid=20)     sys.exit(main())
[service]: (APIServer pid=20)              ^^^^^^
[service]: (APIServer pid=20)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/main.py", line 54, in main
[service]: (APIServer pid=20)     args.dispatch_function(args)
[service]: (APIServer pid=20)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/serve.py", line 57, in cmd
[service]: (APIServer pid=20)     uvloop.run(run_server(args))
[service]: (APIServer pid=20)   File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 109, in run
[service]: (APIServer pid=20)     return __asyncio.run(
[service]: (APIServer pid=20)            ^^^^^^^^^^^^^^
[service]: (APIServer pid=20)   File "/usr/lib/python3.12/asyncio/runners.py", line 195, in run
[service]: (APIServer pid=20)     return runner.run(main)
[service]: (APIServer pid=20)            ^^^^^^^^^^^^^^^^
[service]: (APIServer pid=20)   File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
[service]: (APIServer pid=20)     return self._loop.run_until_complete(task)
[service]: (APIServer pid=20)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[service]: (APIServer pid=20)   File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
[service]: (APIServer pid=20)   File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 61, in wrapper
[service]: (APIServer pid=20)     return await main
[service]: (APIServer pid=20)            ^^^^^^^^^^
[service]: (APIServer pid=20)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 1884, in run_server
[service]: (APIServer pid=20)     await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
[service]: (APIServer pid=20)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 1902, in run_server_worker
[service]: (APIServer pid=20)     async with build_async_engine_client(
[service]: (APIServer pid=20)     ^^^^^^^^^^^^^^^^^^^^^^^^^^
[service]: (APIServer pid=20)   File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
[service]: (APIServer pid=20)     return await anext(self.gen)
[service]: (APIServer pid=20)            ^^^^^^^^^^^^^^^^^^^^^
[service]: (APIServer pid=20)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 180, in build_async_engine_client
[service]: (APIServer pid=20)     async with build_async_engine_client_from_engine_args(
[service]: (APIServer pid=20)     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[service]: (APIServer pid=20)   File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
[service]: (APIServer pid=20)     return await anext(self.gen)
[service]: (APIServer pid=20)            ^^^^^^^^^^^^^^^^^^^^^
[service]: (APIServer pid=20)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 206, in build_async_engine_client_from_engine_args
[service]: (APIServer pid=20)     vllm_config = engine_args.create_engine_config(usage_context=usage_context)
[service]: (APIServer pid=20)                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[service]: (APIServer pid=20)   File "/usr/local/lib/python3.12/dist-packages/vllm/engine/arg_utils.py", line 1142, in create_engine_config
[service]: (APIServer pid=20)     model_config = self.create_model_config()
[service]: (APIServer pid=20)                    ^^^^^^^^^^^^^^^^^^^^^^^^^^
[service]: (APIServer pid=20)   File "/usr/local/lib/python3.12/dist-packages/vllm/engine/arg_utils.py", line 994, in create_model_config
[service]: (APIServer pid=20)     return ModelConfig(
[service]: (APIServer pid=20)            ^^^^^^^^^^^^
[service]: (APIServer pid=20)   File "/usr/local/lib/python3.12/dist-packages/pydantic/_internal/_dataclasses.py", line 123, in __init__
[service]: (APIServer pid=20)     s.__pydantic_validator__.validate_python(ArgsKwargs(args, kwargs), self_instance=s)
[service]: (APIServer pid=20) pydantic_core._pydantic_core.ValidationError: 1 validation error for ModelConfig
[service]: (APIServer pid=20) Value error, The checkpoint you are trying to load has model type qwen3_omni_moe but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.
[service]: (APIServer pid=20)
[service]: (APIServer pid=20) You can update Transformers with the command pip install --upgrade transformers. If this does not work, and the checkpoint is very new, then there may not be a release version that supports this model yet. In this case, you can get the most up-to-date code by installing Transformers from source with the command pip install git+https://github.com/huggingface/transformers.git [type=value_error, input_value=ArgsKwargs((), {'model': ...rocessor_plugin': None}), input_type=ArgsKwargs]
[service]: (APIServer pid=20) For further information visit https://errors.pydantic.dev/2.11/v/value_error
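The failure happens before vLLM ever loads any weights: ModelConfig validation asks the installed transformers for the qwen3_omni_moe architecture and does not find it. A quick way to confirm this (a sketch, using the same CI image as the reproduction below) is to query transformers' auto-config registry directly:

docker run --rm -i public.ecr.aws/q9t5s3a7/vllm-ci-postmerge-repo:${VLLM_COMMIT} python - <<'EOF'
# True  -> this transformers build registers qwen3_omni_moe
# False -> the bundled transformers predates Qwen3 Omni support
from transformers.models.auto.configuration_auto import CONFIG_MAPPING_NAMES
print("qwen3_omni_moe" in CONFIG_MAPPING_NAMES)
EOF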

docker stop vllm-qwen3-omni && docker rm vllm-qwen3-omni
export VLLM_COMMIT=c0ec81836fd47492a900a2538dea461619122555
docker run -d --name vllm-qwen3-omni \
  --restart unless-stopped \
  --network host \
  --gpus all \
  -v "$HF_CACHE":/root/.cache/huggingface \
  -e NCCL_P2P_DISABLE=1 \
  public.ecr.aws/q9t5s3a7/vllm-ci-postmerge-repo:${VLLM_COMMIT} \
  bash -lc '
    pip install --no-cache-dir flash-attn
    vllm serve Qwen/Qwen3-Omni-30B-A3B-Instruct \
      --host 0.0.0.0 --port 8001 \
      --served-model-name o3 \
      --trust-remote-code \
      --enable-auto-tool-choice \
      --max-model-len 65536 \
      --gpu-memory-utilization 0.85 \
      --api-key test-key'
docker logs -f vllm-qwen3-omni
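
If the registry check above comes back False, the workaround the error message itself suggests is to install transformers from source before starting the server. A minimal sketch of the adjusted startup payload, under the assumption that the current transformers main branch already carries qwen3_omni_moe (everything else unchanged from the command above):

bash -lc '
  pip install --no-cache-dir "git+https://github.com/huggingface/transformers.git"
  pip install --no-cache-dir flash-attn
  vllm serve Qwen/Qwen3-Omni-30B-A3B-Instruct \
    --host 0.0.0.0 --port 8001 \
    --served-model-name o3 \
    --trust-remote-code \
    --enable-auto-tool-choice \
    --max-model-len 65536 \
    --gpu-memory-utilization 0.85 \
    --api-key test-key'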

Alternatives

No response

Additional context

No response

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
