@yewentao256 yewentao256 commented Oct 27, 2025

Purpose

Fixes @smarterclayton's issue:

vllm serve deepseek-ai/DeepSeek-V2-lite --port=8000 --enable-expert-parallel --enable-eplb --num-redundant-experts=16 --eplb-window-size=100 --eplb-step-interval=100 --eplb-log-balancedness -dp 2

Running this command fails on every DP rank with the same error:

(EngineCore_DP0 pid=1223639) Traceback (most recent call last):
(EngineCore_DP0 pid=1223639)   File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
(EngineCore_DP0 pid=1223639)     self.run()
(EngineCore_DP0 pid=1223639)   File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run
(EngineCore_DP0 pid=1223639)     self._target(*self._args, **self._kwargs)
(EngineCore_DP0 pid=1223639)   File "/home/wentao/vllm-source/vllm/v1/engine/core.py", line 783, in run_engine_core
(EngineCore_DP0 pid=1223639)     raise e
(EngineCore_DP0 pid=1223639)   File "/home/wentao/vllm-source/vllm/v1/engine/core.py", line 766, in run_engine_core
(EngineCore_DP0 pid=1223639)     engine_core = DPEngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=1223639)                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1223639)   File "/home/wentao/vllm-source/vllm/v1/engine/core.py", line 1061, in __init__
(EngineCore_DP0 pid=1223639)     super().__init__(
(EngineCore_DP0 pid=1223639)   File "/home/wentao/vllm-source/vllm/v1/engine/core.py", line 538, in __init__
(EngineCore_DP0 pid=1223639)     super().__init__(
(EngineCore_DP0 pid=1223639)   File "/home/wentao/vllm-source/vllm/v1/engine/core.py", line 102, in __init__
(EngineCore_DP0 pid=1223639)     self.model_executor = executor_class(vllm_config)
(EngineCore_DP0 pid=1223639)                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1223639)   File "/home/wentao/vllm-source/vllm/v1/executor/abstract.py", line 98, in __init__
(EngineCore_DP0 pid=1223639)     self._init_executor()
(EngineCore_DP0 pid=1223639)   File "/home/wentao/vllm-source/vllm/v1/executor/uniproc_executor.py", line 47, in _init_executor
(EngineCore_DP0 pid=1223639)     self.driver_worker.load_model()
(EngineCore_DP0 pid=1223639)   File "/home/wentao/vllm-source/vllm/v1/worker/gpu_worker.py", line 233, in load_model
(EngineCore_DP0 pid=1223639)     self.model_runner.load_model(eep_scale_up=eep_scale_up)
(EngineCore_DP0 pid=1223639)   File "/home/wentao/vllm-source/vllm/v1/worker/gpu_model_runner.py", line 2932, in load_model
(EngineCore_DP0 pid=1223639)     self.eplb_state = EplbState.build(
(EngineCore_DP0 pid=1223639)                       ^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1223639)   File "/home/wentao/vllm-source/vllm/distributed/eplb/eplb_state.py", line 316, in build
(EngineCore_DP0 pid=1223639)     model.set_eplb_state(
(EngineCore_DP0 pid=1223639)   File "/home/wentao/vllm-source/vllm/model_executor/models/deepseek_v2.py", line 1252, in set_eplb_state
(EngineCore_DP0 pid=1223639)     self.expert_weights.append(layer.get_expert_weights())
(EngineCore_DP0 pid=1223639)                                ^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1223639)   File "/home/wentao/vllm-source/vllm/model_executor/layers/fused_moe/layer.py", line 1948, in get_expert_weights
(EngineCore_DP0 pid=1223639)     weight.view(self.local_num_experts, -1)
(EngineCore_DP0 pid=1223639)   File "/home/wentao/vllm-source/vllm/model_executor/parameter.py", line 126, in __torch_function__
(EngineCore_DP0 pid=1223639)     return super().__torch_function__(func, types, args, kwargs)
(EngineCore_DP0 pid=1223639)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1223639) RuntimeError: shape '[40, -1]' is invalid for input of size 131072

This PR fixes the error by excluding parameters from non-expert submodules (e.g. gate/shared) when collecting expert weights.
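The failure mode can be reproduced in isolation. EPLB flattens every collected parameter with `view(local_num_experts, -1)`, which only succeeds when the tensor's element count is divisible by the expert count. A minimal sketch (shapes are illustrative, not the actual model's):

```python
import torch

# A non-expert parameter (e.g. a gate weight) with 131072 elements cannot be
# reshaped into 40 expert rows, since 131072 % 40 != 0 -- so view() raises.
local_num_experts = 40
gate_weight = torch.empty(2048, 64)  # 131072 elements

try:
    gate_weight.view(local_num_experts, -1)
except RuntimeError as e:
    print(e)  # shape '[40, -1]' is invalid for input of size 131072
```

This matches the `RuntimeError: shape '[40, -1]' is invalid for input of size 131072` in the traceback above: a gate/shared parameter slipped into the set of tensors that `get_expert_weights` tried to reshape per expert.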

Test

(APIServer pid=1225413) INFO 10-27 09:00:03 [launcher.py:46] Route: /start_profile, Methods: POST
(APIServer pid=1225413) INFO 10-27 09:00:03 [launcher.py:46] Route: /stop_profile, Methods: POST
(APIServer pid=1225413) INFO 10-27 09:00:03 [launcher.py:46] Route: /metrics, Methods: GET
(APIServer pid=1225413) INFO:     Started server process [1225413]
(APIServer pid=1225413) INFO:     Waiting for application startup.
(APIServer pid=1225413) INFO:     Application startup complete.

@yewentao256 added the "ready" label (ONLY add when PR is ready to merge/full CI is needed) on Oct 27, 2025
@gemini-code-assist gemini-code-assist bot left a comment
Code Review

The pull request introduces a minor change to vllm/model_executor/layers/fused_moe/layer.py to exclude parameters from non-expert submodules (e.g., gate/shared) when retrieving expert weights. This change addresses a shape issue encountered when using EPLB with DeepSeek-V2-lite. I have added a high severity review comment to ensure the change is correct.
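A hedged sketch of the fix the review summary describes: collect only the layer's own per-expert parameters and skip parameters that live on non-expert submodules such as a gate. All class and attribute names here are illustrative, not the actual vllm implementation.

```python
import torch
from torch import nn


class FusedMoELayerSketch(nn.Module):
    """Illustrative stand-in for a fused MoE layer (names are assumptions)."""

    def __init__(self, local_num_experts: int, hidden: int):
        super().__init__()
        self.local_num_experts = local_num_experts
        # Per-expert weight: leading dim equals the local expert count.
        self.w13_weight = nn.Parameter(
            torch.empty(local_num_experts, 2 * hidden, hidden)
        )
        # Non-expert submodule whose weight must NOT be flattened per expert.
        self.gate = nn.Linear(hidden, local_num_experts, bias=False)

    def get_expert_weights(self):
        # recurse=False restricts iteration to this module's direct
        # parameters, so self.gate.weight is excluded and every remaining
        # tensor can safely be viewed as (local_num_experts, -1).
        return [
            p.view(self.local_num_experts, -1)
            for _, p in self.named_parameters(recurse=False)
        ]


layer = FusedMoELayerSketch(local_num_experts=40, hidden=16)
print([w.shape for w in layer.get_expert_weights()])  # [torch.Size([40, 512])]
```

With the gate's weight excluded, every remaining tensor reshapes cleanly into one row per local expert, which is what the EPLB state-building path expects.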

@DarkLight1337 DarkLight1337 merged commit 0484b64 into main Oct 28, 2025
53 checks passed
@DarkLight1337 DarkLight1337 deleted the wentao-fix-shape-issue-for-eplb-expert branch October 28, 2025 12:44
bhagyashrigai pushed a commit to odh-on-pz/vllm-upstream that referenced this pull request Oct 29, 2025
Signed-off-by: yewentao256 <[email protected]>
Co-authored-by: Cyrus Leung <[email protected]>
Signed-off-by: Bhagyashri <[email protected]>
ilmarkov pushed a commit to neuralmagic/vllm that referenced this pull request Nov 7, 2025
ZhengHongming888 pushed a commit to ZhengHongming888/vllm that referenced this pull request Nov 8, 2025
rtourgeman pushed a commit to rtourgeman/vllm that referenced this pull request Nov 10, 2025
devpatelio pushed a commit to SumanthRH/vllm that referenced this pull request Nov 29, 2025

Labels

ready ONLY add when PR is ready to merge/full CI is needed

4 participants