🚀 The feature, motivation and pitch
🎉 #18343 introduces dynamic Expert Parallelism Load Balancing (EPLB) for DeepSeek-V2/V3/R1 models.
As MoE (Mixture-of-Experts) models become more common, we’d love help extending EPLB support to other MoE models—such as Qwen3, Llama 4, and more.
This is a great good first issue for anyone interested in model internals or systems work. #18343 was built with generality in mind, so extending it to other models or quantization methods should be relatively straightforward.
✅ How to add support for a new model
Implement the `MixtureOfExperts` protocol. Specifically, you'll need to:
- Expose the relevant MoE configuration flags.
- Provide access to the expert weights for EPLB to rearrange.
- Forward the EPLB-related arguments into the `FusedMoE` layer (see the sketch below).
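For orientation, here is a rough sketch of what that wiring might look like. The EPLB-specific names (`enable_eplb`, `num_redundant_experts`, the `expert_weights` attribute) are placeholders inferred from the description above, not the exact interface; treat the `MixtureOfExperts` protocol definition and `deepseek_v2.py` as the source of truth.

```python
# Illustrative sketch only. The EPLB-specific names below (enable_eplb,
# num_redundant_experts, expert_weights) are placeholders; check the
# MixtureOfExperts protocol and deepseek_v2.py for the real interface.
import torch.nn as nn

from vllm.model_executor.layers.fused_moe import FusedMoE


class MyMoEBlock(nn.Module):
    """One decoder layer's MoE block, with EPLB arguments passed through."""

    def __init__(self, config, quant_config=None, enable_eplb: bool = False):
        super().__init__()
        # (3) Forward the EPLB-related arguments into the FusedMoE layer.
        self.experts = FusedMoE(
            num_experts=config.n_routed_experts,
            top_k=config.num_experts_per_tok,
            hidden_size=config.hidden_size,
            intermediate_size=config.moe_intermediate_size,
            quant_config=quant_config,
            enable_eplb=enable_eplb,                              # assumed kwarg
            num_redundant_experts=config.num_redundant_experts,  # assumed kwarg
        )


class MyMoEModel(nn.Module):
    """Hypothetical model-level wiring for the MixtureOfExperts protocol."""

    def __init__(self, config, enable_eplb: bool = False):
        super().__init__()
        self.layers = nn.ModuleList(
            MyMoEBlock(config, enable_eplb=enable_eplb)
            for _ in range(config.num_hidden_layers))

        # (1) Expose the MoE configuration that EPLB needs.
        self.num_moe_layers = config.num_hidden_layers
        self.num_logical_experts = config.n_routed_experts
        self.num_redundant_experts = config.num_redundant_experts

        # (2) Give EPLB access to the expert weights it will rearrange,
        #     one group of weight tensors per MoE layer.
        self.expert_weights = [
            [layer.experts.w13_weight, layer.experts.w2_weight]
            for layer in self.layers
        ]
```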
📌 Note on weight loading:
For models with redundant experts, you'll need to carefully adjust the weight-loading logic. `FusedMoE` returns an `expert_params_mapping` that reflects expert duplication, but you may need to modify the model class to ensure correct loading behavior.
🔎 Example: See how it's done in `deepseek_v2.py`.
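As a rough guide (not the exact DeepSeek-V2 code), the expert-weight part of `load_weights` typically follows the pattern below. The `num_redundant_experts` argument and the handling of duplicated mapping entries are assumptions to verify against `deepseek_v2.py`.

```python
# Illustrative sketch of a load_weights method; non-expert weights are
# omitted, and the num_redundant_experts kwarg is an assumption.
import torch.nn as nn

from vllm.model_executor.layers.fused_moe import FusedMoE


class MyMoEModelForCausalLM(nn.Module):  # hypothetical model class

    def load_weights(self, weights):
        params_dict = dict(self.named_parameters())

        # The mapping reflects expert duplication: with redundant experts,
        # one checkpoint expert can map to several physical expert slots.
        expert_params_mapping = FusedMoE.make_expert_params_mapping(
            ckpt_gate_proj_name="gate_proj",
            ckpt_down_proj_name="down_proj",
            ckpt_up_proj_name="up_proj",
            num_experts=self.config.n_routed_experts,
            num_redundant_experts=self.num_redundant_experts,  # assumed kwarg
        )

        for name, loaded_weight in weights:
            for (param_name, weight_name, expert_id,
                 shard_id) in expert_params_mapping:
                if weight_name not in name:
                    continue
                mapped_name = name.replace(weight_name, param_name)
                param = params_dict[mapped_name]
                # expert_id refers to the physical slot; the same checkpoint
                # tensor may need to land in multiple slots, so do not assume
                # a single match per weight name.
                param.weight_loader(param, loaded_weight, mapped_name,
                                    shard_id=shard_id, expert_id=expert_id)
```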
❗️Accuracy tests:
Since modifying the weight loader can be tricky, we suggest including an accuracy test (e.g., on GSM8K) in the PR to ensure the weight-loading process remains intact.
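One convenient way to run such a check (not mandated by this issue) is the lm-evaluation-harness vLLM backend. A minimal sketch, with a placeholder model name and engine arguments:

```python
# Minimal GSM8K check via lm-evaluation-harness (pip install lm_eval).
# The model name and engine arguments are placeholders; add the EPLB engine
# flags from #18343 to model_args when exercising the EPLB path.
import lm_eval

results = lm_eval.simple_evaluate(
    model="vllm",
    model_args="pretrained=your-org/your-moe-model,tensor_parallel_size=4",
    tasks=["gsm8k"],
    num_fewshot=5,
)
print(results["results"]["gsm8k"])
```

Comparing the score with and without redundant experts enabled gives a quick sanity check that weight loading is still intact.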
✅ How to add support for quantized models
This is usually even easier—just make sure EPLB-related arguments are properly forwarded in your quantization path.
🔎 Example: See `fp8.py` for a minimal working change.
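The gist, sketched below with placeholder parameter names: whatever EPLB-related arguments `FusedMoE` hands to the quantized method's `apply()` must be accepted and passed on rather than dropped. The real argument names and the call they feed into are in `fp8.py` from #18343.

```python
# Illustrative sketch only: enable_eplb and expert_load_view are placeholder
# names for the EPLB-related arguments; see fp8.py in #18343 for the real
# signature of the quantized apply() path.
from typing import Optional

import torch

from vllm.model_executor.layers.fused_moe import FusedMoEMethodBase


class MyQuantMoEMethod(FusedMoEMethodBase):  # hypothetical quant method
    # create_weights and other required methods omitted for brevity.

    def apply(
        self,
        layer: torch.nn.Module,
        x: torch.Tensor,
        router_logits: torch.Tensor,
        top_k: int,
        *,
        enable_eplb: bool = False,                        # placeholder
        expert_load_view: Optional[torch.Tensor] = None,  # placeholder
        **kwargs,
    ) -> torch.Tensor:
        # The important part: accept the EPLB-related arguments and forward
        # them to the expert-selection / fused-experts call, exactly as the
        # unquantized path does. Do not silently drop them.
        return self._quantized_fused_experts(  # hypothetical helper
            layer, x, router_logits, top_k,
            enable_eplb=enable_eplb,
            expert_load_view=expert_load_view,
            **kwargs,
        )
```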
👋 Want to contribute?
We’d love your help in extending EPLB support! Feel free to comment below or open a draft PR—we’re happy to guide you through the process.
Alternatives
No response
Additional context
No response
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.