[Feature]: Support EPLB for More MoE Models, e.g. Qwen 3, Llama 4 #20468

@abmfy

🚀 The feature, motivation and pitch

🎉 #18343 introduces dynamic Expert Parallelism Load Balancing (EPLB) for DeepSeek-V2/V3/R1 models.

As MoE (Mixture-of-Experts) models become more common, we’d love help extending EPLB support to other MoE models—such as Qwen3, Llama 4, and more.

This is a good first issue for anyone interested in model internals or systems work. #18343 was built with generality in mind, so extending it to other models or quantization methods should be relatively straightforward.


✅ How to add support for a new model

Implement the MixtureOfExperts protocol. Specifically, you’ll need to:

  • Expose relevant MoE configuration flags.
  • Provide access to expert weights for EPLB to rearrange.
  • Forward EPLB-related arguments into the FusedMoE layer.
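
To make the three steps above concrete, here is a minimal, self-contained sketch. It does not import vLLM: `SupportsEPLB`, `ToyMoELayer`, `ToyMoEModel`, and every attribute name are hypothetical stand-ins chosen for illustration, and the real MixtureOfExperts protocol and FusedMoE constructor may use different names and signatures. Please check deepseek_v2.py for the actual interface.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class SupportsEPLB(Protocol):
    """Hypothetical stand-in for the MixtureOfExperts protocol."""

    num_logical_experts: int    # experts defined by the checkpoint
    num_redundant_experts: int  # extra replicas added for load balancing
    expert_weights: list        # one entry per MoE layer

    def set_eplb_state(self, logical_to_physical: list[list[int]]) -> None: ...


class ToyMoELayer:
    """Stand-in for a FusedMoE-style layer that accepts EPLB arguments."""

    def __init__(self, num_experts: int, num_redundant_experts: int,
                 enable_eplb: bool) -> None:
        self.enable_eplb = enable_eplb
        self.num_physical_experts = num_experts + num_redundant_experts
        # One placeholder weight per physical expert slot.
        self.weights = [f"w{i}" for i in range(self.num_physical_experts)]
        self.logical_to_physical = None


class ToyMoEModel:
    def __init__(self, num_layers: int, num_experts: int,
                 num_redundant_experts: int, enable_eplb: bool = True) -> None:
        # Step 1: expose the MoE configuration.
        self.num_logical_experts = num_experts
        self.num_redundant_experts = num_redundant_experts
        # Step 3: forward the EPLB-related arguments into every MoE layer.
        self.layers = [
            ToyMoELayer(num_experts, num_redundant_experts, enable_eplb)
            for _ in range(num_layers)
        ]
        # Step 2: give the balancer access to the per-layer expert weights.
        self.expert_weights = [layer.weights for layer in self.layers]

    def set_eplb_state(self, logical_to_physical: list[list[int]]) -> None:
        # Share the same expert placement with every layer.
        for layer in self.layers:
            layer.logical_to_physical = logical_to_physical


model = ToyMoEModel(num_layers=2, num_experts=4, num_redundant_experts=2)
```

In this sketch, `expert_weights` holds references to the layers' own weight lists rather than copies, so a balancer that rearranges entries in place would also change what the layers see.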

📌 Note on weight loading:
For models with redundant experts, you’ll need to carefully adjust the weight loading logic. FusedMoE returns an expert_params_mapping that reflects expert duplication, but you may need to modify the model class to ensure correct loading behavior.
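
As a rough illustration of why duplication affects loading, the sketch below maps every physical expert slot back to the checkpoint weight it should receive. The helper names, the `(physical_id, weight_name)` tuple layout, and the round-robin initial placement are all assumptions made for this example. The real expert_params_mapping returned by FusedMoE may have a different shape.

```python
def make_toy_expert_mapping(num_logical: int,
                            num_redundant: int) -> list[tuple[int, str]]:
    """Return (physical_expert_id, checkpoint_weight_name) pairs.

    Assumption: redundant slots replicate logical experts round-robin.
    """
    mapping = []
    for physical_id in range(num_logical + num_redundant):
        logical_id = physical_id % num_logical
        mapping.append((physical_id, f"experts.{logical_id}.weight"))
    return mapping


def load_expert_weights(checkpoint: dict[str, list[float]],
                        mapping: list[tuple[int, str]]) -> dict[int, list[float]]:
    """Fill every physical slot, including replicas, from the checkpoint."""
    slots = {}
    for physical_id, weight_name in mapping:
        # Copy so that replicas do not alias the same underlying storage.
        slots[physical_id] = list(checkpoint[weight_name])
    return slots


checkpoint = {f"experts.{i}.weight": [float(i)] for i in range(4)}
mapping = make_toy_expert_mapping(num_logical=4, num_redundant=2)
slots = load_expert_weights(checkpoint, mapping)
```

Note that one checkpoint entry can feed more than one slot here. A loader that stops at the first match for a given weight name would leave the replica slots uninitialized, which is the kind of bug an accuracy test should catch.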

🔎 Example: See how it’s done in deepseek_v2.py.

❗️Accuracy tests:
Since modifying the weight loader can be tricky, we suggest including an accuracy test (e.g., on GSM8k) in the PR to ensure the weight loading process remains intact.
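
One possible shape for such a check, sketched with made-up answers rather than real GSM8k results: score the same prompts with and without EPLB and flag a drop beyond a tolerance. The function names and the 0.02 tolerance are hypothetical; in a real PR the predictions would come from whatever evaluation harness you use.

```python
def exact_match_accuracy(predictions: list[str], references: list[str]) -> float:
    """Fraction of predictions that match their reference answer exactly."""
    if not references or len(predictions) != len(references):
        raise ValueError("need equally sized, non-empty prediction and reference lists")
    correct = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return correct / len(references)


def accuracy_regressed(baseline: float, candidate: float,
                       tolerance: float = 0.02) -> bool:
    """True if the candidate score falls below the baseline by more than tolerance."""
    return baseline - candidate > tolerance


# Placeholder answers for illustration only, not real model outputs.
references = ["18", "3", "70000", "540"]
baseline_predictions = ["18", "3", "70000", "20"]
eplb_predictions = ["18", "3", "70000", "20"]

baseline_acc = exact_match_accuracy(baseline_predictions, references)
eplb_acc = exact_match_accuracy(eplb_predictions, references)
regressed = accuracy_regressed(baseline_acc, eplb_acc)
```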


✅ How to add support for quantized models

This is usually even easier—just make sure EPLB-related arguments are properly forwarded in your quantization path.

🔎 Example: See fp8.py for a minimal working change.
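
The idea of forwarding can be sketched as follows. This is a toy, not the fp8.py change: `EPLBArgs`, `select_physical_experts`, and `ToyQuantizedMoEMethod` are hypothetical names, and the "first replica" policy is a simplification chosen only to keep the example short.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EPLBArgs:
    """Hypothetical bundle of EPLB-related arguments."""

    enable_eplb: bool = False
    # Index = logical expert id, value = list of physical replica ids.
    logical_to_physical_map: Optional[list[list[int]]] = None


def select_physical_experts(logical_ids: list[int], eplb: EPLBArgs) -> list[int]:
    """Translate routed logical expert ids into physical expert ids."""
    if not eplb.enable_eplb:
        return list(logical_ids)
    if eplb.logical_to_physical_map is None:
        raise ValueError("EPLB is enabled but no expert map was forwarded")
    # Toy policy: always pick the first replica of each logical expert.
    return [eplb.logical_to_physical_map[e][0] for e in logical_ids]


class ToyQuantizedMoEMethod:
    """Stand-in for a quantized MoE code path."""

    def apply(self, logical_ids: list[int],
              eplb: Optional[EPLBArgs] = None) -> list[int]:
        # Forward the EPLB arguments instead of silently dropping them.
        return select_physical_experts(logical_ids, eplb or EPLBArgs())


method = ToyQuantizedMoEMethod()
eplb = EPLBArgs(enable_eplb=True,
                logical_to_physical_map=[[4], [1, 5], [2], [3, 0]])
routed = method.apply([0, 1, 3], eplb)
```

If the quantized path accepted the arguments but never passed them on, routing would quietly fall back to the logical ids, so the failure would show up as imbalance or wrong outputs rather than as an error.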


👋 Want to contribute?

We’d love your help in extending EPLB support! Feel free to comment below or open a draft PR—we’re happy to guide you through the process.
