[moe] brings batch/sequence-wise load balance loss #2061

rakkit · 2025-11-19T19:16:03Z

This is a draft PR that:

1) makes the MoE load_balance_coeff configurable
2) adds batch-wise and sequence-wise auxiliary (aux) losses for load balancing [ref: DeepSeek-V3, eqns. 17-20]

For now, this only applies to the DeepSeek model, but I can add it to all the other MoE models at the end.
(Also, the aux loss is not logged yet; I can add that in an optimizer hook if you want.)
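For reviewers who want the math in front of them, here is a minimal sketch of the loss from DeepSeek-V3 eqns. 17-20, written in plain Python so the arithmetic is visible. It is not the code in this PR: the function names, the list-of-lists input, and the choice to average the sequence-wise loss over sequences are all assumptions made for illustration, and a real implementation would use tensor ops.

```python
def sequence_balance_loss(scores, top_k, alpha):
    """Balance loss over one group of tokens (hypothetical helper).

    scores: list of T rows, each a list of N_r non-negative expert
            affinity scores for one token.
    top_k:  number of routed experts selected per token (K_r).
    alpha:  loss coefficient.
    """
    num_tokens = len(scores)
    num_experts = len(scores[0])
    counts = [0.0] * num_experts    # how often each expert is in the top-k
    affinity = [0.0] * num_experts  # summed normalized affinity per expert

    for token_scores in scores:
        # Experts picked for this token: the top_k highest scores.
        ranked = sorted(range(num_experts),
                        key=lambda i: token_scores[i], reverse=True)
        for i in ranked[:top_k]:
            counts[i] += 1.0
        # Normalize the scores so they sum to 1 for this token.
        total = sum(token_scores)
        for i in range(num_experts):
            affinity[i] += token_scores[i] / total

    # f_i = N_r / (K_r * T) * (number of tokens routed to expert i)
    f = [num_experts / (top_k * num_tokens) * c for c in counts]
    # P_i = mean normalized affinity of expert i over the tokens
    p = [a / num_tokens for a in affinity]
    return alpha * sum(fi * pi for fi, pi in zip(f, p))


def mean_sequence_balance_loss(batch_scores, top_k, alpha):
    """Sequence-wise variant: one loss per sequence, then averaged (assumed)."""
    losses = [sequence_balance_loss(seq, top_k, alpha) for seq in batch_scores]
    return sum(losses) / len(losses)


def batch_balance_loss(batch_scores, top_k, alpha):
    """Batch-wise variant: pool every token in the batch into one group."""
    pooled = [tok for seq in batch_scores for tok in seq]
    return sequence_balance_loss(pooled, top_k, alpha)
```

The two variants differ only in the group of tokens the statistics are computed over. A batch in which each sequence collapses onto a different expert is penalized by the sequence-wise form, but can look balanced to the batch-wise form.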

The main concern is that the aux loss does not work well with pipeline parallelism (PP). In my tests it only works correctly with the 1F1B schedule; it is broken with ZBV and interleaved 1F1B.

To test it:
CONFIG_FILE="./torchtitan/models/deepseek_v3/train_configs/debug_model.toml" NGPU=4 ./run_train.sh --model.extra_losses.load_balance_loss_weight=0.001

rakkit · 2025-11-19T19:17:04Z

torchtitan/train.py

            job_config, parallel_dims=parallel_dims, ft_manager=self.ft_manager
        )

+        self.loss_fn = functools.partial(


We can add a condition here to decide whether or not to wrap the loss for MoE models. For now, all models in torchtitan return only a single output, so this is fine as is.
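To make the idea concrete, a rough sketch of such a condition is below. It assumes a model might return either a single output or a `(prediction, aux_loss)` tuple; that tuple convention, the `loss_with_aux` name, and the keyword arguments are hypothetical and are not taken from this diff.

```python
import functools


def loss_with_aux(output, labels, *, base_loss_fn, load_balance_loss_weight):
    """Add a weighted aux loss only when the model returns one (hypothetical)."""
    if isinstance(output, tuple):
        # Assumed MoE convention: (prediction, aux_loss).
        pred, aux_loss = output
        return base_loss_fn(pred, labels) + load_balance_loss_weight * aux_loss
    # Single-output models go straight to the base loss.
    return base_loss_fn(output, labels)


def build_loss_fn(base_loss_fn, load_balance_loss_weight):
    """Bind the base loss and the weight once, as functools.partial allows."""
    return functools.partial(
        loss_with_aux,
        base_loss_fn=base_loss_fn,
        load_balance_loss_weight=load_balance_loss_weight,
    )
```

With this shape, single-output models are unaffected, and the weight only matters for models that actually emit an aux loss.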

1) make the moe's load_balance_coeff configurable 2) add the batch an…

1c5ddd5

…d seq-wise aux loss for load balance

rakkit requested review from fegin, tianyu-l, wconstab and wwwjn as code owners November 19, 2025 19:16

meta-cla bot added the CLA Signed This label is managed by the Meta Open Source bot. label Nov 19, 2025

rakkit mentioned this pull request Nov 19, 2025

question of PP x aux_loss for MoE #1979

Open

tianyu-l requested a review from shuhuayu November 19, 2025 21:34
