Conversation

@xmfan (Member) commented Nov 18, 2025

Stacked PRs:


Warn that SAC + Compile is not yet supported for MoE models. Behavior should be identical for MoE blocks; dense blocks are no longer compiled.


This also fixes another issue: CheckpointWrapper is being applied to all submodules in SAC, but only at the block level for Full AC. That has broken the logic of apply_compile ever since #1895.
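
A minimal sketch of the guard being added, assuming hypothetical names (apply_compile's arguments and the moe_enabled flag) that may not match the actual torchtitan signature:

```python
import logging
import torch

logger = logging.getLogger(__name__)

def apply_compile(model, ac_config, moe_enabled: bool):
    # Hypothetical sketch, not the torchtitan implementation.
    if moe_enabled and ac_config.mode == "selective":
        logger.warning(
            "Compile + Selective Activation Checkpointing is not yet supported "
            "for MoE models, please use Full Activation Checkpointing instead. "
            "Turning off Compile."
        )
        return  # skip compile entirely; eager SAC still works for MoE models
    # Dense path: compile each block as before.
    for name, block in model.layers.named_children():
        model.layers.register_module(name, torch.compile(block))
```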


if ac_config.mode == "selective":
    logger.warning(
        "Selective Activation Checkpointing is not yet supported for MoE models, "
A Contributor commented on this line:

This is a little bit confusing; SAC does work with eager for MoE models.

@wwwjn (Contributor) left a comment:

LGTM! Thanks for making this!

@tianyu-l (Contributor) left a comment:

sorry, didn't follow -- what's the issue between compile + SAC + MoE?

> CheckpointWrapper is being applied to all submodules in SAC, but only at the block level for Full AC. That has broken the logic of apply_compile ever since #1895.

What's the problem with Full AC at the block level? Is it because we have Full AC(compile)?

Also, could you help make a central list of the composability issues among AC, compile, and MoE? I realized that

> pytorch/pytorch#167844 fixes SAC around torch.compile region

"Compile + Selective Activation Checkpointing is not yet supported for MoE models, "
"please use Full Activation Checkpointing instead. Turning off Compile."
)
return
A Contributor commented on this line:

can we just error out?

@wwwjn (Contributor) commented Nov 18, 2025

> what's the issue between compile + SAC + MoE?

SAC wraps each submodule of the TransformerBlock separately (in _apply_op_sac_to_transformer_block_with_flex), which makes each submodule of the TransformerBlock an instance of CheckpointWrapper.

This makes the isinstance() check fail and fall back to the else branch, causing a compile error.

So #1895 only works with Full AC, not SAC. AC(compile(moe)) works, but SAC(compile(moe)) doesn't work.
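
A minimal, self-contained sketch of the failure mode (the MoE class below is a stand-in for illustration, not torchtitan's): once checkpoint_wrapper is applied per-submodule, the child is a CheckpointWrapper and no longer passes an isinstance check against the original class.

```python
import torch.nn as nn
from torch.distributed.algorithms._checkpoint.checkpoint_wrapper import (
    checkpoint_wrapper,
)


class MoE(nn.Module):  # hypothetical stand-in for the real MoE module
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(8, 8)

    def forward(self, x):
        return self.proj(x)


block = nn.ModuleDict({"moe": MoE(), "attention": nn.Linear(8, 8)})

# Full AC wraps the whole TransformerBlock, so block["moe"] stays an MoE.
# SAC wraps each submodule, replacing it with a CheckpointWrapper:
block["moe"] = checkpoint_wrapper(block["moe"])

# apply_compile dispatches on roughly `isinstance(child, MoE)`; after
# per-submodule wrapping this is False, so the MoE falls into the generic
# else branch and gets torch.compile'd in a way that errors.
print(isinstance(block["moe"], MoE))  # False
```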

@tianyu-l (Contributor) commented

@wwwjn
According to @ezyang

> pytorch/pytorch#167844 fixes SAC around torch.compile region

So everything should be fixed now; we just need to remove the hack in _apply_op_sac_to_transformer_block_with_flex and test.

@soulitzer (Contributor) commented

> fixes SAC around torch.compile region

So there are two cases here, depending on whether you care that compiling makes your graph opaque. The fix there primarily addresses one of the cases.
If you're only compiling a single op like FlexAttention, it is fine to not be able to see into the graph.
But for larger graphs, SAC(compile(fn)) will work, but it might not do exactly what you want. You'll only be able to save/recompute at the granularity of that whole graph.
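
A hedged sketch of what "SAC around a compiled region" looks like with the public torch.utils.checkpoint SAC API; depending on your PyTorch version this path may need the fix in pytorch/pytorch#167844, and inside the compiled region the per-op policy no longer applies at op granularity:

```python
import torch
from torch.utils.checkpoint import (
    CheckpointPolicy,
    checkpoint,
    create_selective_checkpoint_contexts,
)

def policy(ctx, op, *args, **kwargs):
    # Outside a compiled region this lets us pick individual ops to save.
    # If `fn` below is torch.compile'd, the region is opaque: we can only
    # save or recompute at the granularity of the whole compiled graph.
    if op is torch.ops.aten.mm.default:
        return CheckpointPolicy.MUST_SAVE
    return CheckpointPolicy.PREFER_RECOMPUTE

fn = torch.compile(lambda x: torch.mm(x, x).relu())

x = torch.randn(16, 16, requires_grad=True)
out = checkpoint(
    fn,
    x,
    use_reentrant=False,
    context_fn=lambda: create_selective_checkpoint_contexts(policy),
)
out.sum().backward()
```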

@wwwjn (Contributor) commented Nov 18, 2025

To check my understanding:

> If you're only compiling a single op like FlexAttention, it is fine to not be able to see into the graph.

So if only FlexAttention is compiled (not each transformer layer, or each submodule of a transformer layer), SAC works.

> But for larger graphs, SAC(compile(fn)) will work, but it might not do exactly what you want. You'll only be able to save/recompute at the granularity of that whole graph.

Say we compile each transformer layer: do you mean we can only save/recompute all the ops within the transformer layer, and cannot specify which ops to save in the SAC region?

@tianyu-l (Contributor) commented

@soulitzer

> But for larger graphs, SAC(compile(fn)) will work, but it might not do exactly what you want. You'll only be able to save/recompute at the granularity of that whole graph.

Is this full AC behavior? Or do you mean something else? Seems I was aware of this behavior before.

@soulitzer (Contributor) commented

@wwwjn @tianyu-l Yeah, I think your understanding is correct: either save all activations needed for backward that are computed within the compiled region, or recompute all of its ops, just like Full AC.

> So if only FlexAttention is compiled (not each transformer layer, or each submodule of a transformer layer), SAC works.

Yes, but the existing policy needs to be updated to handle the Inductor HOP.
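
A hedged sketch of the kind of policy update being described: teach the SAC policy to also recognize the higher-order op that the compiled region is dispatched as. The thread does not name that op, so compiled_region_hop below is an explicit placeholder, not a real identifier.

```python
from torch.utils.checkpoint import CheckpointPolicy

# Placeholder: replace with the actual higher-order op object that shows up
# in the policy once the region is compiled (not named in this thread).
compiled_region_hop = ...

def policy(ctx, op, *args, **kwargs):
    if op is compiled_region_hop:
        # Save the compiled region's outputs (or recompute the whole region);
        # finer-grained choices inside the region aren't available.
        return CheckpointPolicy.MUST_SAVE
    return CheckpointPolicy.PREFER_RECOMPUTE
```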
