[mxfp8 moe training] update benchmarks and tests; simplify per group blocked swizzle ref function #3286

danielvegamyhre · 2025-11-04T00:42:29Z

Stacked PRs:

->[mxfp8 moe training] update benchmarks and tests; simplify per group blocked swizzle ref function #3286
[mxfp8 moe training] compute prefix sum of group sizes inside kernel instead of precomputing #3285

Changes

Update benchmark_2d_3d_grouped_gemm.py to use torch._scaled_grouped_mm instead of the fbgemm custom op, now that it is integrated into PyTorch core
Simplify torch_to_blocked_2d_M_groups by dropping the unneeded K parameter, and update the tests and benchmarks accordingly
Update bench_triton_mx_block_rearrange_2d_M_groups.py to also benchmark larger, more realistic total_M sizes
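
For illustration only, here is a minimal sketch of why a reference function for the per-group blocked layout can plausibly do without a K argument. It is not the torchao implementation. It assumes the blocked layout pads each group's scale rows to a multiple of 128 and the scale columns to a multiple of 4, and that group boundaries arrive as cumulative end offsets along M. The names `blocked_group_layout`, `scale_shape`, and `group_end_offsets` are hypothetical. Under those assumptions, every quantity needed to size the output comes from the scale tensor's own shape and the group offsets.

```python
from itertools import accumulate

ROW_TILE = 128  # assumption: scale rows per tile in the blocked layout
COL_TILE = 4    # assumption: scale columns per tile in the blocked layout


def ceil_div(a, b):
    """Integer division rounded up."""
    return (a + b - 1) // b


def blocked_group_layout(scale_shape, group_end_offsets):
    """Return (output start rows, padded rows per group, padded columns).

    Hypothetical helper: scale_shape is (total_M, scale_cols) of the scale
    tensor, group_end_offsets are cumulative end rows of each group along M.
    """
    total_rows, scale_cols = scale_shape
    if group_end_offsets and group_end_offsets[-1] != total_rows:
        raise ValueError("last group offset must equal the number of scale rows")

    # Recover per-group sizes from the cumulative end offsets.
    starts = [0] + list(group_end_offsets[:-1])
    sizes = [end - start for start, end in zip(starts, group_end_offsets)]

    # Each group is padded independently to a whole number of row tiles.
    padded_sizes = [ceil_div(size, ROW_TILE) * ROW_TILE for size in sizes]

    # Exclusive prefix sum of the padded sizes gives each group's output start row.
    out_starts = [0] + list(accumulate(padded_sizes))[:-1]

    # The column count is read from the scale tensor itself, so no K is passed in.
    padded_cols = ceil_div(scale_cols, COL_TILE) * COL_TILE
    return out_starts, padded_sizes, padded_cols
```

For example, with scales of shape (300, 6) and two groups ending at rows 100 and 300, the call `blocked_group_layout((300, 6), [100, 300])` pads the groups to 128 and 256 rows, places them at output rows 0 and 128, and pads the columns to 8.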

…blocked swizzle ref function stack-info: PR: #3286, branch: danielvegamyhre/stack/83

pytorch-bot · 2025-11-04T00:42:33Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/3286

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There is 1 currently active SEV. If your PR is affected, please view it below:

ROCm failures during provisioning step due to network issues

✅ No Failures

As of commit 494df3f with merge base 01374eb:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

[mxfp8 moe training] update benchmarks and tests; simplify per group …

494df3f

danielvegamyhre added a commit that referenced this pull request Nov 4, 2025

[mxfp8 moe training] update benchmarks and tests; simplify per group …

497998a

danielvegamyhre force-pushed the danielvegamyhre/stack/83 branch from df83d6e to 497998a November 4, 2025 00:42

meta-cla bot added the CLA Signed label Nov 4, 2025

danielvegamyhre mentioned this pull request Nov 4, 2025

[mxfp8 moe training] compute prefix sum of group sizes inside kernel instead of precomputing #3285

Merged

danielvegamyhre added mx moe topic: not user facing labels Nov 4, 2025

danielvegamyhre changed the base branch from danielvegamyhre/stack/82 to main November 4, 2025 00:48

danielvegamyhre added a commit that referenced this pull request Nov 4, 2025

[mxfp8 moe training] update benchmarks and tests; simplify per group …

64f9440

danielvegamyhre force-pushed the danielvegamyhre/stack/83 branch from 497998a to 64f9440 November 4, 2025 00:48

danielvegamyhre changed the base branch from main to danielvegamyhre/stack/82 November 4, 2025 00:48

danielvegamyhre requested review from drisspg and vkuzo November 4, 2025 00:49

danielvegamyhre changed the base branch from danielvegamyhre/stack/82 to main November 4, 2025 16:34

danielvegamyhre force-pushed the danielvegamyhre/stack/83 branch from 64f9440 to 494df3f November 4, 2025 16:34
