
Conversation


q10 commented Nov 4, 2025

Summary:
This PR introduces optimizations for group_index_select_or_add_2d_kernel (the USE_INDEX_SELECT==true path), with a primary focus on the float type and relatively small embedding dimensions. Two changes are implemented (see the sketch below):

  1. Common variables are hoisted out of the loop, omitting unnecessary memory-load synchronizations (the compiler won't do this automatically).
  2. The kernel switches to a logical wave size of 32 threads to reduce granularity losses.
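
For illustration, here is a minimal sketch of both ideas on a CUDA-style gather kernel. This is not FBGEMM's actual kernel: the function and parameter names (`gather_rows_sketch`, `input`, `output`, `indices`, `num_cols`) are hypothetical, and the real kernel also handles groups of tensors and the index-add path.

```cuda
// Minimal sketch of the two optimizations; names and signature are hypothetical.
template <typename scalar_t, int LOGICAL_WAVE_SIZE = 32>
__global__ void gather_rows_sketch(
    const scalar_t* __restrict__ input,   // [num_input_rows, num_cols]
    scalar_t* __restrict__ output,        // [num_output_rows, num_cols]
    const int64_t* __restrict__ indices,  // [num_output_rows]
    int64_t num_output_rows,
    int64_t num_cols) {
  // Optimization 2: split the hardware thread block into logical waves of 32
  // threads, so each wave owns one row. With small embedding dimensions this
  // keeps lanes busy instead of spreading one short row across the whole block.
  const int lane = threadIdx.x % LOGICAL_WAVE_SIZE;
  const int wave = threadIdx.x / LOGICAL_WAVE_SIZE;
  const int waves_per_block = blockDim.x / LOGICAL_WAVE_SIZE;

  for (int64_t row = static_cast<int64_t>(blockIdx.x) * waves_per_block + wave;
       row < num_output_rows;
       row += static_cast<int64_t>(gridDim.x) * waves_per_block) {
    // Optimization 1: hoist the loop-invariant index load and address
    // arithmetic out of the per-column loop. Because `indices` is reached
    // through a pointer, the compiler cannot prove these loads are invariant,
    // so without hoisting it would re-issue them (and wait on them) on every
    // iteration.
    const int64_t src_row = indices[row];
    const scalar_t* src = input + src_row * num_cols;
    scalar_t* dst = output + row * num_cols;

    // USE_INDEX_SELECT == true path: plain gather of one row.
    for (int64_t col = lane; col < num_cols; col += LOGICAL_WAVE_SIZE) {
      dst[col] = src[col];
    }
  }
}
```

With blockDim.x = 128, for example, each block advances four rows at a time, one per 32-thread logical wave, rather than assigning all 128 threads to a single row and idling most of them when num_cols is small.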

Differential Revision: D86135611

Pulled By: q10


netlify bot commented Nov 4, 2025

Deploy Preview for pytorch-fbgemm-docs ready!

🔨 Latest commit: 23e13e3
🔍 Latest deploy log: https://app.netlify.com/projects/pytorch-fbgemm-docs/deploys/6909930adcc3ce00088b70ee
😎 Deploy Preview: https://deploy-preview-5080--pytorch-fbgemm-docs.netlify.app

meta-cla bot added the cla signed label Nov 4, 2025

meta-codesync bot commented Nov 4, 2025

@q10 has exported this pull request. If you are a Meta employee, you can view the originating Diff in D86135611.

q10 force-pushed the export-D86135611 branch from 27c4a26 to b1f8dae on November 4, 2025 05:45
q10 pushed a commit to q10/FBGEMM that referenced this pull request Nov 4, 2025
…h#5080)

Summary:

X-link: facebookresearch/FBGEMM#2087

This PR introduces optimizations for `group_index_select_or_add_2d_kernel` (the `USE_INDEX_SELECT==true` path), with a primary focus on the `float` type and relatively small embedding dimensions. Two changes are implemented:
1) Common variables are hoisted out of the loop, omitting unnecessary memory-load synchronizations (the compiler won't do this automatically).
2) The kernel switches to a logical wave size of 32 threads to reduce granularity losses.


Differential Revision: D86135611

Pulled By: q10
meta-codesync bot commented Nov 4, 2025

@q10 merged this pull request in be1b514.

Bernard-Liu pushed a commit to ROCm/FBGEMM that referenced this pull request Nov 11, 2025
…h#5080)

Summary:
Pull Request resolved: pytorch#5080

X-link: https://github.com/facebookresearch/FBGEMM/pull/2087

This PR introduces optimizations for `group_index_select_or_add_2d_kernel` (the `USE_INDEX_SELECT==true` path), with a primary focus on the `float` type and relatively small embedding dimensions. Two changes are implemented:
1) Common variables are hoisted out of the loop, omitting unnecessary memory-load synchronizations (the compiler won't do this automatically).
2) The kernel switches to a logical wave size of 32 threads to reduce granularity losses.

Pull Request resolved: pytorch#5078

Reviewed By: spcyppt, haoyuz

Differential Revision: D86135611

Pulled By: q10

fbshipit-source-id: f4fb9966f5f5180c4dde2aed92ca726c260b7743
