@shintaro-iwasaki
Contributor

Summary:
This patch refactors FBGEMM to slightly reduce the compilation time and size associated with `cub`.

  1. Moved the inline function `asynchronous_complete_cumsum()` from `embedding_backward_template_helpers.cuh` to `split_embeddings_utils.cu`, the only file that uses it.
  2. Instead of calling the template function `cub::DeviceRadixSort::SortPairs` directly, call a non-static function in FBGEMM, so that the template is not expanded in every `gen_embedding_backward_*` file.
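
The second item can be illustrated with a minimal, host-only C++ sketch. It is not the FBGEMM code: `sort_pairs_generic` is a hypothetical stand-in for a header-only template such as `cub::DeviceRadixSort::SortPairs`, and `sort_pairs_i64_i32` is a hypothetical name for the non-template wrapper. The point is only that callers see a plain declaration, while the template is instantiated in a single translation unit.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical stand-in for a heavy header-only template
// (in the PR this role is played by cub::DeviceRadixSort::SortPairs).
template <typename KeyT, typename ValueT>
void sort_pairs_generic(const KeyT* keys_in, KeyT* keys_out,
                        const ValueT* values_in, ValueT* values_out,
                        std::size_t n) {
  // Sort indices by key (stably), then gather keys and values.
  std::vector<std::size_t> order(n);
  std::iota(order.begin(), order.end(), std::size_t{0});
  std::stable_sort(order.begin(), order.end(),
                   [&](std::size_t a, std::size_t b) {
                     return keys_in[a] < keys_in[b];
                   });
  for (std::size_t i = 0; i < n; ++i) {
    keys_out[i] = keys_in[order[i]];
    values_out[i] = values_in[order[i]];
  }
}

// What a shared header would expose: a declaration only (hypothetical name).
// Files that include just this line never expand the template above.
void sort_pairs_i64_i32(const std::int64_t* keys_in, std::int64_t* keys_out,
                        const std::int32_t* values_in, std::int32_t* values_out,
                        std::size_t n);

// What a single source file would contain: the one place where the
// template is instantiated for this key/value type combination.
void sort_pairs_i64_i32(const std::int64_t* keys_in, std::int64_t* keys_out,
                        const std::int32_t* values_in, std::int32_t* values_out,
                        std::size_t n) {
  sort_pairs_generic(keys_in, keys_out, values_in, values_out, n);
}
```

In a real build the template definition and the wrapper definition would sit in one source file, and every other file would include only the declaration. The trade-off is that each key/value type combination needs its own explicitly declared wrapper.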

Reviewed By: jspark1105

Differential Revision: D33801456

fbshipit-source-id: 6188def8808c9f73a630f1ccdd71c45674f2d255
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D33801456

q10 added a commit to q10/FBGEMM that referenced this pull request Apr 10, 2025
Summary:
Pull Request resolved: facebookresearch/FBGEMM#886

- Limit the number of ROCm hardware targets to reduce Nova ROCm build times

X-link: pytorch#3797

Reviewed By: sryap

Differential Revision: D70949678

Pulled By: q10

fbshipit-source-id: a14cc0f12c7988aa3e9b68549bad9d109f1d7ca6