Commit ecf2ac9
General adoption for Mtile = 64 (#5075)
Summary:
Pull Request resolved: #5075
X-link: https://github.com/facebookresearch/FBGEMM/pull/2080
This diff generalizes the work in D85155388, based on Gefei's diff D85631781.
Compared to D85631781, we avoid register warp shuffling by using 32b TMEM atoms.
This diff supports:
1. Different dtypes (fp8, bf16)
2. Different mtiles (128, 64)
Reviewed By: v0i0
Differential Revision: D85893883
fbshipit-source-id: 25e93e627c573a120ab46336d3f234064c5ae066
1 parent 391f78d
2 files changed, +193 -104 lines changed, under fbgemm_gpu/experimental/gen_ai:
- src/attention/cuda/cutlass_blackwell_fmha/collective
- test/attention