Commit cfe8683
embedding forward optimization for MI350 (#5064)
Summary:
X-link: https://github.com/facebookresearch/FBGEMM/pull/2095
Optimizations to the embedding forward pass for MI350:
1. Use vec4 instead of vec2 in the embedding VBE forward kernel.
2. Because a warp (wavefront) has 64 threads on ROCm, optimize the subwarp handling in the embedding forward v2 kernel when the embedding dimension is between 32 and 64.
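As a rough illustration of the arithmetic behind both points (not the kernel code from this commit), the sketch below estimates how many threads one embedding row needs at a given vector width, and how many rows a single warp could then serve. The helper names and the power-of-two subwarp rule are assumptions made for this example and are not taken from FBGEMM.

```python
def threads_per_row(embedding_dim: int, vec_width: int) -> int:
    """Threads needed to cover one row when each thread handles vec_width elements."""
    return -(-embedding_dim // vec_width)  # ceiling division


def subwarp_size(embedding_dim: int, vec_width: int, warp_size: int) -> int:
    """Smallest power-of-two thread group covering one row, capped at warp_size.

    Assumption for illustration only: subwarps are powers of two.
    """
    needed = threads_per_row(embedding_dim, vec_width)
    size = 1
    while size < needed and size < warp_size:
        size *= 2
    return size


def rows_per_warp(embedding_dim: int, vec_width: int, warp_size: int) -> int:
    """Rows a single warp can process concurrently under the assumptions above."""
    return warp_size // subwarp_size(embedding_dim, vec_width, warp_size)


if __name__ == "__main__":
    # Compare a 32-thread warp with a 64-thread ROCm wavefront at vec4.
    for dim in (32, 48, 64):
        for warp in (32, 64):
            print(dim, warp, subwarp_size(dim, 4, warp), rows_per_warp(dim, 4, warp))
```

Under these assumptions, a row with an embedding dimension between 32 and 64 needs at most 16 threads at vec4, so a 64-thread warp leaves most lanes idle unless it is split into subwarps that each handle a different row.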
Pull Request resolved: #5064
Reviewed By: q10
Differential Revision: D85701691
Pulled By: spcyppt
fbshipit-source-id: 72f491414f50e53038a4b02f3d555967d34740a7
1 parent c5be0ac commit cfe8683
3 files changed, +16 -26 lines changed
- fbgemm_gpu/codegen/training/forward
Lines changed: 2 additions & 19 deletions
| Original lines | New lines | Removed (original) | Added (new) |
|---|---|---|---|
| 84-94 | 84-90 | 87-89, 91 | none |
| 182-192 | 178-184 | 185-187, 189 | none |
| 319-325 | 311-317 | 322 | 314 |
| 633-644 | 625-631 | 636-639, 641 | none |
| 743-754 | 730-737 | 746-748, 751 | none |
| 930-936 | 913-919 | 933 | 916 |
Lines changed: 7 additions & 0 deletions
| Original lines | New lines | Removed (original) | Added (new) |
|---|---|---|---|
| 975-980 | 975-987 | none | 978-984 |
Lines changed: 7 additions & 7 deletions
| Original lines | New lines | Removed (original) | Added (new) |
|---|---|---|---|
| 720-731 | 720-726 | 723-726, 728 | none |
| 799-807 | 794-807 | 803-804 | 797, 799-804 |