Commit 4ca9c1a
mxtensor: add pre-swizzle support
Summary:
Adds the ability to pre-swizzle scales for `MXTensor`,
and turns it on for the inference workflow.
For activations, nothing changes for now, but if we write a fused
kernel we'll hook it into the pre-swizzled path.
For weights, this is a performance win in this PR because we now swizzle ahead of
time.
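The diff is not reproduced here, so the following is only a sketch of the idea and not the code from this PR. It assumes that "swizzling" means rearranging the row-major scale matrix into the 128x4-tile blocked layout that the cuBLAS documentation describes for block scaling factors. `swizzle_scales` and `ToyScaledTensor` are hypothetical names, not torchao APIs, and plain Python lists stand in for tensors.

```python
from typing import List

TILE_ROWS, TILE_COLS = 128, 4          # assumed tile shape of the blocked layout
TILE_SIZE = TILE_ROWS * TILE_COLS      # 512 scale entries per tile


def ceil_div(a: int, b: int) -> int:
    return (a + b - 1) // b


def swizzle_scales(scales: List[List[int]]) -> List[int]:
    """Rearrange a row-major scale matrix into 128x4 tiles (flat output).

    The output is zero-padded so that it always holds whole tiles.
    """
    rows, cols = len(scales), len(scales[0])
    n_col_tiles = ceil_div(cols, TILE_COLS)
    out = [0] * (ceil_div(rows, TILE_ROWS) * n_col_tiles * TILE_SIZE)
    for r in range(rows):
        for c in range(cols):
            tile = (r // TILE_ROWS) * n_col_tiles + (c // TILE_COLS)
            r_in, c_in = r % TILE_ROWS, c % TILE_COLS
            # Inside a tile, rows are interleaved in four groups of 32.
            offset = (r_in % 32) * 16 + (r_in // 32) * 4 + c_in
            out[tile * TILE_SIZE + offset] = scales[r][c]
    return out


class ToyScaledTensor:
    """Hypothetical stand-in for a tensor that carries its own scales."""

    def __init__(self, scales: List[List[int]], pre_swizzle: bool):
        self.is_swizzled = pre_swizzle
        self.swizzle_calls = 0
        # Pre-swizzled: pay the rearrangement cost once, at construction.
        self.scales = self._swizzle(scales) if pre_swizzle else scales

    def _swizzle(self, scales: List[List[int]]) -> List[int]:
        self.swizzle_calls += 1
        return swizzle_scales(scales)

    def scales_for_gemm(self) -> List[int]:
        # Pre-swizzled scales are used as-is; otherwise swizzle on every call.
        return self.scales if self.is_swizzled else self._swizzle(self.scales)


raw = [[r * 4 + c for c in range(4)] for r in range(128)]
lazy = ToyScaledTensor(raw, pre_swizzle=False)
eager = ToyScaledTensor(raw, pre_swizzle=True)
for _ in range(3):  # three "forward passes"
    assert lazy.scales_for_gemm() == eager.scales_for_gemm()
print(lazy.swizzle_calls, eager.swizzle_calls)  # prints "3 1"
```

In this toy model both paths hand the same scales to the matmul, but the pre-swizzled weight does the rearrangement once instead of once per forward pass, which is the kind of saving the summary attributes to weights.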
Rough magnitude of the weight pre-swizzling win:
for M, K, N == 4096, 4096, 4096, the mxfp8 inference fwd speedup
increases from 1.24x to 1.30x.
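To put the two figures in relative terms, here is a small calculation. It assumes both speedups are measured against the same baseline, which the message does not state explicitly.

```python
before, after = 1.24, 1.30  # fwd speedups quoted above

# Extra throughput from pre-swizzling, relative to the non-pre-swizzled path.
relative_gain = after / before - 1
# Fraction of forward time removed (time scales as 1 / speedup).
time_saved = 1 - before / after

print(f"{relative_gain:.1%} {time_saved:.1%}")  # prints "4.8% 4.6%"
```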
Test Plan:
```bash
# correctness
CUDA_VISIBLE_DEVICES=5 pytest test/prototype/mx_formats/ -s
# performance
CUDA_VISIBLE_DEVICES=5 python benchmarks/float8/float8_inference_roofline.py ~/local/tmp/20251017_test.csv --recipe_name mxfp8_cublas --shape_gen_name pow2_extended
# before: https://www.internalfb.com/phabricator/paste/view/P1996942931
# after: https://www.internalfb.com/phabricator/paste/view/P1996941798
```
ghstack-source-id: 46b8d23
ghstack-comment-id: 3415966576
Pull-Request: #32001
Parent: e7e1fdd, commit: 4ca9c1a
File tree
5 files changed, +361 -240 lines changed
- test/prototype/mx_formats
- torchao/prototype/mx_formats