Commit a3bfd80

chore: add API docs
1 parent 965c913 commit a3bfd80

1 file changed: +17 -0 lines changed

torchao/sparsity/README.md

Lines changed: 17 additions & 0 deletions
@@ -58,6 +58,23 @@ For more information about accelerting BERT with semi-sturcutred sparsity, pleas
| F1 (%) | 86.93 | 86.49 | -0.44 |
| Time (bs=16) | 19.35 | 15.74 | 1.23x |

# Implemented APIs

## Quantization + Sparsity

### Sparse Marlin 2:4

Sparse-Marlin 2:4 is an optimized GPU kernel that extends the Mixed Auto-Regressive Linear (Marlin) dense kernel to support 4-bit quantized weights and 2:4 sparsity, improving performance in matrix multiplication and accumulation. Full documentation can be found [here](https://github.com/IST-DASLab/Sparse-Marlin).

```py
from torchao.quantization.quant_api import quantize_, int4_weight_only
from torchao.dtypes import MarlinSparseLayoutType

# Your FP16 model
model = model.cuda().half()

quantize_(model, int4_weight_only(layout_type=MarlinSparseLayoutType()))
```
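
For readers unfamiliar with the 2:4 pattern mentioned above: it means that every contiguous group of four weights holds at most two nonzero values. The sketch below illustrates that constraint in plain Python. The helper names `prune_2_4` and `is_2_4_sparse` are hypothetical, and keeping the two largest-magnitude values per group is an assumed pruning rule chosen for illustration; it is not taken from Sparse-Marlin or torchao.

```python
def prune_2_4(weights):
    """Zero the two smallest-magnitude values in every group of four.

    Illustrative only: magnitude-based selection is an assumption,
    not the method used by Sparse-Marlin or torchao.
    """
    if len(weights) % 4 != 0:
        raise ValueError("length must be a multiple of 4")
    pruned = []
    for start in range(0, len(weights), 4):
        group = weights[start:start + 4]
        # Indices of the two largest-magnitude entries in this group.
        keep = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)[:2]
        pruned.extend(v if i in keep else 0.0 for i, v in enumerate(group))
    return pruned


def is_2_4_sparse(weights):
    """Check that each group of four holds at most two nonzero values."""
    return len(weights) % 4 == 0 and all(
        sum(1 for v in weights[s:s + 4] if v != 0) <= 2
        for s in range(0, len(weights), 4)
    )


row = [0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.4, 0.0]
sparse_row = prune_2_4(row)
print(sparse_row)
print(is_2_4_sparse(row), is_2_4_sparse(sparse_row))
```

A dense row generally fails the check, while the pruned row passes it. A real workflow would apply such a constraint to model weights before quantization; how the weights are chosen for pruning is left to the user's sparsification method.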

# Design
