Note: The quantization error incurred by applying int4 quantization to your model can be fairly significant, so using external techniques like GPTQ may be necessary to obtain a usable model.
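One in-library knob that can reduce this error before reaching for GPTQ is a smaller quantization group size. A minimal sketch, assuming the `quantize_`/`int4_weight_only` API from torch 2.4+ builds of torchao (the `group_size=32` value is illustrative; the default is 128):

```python
# for torch 2.4+; sketch only: a smaller group_size means finer-grained
# scales and lower quantization error, at some speed/memory cost.
from torchao.quantization import quantize_, int4_weight_only

quantize_(model, int4_weight_only(group_size=32))
```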
#### A16W8 Int8 WeightOnly Quantization
```python
# for torch 2.4+
from torchao.quantization import quantize_, int8_weight_only
quantize_(model, int8_weight_only())

# for torch 2.2.2 and previous
from torchao.quantization.quant_api import change_linear_weights_to_int8_woqtensors
change_linear_weights_to_int8_woqtensors(model)
```
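As a usage illustration, here is a minimal end-to-end sketch; the toy model, shapes, and dtype are hypothetical, and `torch.compile` is optional but typically needed to get fused quantized kernels:

```python
import torch
import torch.nn as nn
from torchao.quantization import quantize_, int8_weight_only

# Hypothetical toy model; any module containing nn.Linear layers works.
model = nn.Sequential(
    nn.Linear(1024, 1024),
    nn.ReLU(),
    nn.Linear(1024, 1024),
).to(device="cuda", dtype=torch.bfloat16)

# Swap each nn.Linear weight for an int8 weight-only quantized tensor, in place.
quantize_(model, int8_weight_only())

# Compile so dequantize + matmul fuse into the surrounding graph.
model = torch.compile(model, mode="max-autotune")

x = torch.randn(16, 1024, device="cuda", dtype=torch.bfloat16)
with torch.no_grad():
    y = model(x)
```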
#### A8W8 Int8 Dynamic Quantization
```python
# for torch 2.4+
from torchao.quantization import quantize_, int8_dynamic_activation_int8_weight
quantize_(model, int8_dynamic_activation_int8_weight())

# for torch 2.2.2 and previous
from torchao.quantization.quant_api import change_linear_weights_to_int8_dqtensors
change_linear_weights_to_int8_dqtensors(model)
```
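"Dynamic" here means activation scales are computed on the fly at inference time while the weights stay statically int8, so this variant tends to help compute-bound models, where weight-only quantization helps memory-bound ones. A quick way to confirm the swap happened is to inspect the weights; a minimal sketch (the toy model is hypothetical):

```python
import torch
import torch.nn as nn
from torchao.quantization import quantize_, int8_dynamic_activation_int8_weight

# Hypothetical toy model, just to make the check below self-contained.
model = nn.Sequential(nn.Linear(256, 256)).to(device="cuda", dtype=torch.bfloat16)
quantize_(model, int8_dynamic_activation_int8_weight())

# Printing a Linear weight now shows a torchao quantized tensor subclass
# (int8 data plus scales) instead of a plain dense tensor.
for name, module in model.named_modules():
    if isinstance(module, nn.Linear):
        print(name, "->", module.weight)
```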
#### A16W8 Float8 WeightOnly Quantization
```python
# for torch 2.5+
from torchao.quantization import quantize_, float8_weight_only
quantize_(model, float8_weight_only())
```
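Float8 kernels only run on recent hardware, so gating on compute capability is a reasonable guard. A defensive sketch; the sm89+ (Ada/Hopper) cutoff reflects what torchao's float8 paths target, but treat the exact requirement as an assumption and check the docs for your version:

```python
import torch
from torchao.quantization import quantize_, float8_weight_only

# Assumption: float8 needs CUDA compute capability 8.9+ (e.g. H100, RTX 4090);
# otherwise leave the model unquantized.
if torch.cuda.is_available() and torch.cuda.get_device_capability() >= (8, 9):
    quantize_(model, float8_weight_only())
```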
#### A16W8 Float8 Dynamic Quantization with Rowwise Scaling
```python
# for torch 2.5+
from torchao.quantization.quant_api import quantize_, PerRow, float8_dynamic_activation_float8_weight
quantize_(model, float8_dynamic_activation_float8_weight(granularity=PerRow()))
```
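With `PerRow`, each row of the weight (and of the activation) gets its own float8 scale, which tracks outliers better than a single per-tensor scale; note that per-row float8 scaling generally expects bfloat16 weights and inputs. If per-tensor scaling is preferred, a variant sketch (assuming `PerTensor` is importable alongside `PerRow`, mirroring how torchao exposes its granularity options):

```python
# for torch 2.5+; sketch of the per-tensor variant: one scale for the whole
# tensor instead of one per row (PerTensor import assumed to mirror PerRow).
from torchao.quantization.quant_api import quantize_, PerTensor, float8_dynamic_activation_float8_weight

quantize_(model, float8_dynamic_activation_float8_weight(granularity=PerTensor()))
```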