Commit ea8c00f
Improve QAT int4 weight-only numerics (#2986)
**Summary:** Similar to #2937, this commit improves the prepare
vs convert SQNR of int4 weight-only QAT from 12 to 45. This is
achieved by mimicking the numerics of the target FBGEMM bf16-int4
kernel more closely. In particular, the FBGEMM kernel:
1. Performs asymmetric [0, 15] quant first then recenters to 8
2. Uses smaller scale eps of 1e-6 instead of bf16's eps (0.0078125)
3. Quantizes the weights using min val instead of zero points
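
The three kernel behaviors listed above can be illustrated with a minimal pure-Python sketch. This is not the torchao or FBGEMM implementation. The names `fake_quantize_group` and `sqnr_db`, the list-based per-group representation, and the choice to apply the 1e-6 floor to the value range before dividing by 15 are assumptions made for illustration only.

```python
import math

QMIN, QMAX = 0, 15   # asymmetric unsigned 4-bit range
CENTER = 8           # offset used to recenter [0, 15] to [-8, 7]
SCALE_EPS = 1e-6     # small floor, instead of bf16's eps (0.0078125)


def fake_quantize_group(group):
    """Quantize then dequantize one group of float weights (hypothetical helper)."""
    min_val = min(group)
    max_val = max(group)
    # Assumption: the eps floor is applied to the range before dividing by 15.
    scale = max(max_val - min_val, SCALE_EPS) / QMAX

    # 1. Asymmetric quantization to [0, 15], measured from the group minimum
    #    (no integer zero point is computed).
    q_unsigned = [min(max(round((w - min_val) / scale), QMIN), QMAX) for w in group]

    # 2. Recenter to the signed range [-8, 7].
    q_signed = [q - CENTER for q in q_unsigned]

    # 3. Dequantize using the minimum shifted by the same offset.
    shifted_min = min_val + CENTER * scale
    dequant = [q * scale + shifted_min for q in q_signed]
    return q_signed, dequant, scale


def sqnr_db(reference, approx):
    """Signal-to-quantization-noise ratio in decibels."""
    signal = sum(r * r for r in reference)
    noise = sum((r - a) ** 2 for r, a in zip(reference, approx))
    if noise == 0:
        return math.inf
    return 10 * math.log10(signal / noise)


weights = [-1.0, -0.5, 0.0, 0.25, 1.0]
q_signed, dequant, scale = fake_quantize_group(weights)
print(q_signed, sqnr_db(weights, dequant))
```

With the small eps, a group whose values are all identical still gets a tiny nonzero scale, so the dequantized values stay close to the originals instead of being spread over a bf16-eps-sized step.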
**Unit tests:**
```
python test/quantization/test_qat.py -k test_quantize_api_int4
python test/quantization/test_qat.py -k test_fbgemm_int4_weight_only_primitives
```
**End-to-end tests:**
Fine-tuning Llama3.1-8B with and without this PR in unsloth:
- fine-tune for 1 epoch on yahma/alpaca-cleaned with LoRA
- batch size 8, learning rate 2e-4, no gradient accumulation
Wikitext:
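- QAT int4 quantized model (with this PR) recovered about 33% of the
perplexity degradation of the int4 baseline relative to float
(8.3548 vs 8.7655; float is 7.5551)
- QAT int4 quantized model without this PR was worse than the int4
baseline (10.0683 vs 8.7655)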
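
The 33% figure matches the fraction of the float-to-int4 perplexity gap that QAT closes, not a 33% drop in raw perplexity. A short sketch of that arithmetic follows, using the `word_perplexity` values from the logs for this run and taking 8.3548 as the QAT-with-this-PR result, consistent with the claim that it beats the int4 baseline. The function names are hypothetical.

```python
FLOAT_PPL = 7.5551          # fine-tuned float model
BASELINE_INT4_PPL = 8.7655  # int4 quantized baseline, no QAT
QAT_INT4_PPL = 8.3548       # int4 quantized model trained with QAT (this PR)


def gap_recovered(float_ppl, baseline_ppl, qat_ppl):
    """Fraction of the quantization-induced perplexity gap closed by QAT."""
    return (baseline_ppl - qat_ppl) / (baseline_ppl - float_ppl)


def relative_reduction(baseline_ppl, qat_ppl):
    """Relative drop in raw perplexity versus the quantized baseline."""
    return (baseline_ppl - qat_ppl) / baseline_ppl


recovered = gap_recovered(FLOAT_PPL, BASELINE_INT4_PPL, QAT_INT4_PPL)
relative = relative_reduction(BASELINE_INT4_PPL, QAT_INT4_PPL)
print(f"gap recovered: {recovered:.1%}, raw perplexity reduction: {relative:.1%}")
```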
```
==> unsloth_model_lora_baseline_output/lm_eval_float.log <==
| | |none | 0|word_perplexity|↓ |7.5551|± | N/A|
==> unsloth_model_lora_baseline_output/lm_eval_quantized.log <==
| | |none | 0|word_perplexity|↓ |8.7655|± | N/A|
# QAT with this PR (quantized)
==> unsloth_model_lora_qat_int4_output/lm_eval_quantized.log <==
| | |none | 0|word_perplexity|↓ |8.3548|± | N/A|
# QAT without this PR (quantized)
==> unsloth_model_lora_qat_int4_output/lm_eval_quantized.log <==
| | |none | 0|word_perplexity|↓ |10.0683|± | N/A|
```

1 parent 56ae935 commit ea8c00f
3 files changed (+131, -17 lines) in:
- test/quantization
- torchao/quantization/qat