</p>
## News
* 07/02/2024 🚀 [v0.9.3](https://github.com/ModelCloud/GPTQModel/releases/tag/v0.9.3): Added Gemma 2 support, faster PPL calculations on GPU, and more code/arg refactoring.
* 06/30/2024 🚀 [v0.9.2](https://github.com/ModelCloud/GPTQModel/releases/tag/v0.9.2): Added auto-padding of model in/out-features for Exllama, Exllama V2, and Marlin.

  Fixed quantization of OPT and DeepSeek V2-Lite models. Fixed inference for DeepSeek V2-Lite.
* 06/29/2024 🚀🚀🚀 [v0.9.1](https://github.com/ModelCloud/GPTQModel/releases/tag/v0.9.1): Added 3 new models (DeepSeek-V2, DeepSeek-V2-Lite, DBRX Converted), a new BITBLAS format/kernel, proper batching of the calibration dataset for a >50% quantization speedup, security hash checks of loaded model weights, tons of refactor/usability improvements, bug fixes, and much more. (A minimal quantization sketch follows these release notes.)
* 06/20/2024 ✨ GPTQModel [v0.9.0](https://github.com/ModelCloud/GPTQModel/releases/tag/v0.9.0): Thanks to the ModelCloud team and the open-source ML community for their contributions!
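
The releases above repeatedly reference quantizing against a calibration dataset. A minimal sketch of that flow is below, modeled on the AutoGPTQ-style API this project forked from; the exact names (`GPTQModel`, `QuantizeConfig`, `from_pretrained`, `save_quantized`) are assumptions that may differ across versions, so treat it as illustrative rather than canonical.

```python
# Minimal quantization sketch (illustrative; exact API names may differ by version).
from gptqmodel import GPTQModel, QuantizeConfig  # assumed import path
from transformers import AutoTokenizer

pretrained_model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # any HF causal LM

tokenizer = AutoTokenizer.from_pretrained(pretrained_model_id)

# Calibration data quality matters (see the advantages list below): use
# representative text, not the couple of toy sentences shown here.
calibration_dataset = [
    tokenizer("GPTQModel is an LLM quantization toolkit forked from AutoGPTQ."),
    tokenizer("GPTQ lowers weight precision to cut memory use and speed up inference."),
]

quantize_config = QuantizeConfig(bits=4, group_size=128)  # common GPTQ settings

model = GPTQModel.from_pretrained(pretrained_model_id, quantize_config)
model.quantize(calibration_dataset)  # v0.9.1+ batches calibration samples internally
model.save_quantized("TinyLlama-1.1B-Chat-v1.0-4bit-128g")
```
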
## Mission Statement
We will backport bug fixes to AutoGPTQ on a case-by-case basis.

## Major Changes (Advantages) vs AutoGPTQ
* 🚀 Added `Gemma 2` Model Support
* 🚀 Added `DeepSeek-V2` Model Support
* 🚀 Added `DeepSeek-V2-Lite` Model Support
* 🚀 Added `ChatGLM` Model Support
* 🚀 Better quality quants as measured by PPL. (Test config: defaults + `sym=True` + `FORMAT.GPTQ`, TinyLlama)
* 🚀 Model weights sharding support
* 🚀 Security: hash check of model weights on load
* 🚀 Over 50% faster PPL calculations (OPT model)
* ✨ Alert users of sub-optimal calibration data. Most new users get this part horribly wrong.
* ✨ Increased compatibility with the newest models via auto-padding of in/out-features for the [Exllama, Exllama V2, Marlin] backends (see the loading sketch after this list).
* 👾 Removed non-working, partially working, or fully deprecated features: Peft, ROCm, AWQ Gemm inference, Triton v1 (replaced by v2), Fused Attention (replaced by Marlin/Exllama).
* 👾 <del>Fixed packing Performance regression on high core-count systems.</del> Backported to AutoGPTQ
* 👾 <del>Fixed crash on H100.</del> Backported to AutoGPTQ
* ✨ Tens of thousands of lines of refactor/cleanup.
* ✨ 8+ overly complex API args removed or merged into simple, human-readable args.
* ✨ Added CI workflow to validate future PRs and prevent code regressions.
* ✨ Added perplexity unit test to guard against model quant quality regressions.
* 👾 De-bloated the codebase by 271K lines, 250K of which came from a single dataset used only by an example.
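
To make the auto-padding advantage concrete, here is a hedged loading sketch. `from_quantized` follows the AutoGPTQ-style API this project forked from; the `Backend` enum and the `backend=` keyword are assumptions for this release line and may be named differently in the version you have installed.

```python
# Hedged loading sketch: the backend-selection names here are assumptions.
from gptqmodel import GPTQModel

try:
    # Assumed enum name; some versions may spell it BACKEND or omit it entirely.
    from gptqmodel import Backend
    backend_kwargs = {"backend": Backend.MARLIN}
except ImportError:
    backend_kwargs = {}  # fall back to the library's automatic kernel selection

model = GPTQModel.from_quantized(
    "TinyLlama-1.1B-Chat-v1.0-4bit-128g",  # local path or HF repo of a quantized model
    device="cuda:0",
    **backend_kwargs,
)
# With v0.9.2+, in/out-features that the Marlin/Exllama kernels cannot handle
# natively are auto-padded on load, so more model shapes work out of the box.
```
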