
Commit 26b3dc0

[DOC] Release 0.9.3 (#150)
* release 0.9.3
* update
1 parent 83c002d


2 files changed: +18 -15 lines


README.md

Lines changed: 17 additions & 14 deletions
@@ -10,13 +10,14 @@
 </p>
 
 ## News
+* 07/02/2024 🚀 [v0.9.3](https://github.com/ModelCloud/GPTQModel/releases/tag/v0.9.3): Added Gemma 2 support, faster PPL calculations on GPU, and more code/arg refactoring.
 
-* 06/30/2024 🚀 [v0.9.2](https://github.com/ModelCloud/GPTQModel/releases/tag/v0.9.2) released. Added auto-padding of model in/out-features for exllama, exllama v2, marlin.
+* 06/30/2024 🚀 [v0.9.2](https://github.com/ModelCloud/GPTQModel/releases/tag/v0.9.2): Added auto-padding of model in/out-features for exllama, exllama v2, marlin.
 Fixed quantization of OPT and DeepSeek V2-Lite models. Fixed inference for DeepSeek V2-Lite.
 
-* 06/29/2024 🚀🚀🚀 [v0.9.1](https://github.com/ModelCloud/GPTQModel/releases/tag/v0.9.1) released. With 3 new models (DeepSeek-V2, DeepSeek-V2-Lite, DBRX Converted), BITBLAS new format/kernel, proper batching of calibration dataset resulting > 50% quantization speedup, security hash check of loaded model weights, tons of refractor/usability improvements, bugs fixes and much more.
+* 06/29/2024 🚀🚀🚀 [v0.9.1](https://github.com/ModelCloud/GPTQModel/releases/tag/v0.9.1): Added 3 new models (DeepSeek-V2, DeepSeek-V2-Lite, DBRX Converted), a new BITBLAS format/kernel, proper batching of the calibration dataset for a >50% quantization speedup, a security hash check of loaded model weights, extensive refactoring/usability improvements, bug fixes, and much more.
 
-* 06/20/2924 ✨ GPTQModel [v0.9.0](https://github.com/ModelCloud/GPTQModel/releases/tag/v0.9.0) released. Thanks for all the work from ModelCloud team and the opensource ML community for their contributions!
+* 06/20/2024 ✨ GPTQModel [v0.9.0](https://github.com/ModelCloud/GPTQModel/releases/tag/v0.9.0): Thanks to the ModelCloud team and the open-source ML community for their contributions!
 
 ## Mission Statement
 
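The v0.9.3 entry above cites faster PPL (perplexity) calculations on GPU. For context, below is a generic windowed-perplexity loop of the kind such benchmarks run; this is a minimal sketch, not GPTQModel's internal implementation, and the model id and evaluation text are placeholders.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "facebook/opt-125m"  # placeholder; any causal LM works
device = "cuda" if torch.cuda.is_available() else "cpu"

tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).to(device).eval()

text = "\n\n".join(["Replace this with a real evaluation corpus."] * 256)
ids = tok(text, return_tensors="pt").input_ids.to(device)

window, nlls, n_tokens = 512, [], ids.size(1) - 1
with torch.no_grad():
    for start in range(0, n_tokens, window):
        chunk = ids[:, start : start + window + 1]  # +1 so each window predicts `window` tokens
        if chunk.size(1) < 2:
            break
        out = model(chunk, labels=chunk)  # HF shifts labels internally
        nlls.append(out.loss * (chunk.size(1) - 1))  # un-average the mean NLL

ppl = torch.exp(torch.stack(nlls).sum() / n_tokens)
print(f"ppl = {ppl.item():.2f}")
```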
@@ -30,6 +31,7 @@ We will backport bug fixes to AutoGPTQ on a case-by-case basis.
 
 ## Major Changes (Advantages) vs AutoGPTQ
 
+* 🚀 Added `Gemma 2` Model Support
 * 🚀 Added `DeepSeek-V2` Model Support
 * 🚀 Added `DeepSeek-V2-Lite` Model Support
 * 🚀 Added `ChatGLM` Model Support
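To put the newly supported models in context, here is a minimal quantize-then-load sketch. It assumes the v0.9.x `GPTQModel`/`QuantizeConfig` API; the Gemma 2 model id, output path, and calibration text are illustrative placeholders, not values from this commit.

```python
from transformers import AutoTokenizer
from gptqmodel import GPTQModel, QuantizeConfig

model_id = "google/gemma-2-9b"       # illustrative target from the list above
quant_path = "gemma-2-9b-gptq-4bit"  # hypothetical local output dir

# Calibration data matters: a single repeated sentence like this is exactly the
# sub-optimal data the release notes warn about; use a real corpus in practice.
tokenizer = AutoTokenizer.from_pretrained(model_id)
calibration = [tokenizer("GPTQModel is a model quantization toolkit.")] * 128

quant_config = QuantizeConfig(bits=4, group_size=128)
model = GPTQModel.from_pretrained(model_id, quant_config)
model.quantize(calibration)
model.save_quantized(quant_path)

# Reload the quantized weights for inference.
model = GPTQModel.from_quantized(quant_path)
```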
@@ -44,13 +46,14 @@ We will backport bug fixes to AutoGPTQ on a case-by-case basis.
 * 🚀 Better quality quants as measured by PPL. (Test config: defaults + `sym=True` + `FORMAT.GPTQ`, TinyLlama)
 * 🚀 Model weights sharding support
 * 🚀 Security: hash check of model weights on load
+* 🚀 Over 50% faster PPL calculations (OPT model)
 * ✨ Alert users of sub-optimal calibration data. Most new users get this part horribly wrong.
-* ✨ Increased compatiblity with newest models with auto-padding of in/out-features for [ Exllama, Exllama V2, Marlin ] backends.
-* 👾 Fixed OPT quantization. Original OPT model code resulted in unusable quantized models.
+* ✨ Increased compatibility with the newest models via auto-padding of in/out-features for the [ Exllama, Exllama V2, Marlin ] backends.
 * 👾 Removed non-working, partially working, or fully deprecated features: Peft, ROCM, AWQ Gemm inference, Triton v1 (replaced by v2), Fused Attention (replaced by Marlin/Exllama).
 * 👾 <del>Fixed packing performance regression on high core-count systems.</del> Backported to AutoGPTQ
 * 👾 <del>Fixed crash on H100.</del> Backported to AutoGPTQ
-* ✨ Many thousands of lines of refactor/cleanup.
+* ✨ Tens of thousands of lines of refactor/cleanup.
+* ✨ 8+ overly complex API args removed/merged into simple, human-readable args.
 * ✨ Added CI workflow for validation of future PRs and prevention of code regressions.
 * ✨ Added perplexity unit test to guard against model quant quality regressions.
 * 👾 De-bloated 271K lines, of which 250K came from a single dataset used only by an example.
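The "Security: hash check of model weights on load" bullet above refers to verifying weight files before deserializing them. Below is a minimal illustration of the idea using only the standard library; it is not GPTQModel's actual implementation, and `verify_weights` is a hypothetical helper.

```python
import hashlib
from pathlib import Path

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream the file in chunks so multi-GB weight shards never sit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_weights(path: str, expected_sha256: str) -> None:
    """Refuse to load weights whose hash does not match the recorded value."""
    actual = sha256_of(path)
    if actual != expected_sha256:
        raise ValueError(
            f"weights hash mismatch for {Path(path).name}: "
            f"expected {expected_sha256}, got {actual}"
        )
```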
@@ -72,14 +75,14 @@ We will backport bug fixes to AutoGPTQ on a case-by-case basis.
 
 | Model | | | | | | | |
 |----------------|----|------------------|----|-----------|----|------------|----|
-| Baichuan | ✅ | DeepSeek-V2-Lite | 🚀 | LongLLaMA | ✅ | Phi-3 | 🚀 |
-| Bloom | ✅ | Falon | ✅ | MiniCPM | 🚀 | Qwen | ✅ |
-| ChatGLM | 🚀 | GPTBigCod | ✅ | Mistral | ✅ | Qwen2MoE | 🚀 |
-| CodeGen | ✅ | GPTNeoX | ✅ | Mixtral | ✅ | RefinedWeb | ✅ |
-| Cohere | ✅ | GPT-2 | ✅ | MOSS | ✅ | StableLM | ✅ |
-| DBRX Converted | 🚀 | GPT-J | ✅ | MPT | ✅ | StarCoder2 | ✅ |
-| Deci | ✅ | InternLM | ✅ | OPT | ✅ | XVERSE | ✅ |
-| DeepSeek-V2 | 🚀 | Llama | ✅ | Phi | ✅ | Yi | ✅ |
+| Baichuan | ✅ | DeepSeek-V2-Lite | 🚀 | Llama | ✅ | Phi/Phi-3 | 🚀 |
+| Bloom | ✅ | Falcon | ✅ | LongLLaMA | ✅ | Qwen | ✅ |
+| ChatGLM | 🚀 | Gemma 2 | 🚀 | MiniCPM | 🚀 | Qwen2MoE | 🚀 |
+| CodeGen | ✅ | GPTBigCode | ✅ | Mistral | ✅ | RefinedWeb | ✅ |
+| Cohere | ✅ | GPTNeoX | ✅ | Mixtral | ✅ | StableLM | ✅ |
+| DBRX Converted | 🚀 | GPT-2 | ✅ | MOSS | ✅ | StarCoder2 | ✅ |
+| Deci | ✅ | GPT-J | ✅ | MPT | ✅ | XVERSE | ✅ |
+| DeepSeek-V2 | 🚀 | InternLM | ✅ | OPT | ✅ | Yi | ✅ |
 
 ## Compatibility
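The auto-padding advantage listed earlier exists because kernels such as Exllama, Exllama V2, and Marlin require in/out-features divisible by a fixed block size, so layers that miss the alignment get zero-padded up to the next multiple. A concept sketch follows, assuming a block multiple of 32 for illustration (the real kernels' requirements may differ), and it is not GPTQModel's code.

```python
import torch
import torch.nn.functional as F

def pad_to_multiple(weight: torch.Tensor, multiple: int = 32) -> torch.Tensor:
    """Zero-pad a (out_features, in_features) weight so both dims divide `multiple`."""
    out_f, in_f = weight.shape
    pad_out = (-out_f) % multiple  # e.g. 1000 -> pad 24 -> 1024
    pad_in = (-in_f) % multiple
    # F.pad pads the last dim first: (in_left, in_right, out_top, out_bottom)
    return F.pad(weight, (0, pad_in, 0, pad_out))

w = torch.randn(1000, 1000)
print(pad_to_multiple(w).shape)  # torch.Size([1024, 1024])
```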

gptqmodel/version.py

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-__version__ = "0.9.3-dev0"
+__version__ = "0.9.3"
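Given the version bump above, a quick sanity check that an installed build matches this release; this assumes the package exposes `gptqmodel.version` as the file path suggests.

```python
from gptqmodel.version import __version__

assert __version__ == "0.9.3", f"expected 0.9.3, got {__version__}"
print(__version__)
```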
