3 files changed: +64 -0 lines changed

@@ -597,6 +597,11 @@ Several quantization methods are supported. They differ in the resulting model d
 | 13B | ms/tok @ 8th | - | 73 | 82 | 98 | 105 | 128 |
 | 13B | bits/weight | 16.0 | 4.5 | 5.0 | 5.5 | 6.0 | 8.5 |
 
+- [k-quants](https://github.com/ggerganov/llama.cpp/pull/1684)
+- recent k-quants improvements
+  - [#2707](https://github.com/ggerganov/llama.cpp/pull/2707)
+  - [#2807](https://github.com/ggerganov/llama.cpp/pull/2807)
+
 ### Perplexity (measuring model quality)
 
 You can use the `perplexity` example to measure perplexity over a given prompt (lower perplexity is better).
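
Perplexity itself is the exponential of the average negative log-likelihood the model assigns to each token of the evaluation text, so lower values mean the model predicts the text more confidently. Below is a minimal sketch of that definition in Python; the function name and its input (a list of per-token probabilities) are illustrative assumptions, not the interface of the `perplexity` example itself.

```python
import math

def perplexity(token_probs):
    """Perplexity of a sequence, given the probability the model assigned
    to each actual next token (all values in (0, 1])."""
    n = len(token_probs)
    avg_neg_log_likelihood = -sum(math.log(p) for p in token_probs) / n
    return math.exp(avg_neg_log_likelihood)

# A model that is always 50% confident in the correct next token has perplexity 2.
print(perplexity([0.5] * 100))  # -> 2.0
```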
@@ -1,3 +1,21 @@
 # perplexity
 
 TODO
+
+## Llama 2 70B Scorechart
+Quantization | Model size (GiB) | Perplexity | Delta to fp16
+-- | -- | -- | --
+Q4_0 | 36.20 | 3.5550 | 3.61%
+Q4_1 | 40.20 | 3.5125 | 2.37%
+Q5_0 | 44.20 | 3.4744 | 1.26%
+Q2_K | 27.27 | 3.7339 | 8.82%
+Q3_K_S | 27.86 | 3.7019 | 7.89%
+Q3_K_M | 30.83 | 3.5932 | 4.72%
+Q3_K_L | 33.67 | 3.5617 | 3.80%
+Q4_K_S | 36.39 | 3.4852 | 1.57%
+Q4_K_M | 38.54 | 3.4725 | 1.20%
+Q5_K_S | 44.20 | 3.4483 | 0.50%
+Q5_K_M | 45.41 | 3.4451 | 0.40%
+Q6_K | 52.70 | 3.4367 | 0.16%
+fp16 | 128.5 | 3.4313 | -
+
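
The "Delta to fp16" column appears to be the relative increase in perplexity over the fp16 baseline. A quick check against the Q4_0 row, using only the numbers from the table above (a sketch, not part of the tool):

```python
ppl_fp16 = 3.4313  # fp16 baseline perplexity from the table
ppl_q4_0 = 3.5550  # Q4_0 perplexity from the table

delta = (ppl_q4_0 - ppl_fp16) / ppl_fp16 * 100
print(f"{delta:.3f}%")  # -> 3.605%, which the table rounds to 3.61%
```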
@@ -1,3 +1,44 @@
 # quantize
 
 TODO
+
+## Llama 2 7B
+
+Quantization | Bits per Weight (BPW)
+-- | --
+Q2_K | 3.35
+Q3_K_S | 3.50
+Q3_K_M | 3.91
+Q3_K_L | 4.27
+Q4_K_S | 4.58
+Q4_K_M | 4.84
+Q5_K_S | 5.52
+Q5_K_M | 5.68
+Q6_K | 6.56
+
+## Llama 2 13B
+Quantization | Bits per Weight (BPW)
+-- | --
+Q2_K | 3.34
+Q3_K_S | 3.48
+Q3_K_M | 3.89
+Q3_K_L | 4.26
+Q4_K_S | 4.56
+Q4_K_M | 4.83
+Q5_K_S | 5.51
+Q5_K_M | 5.67
+Q6_K | 6.56
+
+## Llama 2 70B
+
+Quantization | Bits per Weight (BPW)
+-- | --
+Q2_K | 3.40
+Q3_K_S | 3.47
+Q3_K_M | 3.85
+Q3_K_L | 4.19
+Q4_K_S | 4.53
+Q4_K_M | 4.80
+Q5_K_S | 5.50
+Q5_K_M | 5.65
+Q6_K | 6.56
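
Bits per weight is, to a first approximation, the size of the quantized weights in bits divided by the number of model parameters; the values vary slightly across model sizes in part because the quantization mixes keep some tensors at higher precision. A rough sketch of that relationship, with hypothetical numbers (the file size, parameter count, and function name below are illustrative, not measurements):

```python
def bits_per_weight(file_size_bytes, n_params):
    """Approximate average number of bits stored per model parameter."""
    return file_size_bytes * 8 / n_params

# Hypothetical example: a ~36.4 GiB quantized file of a ~69-billion-parameter model
size_bytes = 36.4 * 1024**3
print(f"{bits_per_weight(size_bytes, 69e9):.2f} BPW")  # -> roughly 4.5 BPW
```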