`examples/perplexity/README.md`: 59 additions & 1 deletion
@@ -7,6 +7,8 @@ Also note that finetunes typically result in a higher perplexity value even thou
Within llama.cpp the perplexity of base models is used primarily to judge the quality loss from e.g. quantized models vs. FP16.
The convention among contributors is to use the Wikitext-2 test set for testing unless noted otherwise (can be obtained with `scripts/get-wikitext-2.sh`).
+When numbers are listed, all command line arguments and compilation options are left at their defaults unless noted otherwise.
+llama.cpp numbers are **not** directly comparable to those of other projects because the exact values depend strongly on the implementation details.

By default only the mean perplexity value and the corresponding uncertainty are calculated.
The uncertainty is determined empirically by assuming a Gaussian distribution of the "correct" logits and then applying error propagation.
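The workflow above can be exercised with a minimal run along these lines (a sketch only: the tool is named `perplexity` in trees of this era and `llama-perplexity` in newer ones, the model path is a placeholder, and the data path assumes the script's default output location):

```sh
# Fetch the Wikitext-2 test set using the script mentioned above.
./scripts/get-wikitext-2.sh

# Compute the mean perplexity and its uncertainty with default options;
# the model path is a placeholder.
./perplexity -m models/llama-3-8b-f16.gguf -f wikitext-2-raw/wiki.test.raw
```

One way to read the error-propagation step, assuming the per-token negative log-likelihoods are treated as independent samples (a sketch, not necessarily the exact estimator used in the code):

$$
\mathrm{PPL} = \exp(\mu), \qquad \mu = -\frac{1}{N}\sum_{i=1}^{N}\ln p_i, \qquad
\sigma_{\mathrm{PPL}} \approx \left|\frac{\partial\,\mathrm{PPL}}{\partial\mu}\right|\sigma_\mu = \mathrm{PPL}\cdot\frac{\sigma_{-\ln p}}{\sqrt{N}},
$$

where $\sigma_{-\ln p}$ is the empirical standard deviation of the per-token negative log-likelihoods.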
@@ -32,7 +34,13 @@ In addition to the KL divergence the following statistics are calculated with `-
## LLaMA 3 8b Scoreboard

-Results are sorted by Kullback-Leibler divergence relative to FP16.
+| Revision | f364eb6f |
+|:---------|:-------------------|
+| Backend  | CUDA |
+| CPU      | AMD Epyc 7742 |
+| GPU      | 1x NVIDIA RTX 4090 |
+
+Results were generated using the CUDA backend and are sorted by Kullback-Leibler divergence relative to FP16.

The "WT" importance matrices were created using varying numbers of Wikitext tokens and can be found [here](https://huggingface.co/JohannesGaessler/llama.cpp_importance_matrices/blob/main/imatrix-llama_3-8b-f16-2.7m_tokens.dat).

| Quantization | imatrix | Model size [GiB] | PPL | ΔPPL | KLD | Mean Δp | RMS Δp |
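A hedged sketch of how a row of this scoreboard could be produced, assuming the pre-rename tool names of this revision (`imatrix`, `quantize`, `perplexity`; newer trees prefix them with `llama-`) and placeholder file names:

```sh
# 1. Build an importance matrix from Wikitext training tokens (placeholder paths).
./imatrix -m models/llama-3-8b-f16.gguf -f wikitext-2-raw/wiki.train.raw -o imatrix-wt.dat

# 2. Quantize the FP16 model using that importance matrix.
./quantize --imatrix imatrix-wt.dat models/llama-3-8b-f16.gguf models/llama-3-8b-q4_k_m.gguf q4_k_m

# 3. Save the FP16 logits once, then score the quantized model against them
#    to obtain PPL, KLD and the Δp statistics.
./perplexity -m models/llama-3-8b-f16.gguf -f wikitext-2-raw/wiki.test.raw --kl-divergence-base logits-f16.dat
./perplexity -m models/llama-3-8b-q4_k_m.gguf --kl-divergence-base logits-f16.dat --kl-divergence
```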
@@ -89,6 +97,12 @@ K-quants score better on mean Δp than the legacy quants than e.g. KL divergence
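For reading the statistics referenced in this hunk, the following is a hedged reconstruction (the usual definitions of these quantities in this context, not copied from the file): per evaluated token, the KL divergence compares the FP16 distribution $p$ with the quantized distribution $q$, and Δp is the change in the probability assigned to the correct token,

$$
\mathrm{KLD} = \sum_{v \in \text{vocab}} p_v \ln\frac{p_v}{q_v}, \qquad
\Delta p = q_{\text{correct}} - p_{\text{correct}},
$$

with "Mean Δp" and "RMS Δp" the mean and root mean square of $\Delta p$ over all evaluated tokens.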