Commit aa0296e

perplexity: add BF16 vs. FP16 results
1 parent 83330d8


examples/perplexity/README.md

Lines changed: 41 additions & 1 deletion
@@ -32,7 +32,7 @@ In addition to the KL divergence the following statistics are calculated with `-
 
 ## LLaMA 3 8b Scoreboard
 
-Results are sorted by Kullback-Leibler divergence relative to FP16.
+Results were generated using the CUDA backend and are sorted by Kullback-Leibler divergence relative to FP16.
 The "WT" importance matrices were created using varying numbers of Wikitext tokens and can be found [here](https://huggingface.co/JohannesGaessler/llama.cpp_importance_matrices/blob/main/imatrix-llama_3-8b-f16-2.7m_tokens.dat).
 
 | Quantization | imatrix | Model size [GiB] | PPL | ΔPPL | KLD | Mean Δp | RMS Δp |
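
The scoreboard compares each quantization against the FP16 logits token by token. As a minimal NumPy sketch of the reported statistics under their standard definitions (the array names and layout are illustrative, not llama.cpp internals):

```python
import numpy as np

def log_softmax(logits):
    # Numerically stable log-softmax over the vocabulary axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def scoreboard_stats(logits_q, logits_f16, tokens):
    """Scoreboard statistics for one stream of evaluation tokens.

    logits_q / logits_f16: (n_tokens, n_vocab) arrays from the quantized
    and FP16 models; tokens: the observed next tokens. Illustrative only.
    """
    logp_q, logp_b = log_softmax(logits_q), log_softmax(logits_f16)
    idx = np.arange(len(tokens))

    # Perplexity of each model over the observed tokens.
    ppl_q = np.exp(-logp_q[idx, tokens].mean())
    ppl_b = np.exp(-logp_b[idx, tokens].mean())

    # Per-token KL divergence of the quantized distribution from FP16.
    kld = (np.exp(logp_b) * (logp_b - logp_q)).sum(axis=-1)

    # Δp: change in the probability assigned to the correct token (percent).
    dp = 100.0 * (np.exp(logp_q[idx, tokens]) - np.exp(logp_b[idx, tokens]))

    return {"PPL": ppl_q, "ΔPPL": ppl_q - ppl_b, "KLD": kld.mean(),
            "Mean Δp": dp.mean(), "RMS Δp": np.sqrt((dp ** 2).mean())}
```

Unlike ΔPPL, the KL divergence uses the full vocabulary distribution at every position rather than only the probability of the observed token, which is why the scoreboard sorts by it.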
@@ -89,6 +89,8 @@ K-quants score better on mean Δp than the legacy quants than e.g. KL divergence
 
 ## LLaMA 2 vs. LLaMA 3 Quantization comparison
 
+Results were generated using the CUDA backend.
+
 | Metric          | L2 7b q2_K          | L3 8b q2_K          | L2 7b q4_K_M        | L3 8b q4_K_M        | L2 7b q6_K          | L3 8b q6_K          | L2 7b q8_0          | L3 8b q8_0          |
 |-----------------|---------------------|---------------------|---------------------|---------------------|---------------------|---------------------|---------------------|---------------------|
 | Mean PPL        | 5.794552 ± 0.032298 | 9.751568 ± 0.063312 | 5.877078 ± 0.032781 | 6.407115 ± 0.039119 | 5.808494 ± 0.032425 | 6.253382 ± 0.038078 | 5.798542 ± 0.032366 | 6.234284 ± 0.037878 |
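
Each mean PPL above carries a ± standard error. Assuming the usual estimator (perplexity as the exponential of the mean per-token negative log-likelihood, with the uncertainty of the mean propagated through the exponential via the delta method), a short sketch:

```python
import numpy as np

def mean_ppl_with_error(nll):
    """Mean PPL and its ± value from per-token negative log-likelihoods.

    Assumes the usual estimator: PPL = exp(mean NLL), with the standard
    error of the mean propagated through exp() via the delta method.
    A sketch of the statistic, not llama.cpp's exact implementation.
    """
    se_mean = nll.std(ddof=1) / np.sqrt(len(nll))
    ppl = np.exp(nll.mean())
    return ppl, ppl * se_mean
```

Applied to the per-token NLLs of, say, L2 7b q2_K, this estimator produces a value of the same form as the `5.794552 ± 0.032298` entry above.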
@@ -107,6 +109,44 @@ K-quants score better on mean Δp than the legacy quants than e.g. KL divergence
 | RMS Δp          | 9.762 ± 0.053 %     | 21.421 ± 0.079 %    | 3.252 ± 0.024 %     | 5.519 ± 0.050 %     | 1.339 ± 0.010 %     | 2.295 ± 0.019 %     | 0.618 ± 0.011 %     | 1.198 ± 0.007 %     |
 | Same top p      | 85.584 ± 0.086 %    | 71.138 ± 0.119 %    | 94.665 ± 0.055 %    | 91.901 ± 0.072 %    | 97.520 ± 0.038 %    | 96.031 ± 0.051 %    | 98.846 ± 0.026 %    | 97.674 ± 0.040 %    |
 
+## LLaMA 3 BF16 vs. FP16 comparison
+
+Results were generated using the CPU backend with LLaMA 3 8b BF16 as `--kl-divergence-base` and LLaMA 3 8b FP16 as the `--model` for comparison.
+
+| Metric                         | Value                    |
+|--------------------------------|--------------------------|
+| Mean PPL(Q)                    | 6.227711 ± 0.037833      |
+| Mean PPL(base)                 | 6.225194 ± 0.037771      |
+| Cor(ln(PPL(Q)), ln(PPL(base))) | 99.990%                  |
+| Mean ln(PPL(Q)/PPL(base))      | 0.000404 ± 0.000086      |
+| Mean PPL(Q)/PPL(base)          | 1.000404 ± 0.000086      |
+| Mean PPL(Q)-PPL(base)          | 0.002517 ± 0.000536      |
+| Mean KLD                       | 0.00002515 ± 0.00000020  |
+| Maximum KLD                    | 0.012206                 |
+| 99.9% KLD                      | 0.000799                 |
+| 99.0% KLD                      | 0.000222                 |
+| Median KLD                     | 0.000013                 |
+| 10.0% KLD                      | -0.000002                |
+| 5.0% KLD                       | -0.000008                |
+| 1.0% KLD                       | -0.000023                |
+| Minimum KLD                    | -0.000059                |
+| Mean Δp                        | -0.0000745 ± 0.0003952 % |
+| Maximum Δp                     | 4.186%                   |
+| 99.9% Δp                       | 1.049%                   |
+| 99.0% Δp                       | 0.439%                   |
+| 95.0% Δp                       | 0.207%                   |
+| 90.0% Δp                       | 0.125%                   |
+| 75.0% Δp                       | 0.029%                   |
+| Median Δp                      | 0.000%                   |
+| 25.0% Δp                       | -0.030%                  |
+| 10.0% Δp                       | -0.126%                  |
+| 5.0% Δp                        | -0.207%                  |
+| 1.0% Δp                        | -0.434%                  |
+| 0.1% Δp                        | -1.016%                  |
+| Minimum Δp                     | -4.672%                  |
+| RMS Δp                         | 0.150 ± 0.001 %          |
+| Same top p                     | 99.739 ± 0.013 %         |
 
 ## Old Numbers
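
In this comparison the BF16 run serves as the stored baseline that the FP16 `--model` run is scored against, so both models are evaluated on identical tokens. Of the reported rows, "Same top p" is the share of positions where the two models agree on the most likely token; a sketch with the usual binomial standard error, reusing the illustrative array layout from the first sketch:

```python
import numpy as np

def same_top_p(logp_q, logp_b):
    """"Same top p": share of positions where both models agree on the
    most likely token, with a binomial standard error. Reuses the
    illustrative (n_tokens, n_vocab) log-probability layout from above.
    """
    agree = logp_q.argmax(axis=-1) == logp_b.argmax(axis=-1)
    p = agree.mean()
    return 100.0 * p, 100.0 * np.sqrt(p * (1.0 - p) / agree.size)
```

With 99.739% top-token agreement and a mean KLD near 2.5e-5, the new table shows BF16 and FP16 to be nearly interchangeable for this model, far closer than even q8_0 is to FP16.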
