### Description

### Motivation
Thanks to @julien-c, I came across this reddit post from @he29-net 👍

> You can't easily tell whether a model was quantized with the help of an importance matrix just from its name. I first found this annoying, because it was not clear if and how the calibration dataset affects the performance of the model in ways other than just positive ones. But recent tests in llama.cpp discussion #5263 show that while the data used to prepare the imatrix slightly affects how the model performs in (un)related languages or specializations, any dataset will perform better than a "vanilla" quantization with no imatrix. So now, instead, I find it annoying because sometimes the only way to be sure I'm using the better imatrix version is to re-quantize the model myself.
### Proposal

- Add at the end of the `imatrix` binary file the dataset name on which the imatrix was computed
- Add the following KV entries in `quantize`:
  - `quantize.imatrix.file`: filename of the imatrix provided during quantization
  - `quantize.imatrix.entries_count`: number of entries in the imatrix
  - `quantize.imatrix.dataset`: dataset from the imatrix
  - `quantize.imatrix.chunks_count`: number of chunks the imatrix was computed with
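The proposed entries would just be ordinary GGUF key/value pairs. As a rough, self-contained sketch of how they could be serialized and read back (using the GGUF KV wire layout: length-prefixed UTF-8 strings and little-endian scalars; the file and dataset names below are made-up examples, not real outputs of `quantize`):

```python
import struct

# Value type ids as in the GGUF spec
GGUF_TYPE_UINT32 = 4
GGUF_TYPE_STRING = 8

def pack_string(s: str) -> bytes:
    # GGUF strings: uint64 byte length, then raw UTF-8 bytes
    data = s.encode("utf-8")
    return struct.pack("<Q", len(data)) + data

def pack_kv(key: str, vtype: int, value) -> bytes:
    # One KV pair: key string, uint32 value type, then the value
    out = pack_string(key) + struct.pack("<I", vtype)
    if vtype == GGUF_TYPE_STRING:
        out += pack_string(value)
    elif vtype == GGUF_TYPE_UINT32:
        out += struct.pack("<I", value)
    return out

# The four proposed keys (values here are hypothetical)
blob = b"".join([
    pack_kv("quantize.imatrix.file", GGUF_TYPE_STRING, "imatrix-wiki.dat"),
    pack_kv("quantize.imatrix.entries_count", GGUF_TYPE_UINT32, 224),
    pack_kv("quantize.imatrix.dataset", GGUF_TYPE_STRING, "wiki.train.raw"),
    pack_kv("quantize.imatrix.chunks_count", GGUF_TYPE_UINT32, 100),
])

def read_string(buf: bytes, off: int):
    (n,) = struct.unpack_from("<Q", buf, off)
    off += 8
    return buf[off:off + n].decode("utf-8"), off + n

def read_kvs(buf: bytes, count: int) -> dict:
    kvs, off = {}, 0
    for _ in range(count):
        key, off = read_string(buf, off)
        (vtype,) = struct.unpack_from("<I", buf, off)
        off += 4
        if vtype == GGUF_TYPE_STRING:
            val, off = read_string(buf, off)
        else:  # only uint32 handled in this sketch
            (val,) = struct.unpack_from("<I", buf, off)
            off += 4
        kvs[key] = val
    return kvs

kvs = read_kvs(blob, 4)
print(kvs["quantize.imatrix.dataset"])  # wiki.train.raw
```

A downstream tool could then check for the presence of `quantize.imatrix.*` keys to tell an imatrix quantization apart from a "vanilla" one without re-quantizing.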
Ideally I would also add hashes of both the imatrix and dataset files to the metadata, but I am not sure whether this is supported and appropriate.