
quantize: add imatrix and dataset metadata in GGUF #6656

Closed
@phymbert

Description

Motivation

Thanks to @julien-c, I was reading this reddit post from @he29-net 👍

> You can't easily tell whether a model was quantized with the help of an importance matrix just from the name. I first found this annoying, because it was not clear if and how the calibration dataset affects performance of the model in other than just positive ways. But recent tests in llama.cpp discussion #5263 show that while the data used to prepare the imatrix slightly affects how it performs in (un)related languages or specializations, any dataset will perform better than a "vanilla" quantization with no imatrix. So now, instead, I find it annoying because sometimes the only way to be sure I'm using the better imatrix version is to re-quantize the model myself.

Proposal

  • Append to the end of the imatrix binary file the name of the dataset on which the imatrix was computed

  • Add the following KV pairs in `quantize`:

    • `quantize.imatrix.file` Filename of the imatrix provided during quantization
    • `quantize.imatrix.entries_count` Number of entries in the imatrix
    • `quantize.imatrix.dataset` Dataset from the imatrix
    • `quantize.imatrix.chunks_count` Number of chunks the imatrix was computed with

Ideally I would also add hashes of both the imatrix and dataset files in the metadata, but I am not sure this is supported or appropriate.
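For illustration, here is a minimal pure-Python sketch of the two proposal points: appending the dataset name as a length-suffixed footer on the imatrix file (so existing readers that ignore trailing bytes keep working), and collecting the proposed `quantize.imatrix.*` KV pairs. The footer layout and all function names are hypothetical, not the actual imatrix format or llama.cpp code.

```python
import os
import struct

def append_dataset_name(imatrix_path: str, dataset: str) -> None:
    # Hypothetical footer: [utf-8 name bytes][uint32 LE length], appended
    # after the existing imatrix payload.
    data = dataset.encode("utf-8")
    with open(imatrix_path, "ab") as f:
        f.write(data)
        f.write(struct.pack("<I", len(data)))

def read_dataset_name(imatrix_path: str) -> str:
    # Read the trailing length, then seek back to recover the name.
    with open(imatrix_path, "rb") as f:
        f.seek(-4, os.SEEK_END)
        (n,) = struct.unpack("<I", f.read(4))
        f.seek(-(4 + n), os.SEEK_END)
        return f.read(n).decode("utf-8")

def quantize_imatrix_kv(imatrix_file: str, dataset: str,
                        entries_count: int, chunks_count: int) -> dict:
    # The proposed KV pairs that quantize would write into the GGUF metadata.
    return {
        "quantize.imatrix.file": imatrix_file,
        "quantize.imatrix.dataset": dataset,
        "quantize.imatrix.entries_count": entries_count,
        "quantize.imatrix.chunks_count": chunks_count,
    }
```

A reader that is unaware of the footer simply stops after the payload it expects, which is why appending (rather than prepending) keeps the change backward compatible.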

Metadata

Assignees

No one assigned

    Labels

    enhancement (New feature or request), generation quality (Quality of model output), model (Model specific), need feedback (Testing and feedback with results are needed)
