
Conversation

@howard0su (Contributor) commented May 17, 2023

Leverage the quantize executable to support upgrading models from v1 (previous) to v2 (latest).

Usage:
quantize <old_quantized_model> <new_model_name> type

The type must match the previous file's quantization type; the tool does not support re-quantizing into another type.
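For example, a hypothetical invocation (the model paths are assumptions for illustration, not from the PR; the exact spelling of the type argument follows the tool's own usage message):

```shell
# Hypothetical model paths -- adjust to your own files.
# Re-encode an old q4_0 model into the new file format.
# The last argument must match the old file's quantization type;
# the tool does not re-quantize to a different type.
./quantize models/7B/ggml-model-q4_0.bin models/7B/ggml-model-q4_0-new.bin q4_0
```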

@github-actions bot left a comment:

clang-tidy made some suggestions

@Green-Sky (Collaborator)

I would not add it to ggml.c. It's legacy code, which we don't want to carry around.

@howard0su (Contributor, Author)

There's no intention to carry it forever; maybe remove it after a couple of weeks. The data format (struct block_q4_0) is only defined in ggml.c, so I don't see another way to do this unless we copy the definition.

@howard0su howard0su marked this pull request as ready for review May 18, 2023 01:55
@rankaiyx

Maybe it could be made into a small standalone tool, so that it doesn't become a burden, and the instructions in README.md could be updated accordingly.

@howard0su (Contributor, Author)

The intention is to provide a more seamless experience when upgrading the model version. The goal is not to have a separate tool or to maintain this long term.

@rankaiyx

> The intention is to provide a more seamless experience when upgrading the model version. The goal is not to have a separate tool or to maintain this long term.

Thank you very much for making a lot of my old models usable again.

Unfortunately, there is now a new merge that seems to break backward compatibility again.

To deal with the same thing happening again, it would be reasonable to provide a dedicated tool. Logically, upgrading the format is not a quantization operation.

@howard0su (Contributor, Author)

Yes, it is fine to just keep this PR open without merging. I will make some code changes after the F16 change is merged.

@daniandtheweb (Contributor)

Isn't it possible to integrate this as a separate tool? That way the legacy code could be kept away from the main program and the conversion would still be possible.

@howard0su (Contributor, Author)

You may notice the changes are in llama.cpp and ggml.c. If we wanted a new application, we would pretty much have to copy the code.

@SlyEcho (Contributor) commented May 20, 2023

Actually, the quantization code is copied several times already: once in ggml.c, then in ggml-cuda.cu, and in ggml-opencl.c as well.

@howard0su howard0su changed the title Upgrade v1 format to v2 by leveraging quantize Upgrade v1/v2 format to v3 by leveraging quantize May 21, 2023
@howard0su (Contributor, Author)

Tested with v1 and v2 files of Q4_0 only; I don't have files in other formats. Please report any bugs here.

@ggerganov this is an ugly patch, but it works. It would be painful not to provide a conversion tool for the old models, but I don't have much time to build a separate tool (and I don't think it is worth the effort for an intermediate tool).

@github-actions bot left a comment:

clang-tidy made some suggestions

@rankaiyx

There may be a compromise: create a fixed branch that contains the format-conversion feature and does not need to track the latest code, then provide documentation in a reasonable place on how to compile and use it, for those who need it.
