
Conversation

@howard0su (Contributor) commented May 17, 2023

Leverage the quantize executable to support upgrading models from v1 (previous) to v2 (latest).

Usage:
quantize <old_quantized_model> <new_model_name> type

The type must match the previous file's quantization type; the tool does not support re-quantizing into another type.
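For example, a hypothetical invocation (the model paths are assumptions for illustration, not from the PR; the exact spelling of the type argument follows the tool's own usage message):

```shell
# Hypothetical model paths -- adjust to your own files.
# Re-encode an old q4_0 model into the new file format.
# The last argument must match the old file's quantization type;
# the tool does not re-quantize to a different type.
./quantize models/7B/ggml-model-q4_0.bin models/7B/ggml-model-q4_0-new.bin q4_0
```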

@github-actions bot left a comment:

clang-tidy made some suggestions

@Green-Sky (Collaborator)

I would not add it to ggml.c. It's legacy code, which we don't want to carry around.

@howard0su (Contributor, Author)

There's no intention to carry it forever; maybe remove it after a couple of weeks. The data format (struct block_q4_0) is only defined in ggml.c, so I don't see another way to do this unless we copy the definition.

@howard0su howard0su marked this pull request as ready for review May 18, 2023 01:55
@rankaiyx

Maybe it could be made into a small standalone tool, so that it doesn't become a burden, and the instructions in README.md could be updated accordingly.

@howard0su (Contributor, Author)

The intention is to provide a more seamless experience when upgrading the model version. The goal is not to have a separate tool or to maintain this long term.

@rankaiyx

> The intention is to provide a more seamless experience when upgrading the model version. The goal is not to have a separate tool or to maintain this long term.

Thank you very much for making a lot of my old models usable again.

Unfortunately, there is now a new merge that seems to break backward compatibility again.

To deal with the same thing happening again, it would be reasonable to provide a dedicated tool. Logically, upgrading the format is not a quantization operation.

@howard0su (Contributor, Author)

Yes, it is fine to just keep this PR open without merging. I will make some code changes after the F16 change is merged.

@daniandtheweb (Contributor)

Isn't it possible to integrate this as a separate tool? That way the legacy code could be kept away from the main program and the conversion would still be possible.

@howard0su (Contributor, Author)

You may notice the changes are in llama.cpp and ggml.c. If we wanted a new application, we would pretty much have to copy the code.

@SlyEcho (Contributor) commented May 20, 2023

Actually, the quantization code is copied several times already: once in ggml.c, then in ggml-cuda.cu, and in ggml-opencl.c as well.

@howard0su howard0su changed the title Upgrade v1 format to v2 by leveraging quantize Upgrade v1/v2 format to v3 by leveraging quantize May 21, 2023
@howard0su (Contributor, Author)

Tested with v1 and v2 files of Q4_0 only; I don't have files in other formats. Please report any bugs here.

@ggerganov this is an ugly patch, but it works. It would be painful not to provide a conversion tool for the old models, but I don't have much time to build a separate tool (and I don't think it is worth the effort for an intermediate tool).

@github-actions bot left a comment:

clang-tidy made some suggestions

@rankaiyx

There may be a compromise: create a fixed branch that contains the format-conversion feature and does not need to track the latest code, then provide documentation in a reasonable place on how to compile and use it, for those who need it.
