Converting GGML->GGUF: ValueError: Only GGJTv3 supported #2990
Comments
There are versions of GGML that had really strange, difficult-to-support stuff like multi-part files, including individual tensors split across (or duplicated between) the files, etc. So supporting all versions of the previous GGML formats definitely isn't easy or simple. It also looks like you're converting without the model metadata, and converting the vocabulary also isn't perfect. Even if you mess around with this, you're going to get a model that's lower quality than one that was directly converted to GGUF. Is it really impractical for you to just download the GGUF version? edit: Also, do you know what version the file actually is? |
If everyone needs to re-download 200GB+ every time there's a schema change, I hate to think of the egress costs to the org, but I guess that's the way for now. Is there any documentation for the header info so that I could write an info helper app?
Thx
|
For those of us who use 30B/70B models, yes, it very much is impractical to download 40GB over and over again. Downloading unquantized models is also impractical because they are hundreds of GB. If you are downloading 10-20 models over time, this is virtually impossible due to data caps and internet speeds. |
Not really, you kind of have to just figure it out by looking at the loading code for different versions in the projects that have supported GGML. You didn't answer my question about what version you have. If you can load it with an older llama.cpp version, I think it will say what it is when it gets loaded. Or if you can show me the first 10 or so bytes in a hexdump. For example, on Linux, something like:
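A minimal sketch of such a command (the path is a placeholder for wherever your GGML file lives; any tool that shows the first 8 or so bytes will do):
$ hexdump -C /path/to/model.ggml.bin | head -n 1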
Just to be clear, I wasn't saying that in a snarky way. People tend to just make issues when they run into a problem, even if there's a relatively easy workaround. Since converting the GGML models isn't ideal in the first place, I was just checking to make sure there wasn't an easier way to deal with this. |
I don't wanna hijack but I have a similar problem now. I converted a bunch of GGML to GGUF and they worked fine. Now I directly downloaded 2 quants in GGUF and am getting huge repetition problems at long context. I also have the GPTQ version and these issues aren't present. It isn't the model.
So now, I'm stuck downloading a GGML of the same and converting it to see if that will work. But this means that I have downloaded over 120GB of the same model, not counting the GPTQ. All due to format.
At the end, I also don't know if the other models are GGUFv1 or GGUFv2, from either a direct quant or the conversion scripts. GGUFv1 goes away in a month. Will they convert after? Will some other strange bug like this occur? You can see how this is frustrating, right? |
I had that same issue with massive repetition... I thought it was just a fluke, but maybe I'm not the only one. Anyway, that's probably for a separate issue, but I'll do my best to write up a quick version/info app as there doesn't seem to be one. Good tip on loading with an old build.
John
|
Did you convert using the model metadata? If not, that might be something to test: see if doing the GGML conversion with metadata leads to the repetition problem you mentioned.
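For reference, a hypothetical invocation of the conversion script with metadata; the flag names here are assumptions from memory, so check the script's --help for the exact spelling in your checkout:
$ python3 convert-llama-ggmlv3-to-gguf.py --input model.ggml.bin --output model.gguf --model-metadata-dir /path/to/original-hf-model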
I got your back with #2931. You can now use the
The file will start with:
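Going by the version check quoted below, a current GGJTv3 file begins with the magic bytes b'tjgg' followed by a little-endian uint32 version, so the first eight bytes should be:
74 6a 67 67 03 00 00 00    ("tjgg" followed by version 3)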
That's why I was asking you for a hexdump of the beginning of the file. (By the way, I'm the person who wrote the conversion script.) I think the main difference in GGMF versus the original GGML was that the version field was added and vocabulary items gained a score (a 32-bit float).
You can possibly try changing:
if bytes(data[offset:offset + 4]) != b'tjgg' or struct.unpack('<I', data[offset + 4:offset + 8])[0] != 3:
to
if bytes(data[offset:offset + 4]) != b'tjgg':
(should be around line 120 in the script)
This will just completely ignore the version (it still only supports GGJT) and blindly attempt to proceed. It may or may not actually work. |
I didn't use metadata. I only used the script. I'll also try the repeating models in GGUFv2 because why not. I won't have a GGML copy of the model to attempt to convert until tomorrow. |
You shouldn't notice any difference. The only thing GGUFv2 did (as far as I know) is change some types to 64bit to allow expressing larger values. Existing files wouldn't have any values over 32bit so there wouldn't be a visible difference.
Alright. Well, if you notice that converting with metadata leads to the repetition issue but converting using only the GGML file doesn't, that would mean it's something with the new vocab stuff. GGUF is supposed to be better in that regard, though, and converting GGML to GGUF without the metadata is imperfect, so the result is likely to be worse than the original GGML file in terms of quality. |
Does anyone know how to solve the problem? I had already successfully converted GGML to GGUF last week, but I updated llama.cpp and now I get this error. |
What? The GGML to GGUF conversion script has only ever supported GGJTv3. Maybe you successfully converted a GGJTv3 file and then tried to convert a GGML file of a different version (non-GGJTv3). As for possible ways to deal with that, please read through the other posts in this issue. I can't help you if I don't know the version of the GGML file you're trying to convert. |
Added a magic file:
#3011
|
Got the K5M GGML. Running it as GGML, it still does some repetition. I also converted it to GGUF (v1/v2 makes no difference). The extreme repetition, where it generates the same thing over and over, happens about 1000 tokens later in the converted model. It can now be broken via mirostat with high TAU. You'll still get your bits of the previous messages, but at least now the plot moves forward. Every message may start with "the robot smiles slyly" but the rest of the contents will be different.
Something is definitely wrong with how this was quantized. The GPTQ model had none of these problems. I will try to run it with pure exllama so that there are fewer samplers for a final check. I'm not sure what else to do, or why there would be such a big difference between the two formats, or why llama.cpp performs so poorly in this department. So far none of my other models have suffered from this, and GGUF/GGML have been smarter than their GPTQ equivalents.
Model is: https://huggingface.co/nRuaif/fiction.live-Kimiko-V2-70B and I'm using TheBloke's quants. |
Getting a bit off topic here, but...
It may just be random. Any quantization is going to cause some kind of degradation, and some particular models may just get hit in a particularly critical way. Nothing has really changed recently with the quantization that I'm aware of. The only thing I can think of is that k-quants actually quantizes a couple of tensors with higher quality than it used to, for 70B LLaMA2 models specifically. Because of the multi-query attention stuff, those tensors are smaller than they were in LLaMAv1, so more bits can be spent on them.
Going even further off topic, you should try out my seqrep sampler in #2593, it's specifically designed to try to help with that kind of stuff. Try parameters like:
and for seqrep
Note: I haven't tested this stuff out with models in chat or roleplay type modes. I think any type of repetition penalty is going to struggle there because there's going to be a lot of repetition in stuff like |
Dang, I want to try that, but I'm using textgen with Python bindings as a backend and then doing the chats through SillyTavern. I don't have any of these problems with platypus, qcamel, and a couple of other 70Bs, and I use the exact same settings and prompts. All quants; QK4M, QK5S, Q6. I mean not even a hint of it. I can load the GPTQ model up in the same broken chat and it immediately ends the repetition, even with plain exllama and the same low number of samplers. Loading another GGUF also immediately ends it, even if it went on for many messages.
I have all airoboros, mostly in lora form, that I apply to GPTQ models. I'd love to have done the same thing for GGUF, but offloaded and quantized models won't take a lora. My main worry is that I'll d/l another 70B as GGUF instead of GGML for better quants and then be hit with this issue. Plus it's sad that I won't be able to use this one in its smarter GGUF form. |
It is GGMLv3 |
I don't think there is such a thing, that's why I asked for a hex dump from the beginning of the file: #2990 (comment)
edit: I guess I can't criticize saying "GGML v3" too much since I called the script that, after all. Unfortunately it's not really accurate enough to know exactly what the format is. I could have called it "GGJT v3" but generally users wouldn't know internal details like the fact that the current GGML format is actually GGJT.
Digging through the older code, it looks like these are the options: https://github.com/ggerganov/llama.cpp/blob/dadbed99e65252d79f81101a392d0d6497b86caa/llama.cpp#L506-L512
Based on that, it's probably possible to convert any of those versions all the way back to plain GGML, at least for f16/f32 files. For quantized files, it looks like it's only possible to go back to GGJT v2, and in that case only if it's not q8 or q4 (I assume this means the Q8 and Q4 quantization types, since those changed in GGJTv3).
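To make that concrete, here is a rough sketch of how the container format could be identified by hand from the first few bytes. The format labels are assumptions based on the loader code linked above; the magics are stored as little-endian uint32 values, which is why 'ggjt' shows up reversed as b'tjgg' in the script's check.
import struct
import sys

# Rough sketch: identify a GGML-family file from its first 8 bytes.
# The format labels are assumptions based on the old llama.cpp loader linked above.
MAGICS = {
    b'lmgg': 'GGML (unversioned)',   # 0x67676d6c ('ggml') as little-endian bytes
    b'fmgg': 'GGMF (versioned)',     # 0x67676d66 ('ggmf')
    b'tjgg': 'GGJT (v1, v2 or v3)',  # 0x67676a74 ('ggjt')
}

with open(sys.argv[1], 'rb') as f:
    head = f.read(8)

magic = head[:4]
print('magic:', magic, '->', MAGICS.get(magic, 'unknown'))
if magic in (b'fmgg', b'tjgg'):
    # Versioned formats store a little-endian uint32 right after the magic.
    print('version:', struct.unpack('<I', head[4:8])[0])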
Ah, I can't really help you there and I also suspect it wouldn't work as well for a chat format compared to something like "Write a story with blah, blah, blah criteria". |
@danielbrdz @jboero Please try converting with #3023 - that version should convert even very old GGML format files when it's possible. In cases where conversion isn't possible, it should give you a better error message.
I actually don't have any GGML files older than GGJTv3 laying around, so I'd appreciate any testing with older files.
Please note that in cases where the quantization format changed, it's just not possible to convert the file. So if your GGML isn't f16 or f32 format and it's older than GGJTv2, it just can't be converted. If it's GGJTv2 and Q8 or Q4 quantized then it also can't be converted, since the format for those quantizations changed in GGJTv3.
Even for those files that can't be converted, it would be helpful if people can test and report back. You should get a reasonable error message when the file can't be converted, like:
ValueError: Q4 and Q8 quantizations changed in GGJTv3. Sorry, your GGJTv2 file of type MOSTLY_Q8_0 is not eligible for conversion.
It should also report the file format when loading to enable better reporting of problems:
* Scanning GGML input file
* File format: GGJTv3 with ftype MOSTLY_Q8_0
|
Awesome, thank you so much. I'm on travel for a few days but will try as soon as I get back to my workstation.
|
Just a note to people reading this, I renamed the script to convert-llama-ggml-to-gguf.py. |
I have solved the error: ValueError: Only GGJTv3 supported. As I told you before, last week I made the conversion from GGML to GGUF without problems; that way I was able to get around the error. My theory is that the error comes from converting an HF model to GGML with the latest version of llama.cpp: since it has remnants of GGUF in it, the two could get mixed up, and that is the reason for the incompatibility behind the error discussed in this issue. |
Amazing. I still haven't gotten back to my workstation to test yet. Thank you.
…On Wed, Sep 6, 2023, 09:49 Kerfuffle wrote:
Closed #2990 as completed via #3023.
|
I'm confused and not sure I understand correctly. If you have the HF model, why are you converting to GGML and then converting the GGML to GGUF instead of just converting from HF to GGUF directly?
There isn't a way to convert LLaMA models from HF to GGML anymore in the latest llama.cpp, as far as I know. The current convert.py produces GGUF output. I added the conversion script (now renamed to convert-llama-ggml-to-gguf.py) for people who already have GGML files they want to carry forward to GGUF. |
Answering your first question: I have converted from HF to GGUF directly on several occasions. The problem for me is that the direct conversion can only produce f16, f32, and I think also q8_0. Those formats are too heavy for my computer, and I need the model in Q4_K_M or Q5_K_M to run it locally on my PC. I don't think I can get that directly from HF to GGUF, so I have to go through GGML to get the format I want.
Regarding this, you can still convert HF to GGML in the latest update of llama.cpp: go into the "examples" folder and use the script called "make-ggml.py". I used it and you still can; it's just that, for some reason, you can't convert the result to GGUF anymore. |
The intended workflow is to use convert.py to produce an f16 (or f32) GGUF directly from the HF model and then run the quantize tool on that file to get Q4_K_M, Q5_K_M, or whatever you need. |
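A rough sketch of that two-step flow; the paths are placeholders and the exact flags may differ between versions, so check each tool's --help:
$ python3 convert.py /path/to/hf-model --outtype f16 --outfile model-f16.gguf
$ ./quantize model-f16.gguf model-Q4_K_M.gguf Q4_K_M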
Prerequisites
Please answer the following questions for yourself before submitting an issue.
Expected Behavior
My GGML converted models should be easy to convert to GGUF.
I know the conversion tools aren't guaranteed but I'd like to file this one in case anybody else has a workaround or more version flexible option. I would love to see any version of GGML/GGJT supported if possible. Instead my GGML files converted earlier are apparently not supported for conversion to GGUF.
Is there any tool to show the standard version details of a model file? Happy to contribute one if there isn't.
Current Behavior
Environment and Context
Working with models
Physical Fedora 38, probably irrelevant given the Python.
$ lscpu
$ uname -a
Linux z840 6.4.12-200.fc38.x86_64 #1 SMP PREEMPT_DYNAMIC Wed Aug 23 17:46:49 UTC 2023 x86_64 GNU/Linux
Failure Information (for bugs)
Steps to Reproduce
Please provide detailed steps for reproducing the issue. We are not sitting in front of your screen, so the more detail the better.