
Silently failing ggml to gguf conversion #2697


Closed · staviq opened this issue Aug 21, 2023 · 13 comments · Fixed by #2698

staviq (Contributor) commented Aug 21, 2023

@KerfuffleV2
#2398

convert-llama-ggmlv3-to-gguf.py completes without errors, but the resulting model causes unexpected behaviour in main and server.

I believe the model is this one from here

Source model md5 e87520b6393ea5ed6f9419e9fe6aba96 mythomax-l2-13b.ggmlv3.q5_K_M.bin
Resulting model md5 ce6cf60b707cb21fc04ac0e6cf6a147e mythomax-l2-13b.ggmlv3.q5_K_M.gguf

Exact command with output:

python3 convert-llama-ggmlv3-to-gguf.py -i /storage/models/mythomax-l2-13b.ggmlv3.q5_K_M.bin -o /storage/models/mythomax-l2-13b.ggmlv3.q5_K_M.gguf --eps 1e-5 -c 4096
* Using config: Namespace(input=PosixPath('/storage/models/mythomax-l2-13b.ggmlv3.q5_K_M.bin'), output=PosixPath('/storage/models/mythomax-l2-13b.ggmlv3.q5_K_M.gguf'), name=None, desc=None, gqa=1, eps='1e-5', context_length=4096, model_metadata_dir=None, vocab_dir=None, vocabtype='spm')

=== WARNING === Be aware that this conversion script is best-effort. Use a native GGUF model if possible. === WARNING ===

* Scanning GGML input file
* GGML model hyperparameters: <Hyperparameters: n_vocab=32000, n_embd=5120, n_mult=6912, n_head=40, n_layer=40, n_rot=128, n_ff=13824, ftype=17>

=== WARNING === Special tokens may not be converted correctly. Use --model-metadata-dir if possible === WARNING ===

* Preparing to save GGUF file
* Adding model parameters and KV items
* Adding 32000 vocab item(s)
* Adding 363 tensor(s)
    gguf: write header
    gguf: write metadata
    gguf: write tensors
* Successful completion. Output saved to: /storage/models/mythomax-l2-13b.ggmlv3.q5_K_M.gguf
KerfuffleV2 (Collaborator) commented:

Thank you. Can you please try the fix here? #2698

Just change line 240 of the conversion script from

vbytes = bytes(f'<0x{hv}>', encoding = 'UTF-8')

to

vbytes = bytes(f'<0x{vbytes[0]:02X}>', encoding = 'UTF-8')
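
For context, a minimal, hedged sketch of what the :02X format spec changes. The original definition of hv isn't quoted in this thread, so the zero-padding explanation below is an inference from the fix rather than the script's exact code:

    # Byte-fallback tokens in the vocab are written as "<0xNN>".
    # If hv was an unpadded hex string (inferred from the fix), bytes below 16
    # would come out as e.g. "<0xA>"; ":02X" always yields two uppercase digits.
    b = 0x0A  # example byte value (newline)

    unpadded = bytes(f'<0x{b:X}>', encoding='UTF-8')    # b'<0xA>'
    padded   = bytes(f'<0x{b:02X}>', encoding='UTF-8')  # b'<0x0A>'

    print(unpadded, padded)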

KerfuffleV2 mentioned this issue Aug 21, 2023
staviq (Contributor, Author) commented Aug 21, 2023

vbytes = bytes(f'<0x{vbytes[0]:02X}>', encoding = 'UTF-8')

At first glance, this appears to have fixed it. Tested with server:

User: hi
Llama: Hello there! How can I assist you today?
User: Tell me a joke
Llama: Why did the tomato turn red? Because it saw the salad dressing!
User: Another one
Llama: What do you call a fake noodle? An impasta!

Resultant model md5 after fix: fcf16d638dc53d4bec7e827ee71192de mythomax-l2-13b.ggmlv3.q5_K_M.gguf

main appears to be working as well after the fix:

# ./build/bin/main -m /storage/models/mythomax-l2-13b.ggmlv3.q5_K_M.gguf -p "Llamas are" -n 64
(...)
llama_model_load_internal: mem required  = 9167.74 MB (+  400.00 MB per state)
llama_model_load_internal: offloading 0 repeating layers to GPU
llama_model_load_internal: offloaded 0/43 layers to GPU
llama_model_load_internal: total VRAM used: 360 MB
llama_new_context_with_model: kv self size  =  400.00 MB

system_info: n_threads = 3 / 6 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 | 
sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.800000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 512, n_batch = 512, n_predict = 64, n_keep = 0


 Llamas are cute and all, but can they really save the environment? New research suggests that these South American camelids have surprising abilities when it comes to reducing greenhouse gas emissions.
The study, published in the journal Nature Sustainability, looked at the benefits of llamas and alpacas
llama_print_timings:        load time =   648.60 ms
llama_print_timings:      sample time =    32.29 ms /    64 runs   (    0.50 ms per token,  1982.04 tokens per second)
llama_print_timings: prompt eval time =  1124.14 ms /     5 tokens (  224.83 ms per token,     4.45 tokens per second)
llama_print_timings:        eval time = 18461.69 ms /    63 runs   (  293.04 ms per token,     3.41 tokens per second)
llama_print_timings:       total time = 19627.86 ms

KerfuffleV2 (Collaborator) commented:

Great, thanks for the report and testing! I will try to get the fix merged as soon as possible.

ghost commented Aug 21, 2023

Hmm, is it possible to convert ggml to gguf on mobile? or are you converting on PC?

KerfuffleV2 (Collaborator) commented:

Hmm, is it possible to convert ggml to gguf on mobile?

I think they just meant they were currently not at their computer and posting from mobile.

But you can set up a Unix environment on Android phones pretty easily and run things like Python scripts and compilers. You can even compile/run llama.cpp on a mobile device, though it's not gonna be super fast.

staviq (Contributor, Author) commented Aug 22, 2023

Hmm, is it possible to convert ggml to gguf on mobile? or are you converting on PC?

I think they just meant they were currently not at their computer and posting from mobile.

Pretty much yes, to all of that.

I was momentarily away from home. You can actually run pretty much any part of llama.cpp on Android, at least. If you have Termux or an equivalent, you get a plain old Linux shell and everything that comes with it.

KerfuffleV2 (Collaborator) commented:

We should be good now. Please let me know if you have any further issues with converted models!

ghost commented Aug 22, 2023

I think they just meant they were currently not at their computer and posting from mobile.

Yeah, probably.

But you can set up a Unix environment on Android phones pretty easily and run things like Python scripts and compilers. You can even compile/run llama.cpp on a mobile device, though it's not gonna be super fast.

I run llama.cpp in Termux daily. I've got python/numpy installed, but I get a strange error saying numpy isn't recognised:

~/llama.cpp (master)> python3 convert-llama-ggmlv3-to-gguf.py -i ~/vicuna-7b-v1.5.ggmlv3.q4_0.bin -o ~/vicuna-7b-v1.5.ggmlv3.q4_0.gguf --eps 1e-5 -c 4096
Traceback (most recent call last):
  File "/data/data/com.termux/files/home/llama.cpp/convert-llama-ggmlv3-to-gguf.py", line 4, in <module>
    import numpy as np
ModuleNotFoundError: No module named 'numpy'

It's not a big deal 'cuz I don't have a PC so it's probably too intensive for my little Android to handle.

I'll be patient ❤️

cebtenzzre (Collaborator) commented Aug 22, 2023

@JackJollimore did you try pkg install python-numpy? Or did you use pip? What does which python3 show?

ghost commented Aug 22, 2023

did you try pkg install python-numpy? Or did you use pip? What does which python3 show?

Ah, python-numpy does the trick! It started converting. Good call.

I'm downloading updated llama.cpp and I'll try again shortly to see if I can actually finish a conversion.

Edit: Successful completion. Output saved to: /data/data/com.termux/files/home/vicuna-7b-v1.5.ggmlv3.q4_0.gguf

It converted! I didn't know my device could do that, thanks for the suggestion.

ghost commented Aug 28, 2023

Recently I'm getting an error on the latest build (main: build = 1096 (103cfaf)).

I want to convert ggmlv3 to gguf; here's my attempt: python3 convert-llama-ggmlv3-to-gguf.py -i ~/Pygmalion-Vicuna-1.1-7b.ggmlv3.Q4_0.bin -o ~/Pygmalion-Vicuna-1.1-7b.Q4_0.gguf -m ~/storage/downloads/Pyg-Vic

Error:

Traceback (most recent call last):
  File "/data/data/com.termux/files/home/llama.cpp/convert-llama-ggmlv3-to-gguf.py", line 7, in <module>
    import gguf
ModuleNotFoundError: No module named 'gguf'

(Screenshot attached: Screenshot_20230828_022950)
llama.cpp has a gguf executable as shown in the image. The error occurs even without arguments after ./convert-llama-ggmlv3-to-gguf.py.

cebtenzzre (Collaborator) commented:

@JackJollimore You need to install the gguf package for Python. You can run cd gguf-py && pip install -e . to install it from the repo, or pip install gguf to install it from the internet.

I'm not sure if this is documented anywhere.
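
As a quick, hedged sanity check (nothing beyond a working install is assumed), you can confirm the package is importable from the same interpreter that runs the conversion script:

    # Verify the gguf package is importable; run this with the same python3
    # that will run convert-llama-ggmlv3-to-gguf.py.
    import gguf
    print("gguf imported from:", gguf.__file__)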

ghost commented Aug 28, 2023

or pip install gguf to install it from the internet.

Right again, @cebtenzzre: pip install gguf allowed me to convert.

Thank you.
