
[User] GGUF conversion, stop sequence Problem #2711

@ghost

Description

Hi <3 llama.cpp

@KerfuffleV2 showed that models converted without metadata load differently. Loading without metadata:

llama_model_load_internal: BOS token = 1 ' '
llama_model_load_internal: EOS token = 2 ' '

Loading a model converted with external metadata:

llama_model_load_internal: BOS token = 1 '<s>'
llama_model_load_internal: EOS token = 2 '</s>'
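
In case it helps anyone reproduce this, here's a quick way to check what special-token metadata a GGUF file actually carries. This is a minimal sketch assuming the gguf-py package from the llama.cpp repo (pip install gguf) and its GGUFReader class; the reader API and field layout may differ between versions:

    # Minimal sketch: dump the special-token metadata from a GGUF file.
    # Assumes the gguf-py package from the llama.cpp repo (pip install gguf);
    # GGUFReader and its field layout may differ between versions.
    from gguf import GGUFReader

    reader = GGUFReader("wizardmath-7b-v1.0.ggmlv3.q4_0.gguf")
    for key in ("tokenizer.ggml.bos_token_id", "tokenizer.ggml.eos_token_id"):
        field = reader.get_field(key)
        if field is None:
            print(key, "-> missing")
        else:
            # scalar fields keep their value in the part indexed by data[0]
            print(key, "->", field.parts[field.data[0]])

On a file converted without metadata I'd expect these keys (or the matching vocab text) to come back empty, matching the '' in the logs above.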

I converted WizardMath-7B-V1.0 to GGUF; here are a couple of runs:
ex1:

~/l/b/bin (master) [SIGINT]> ./main -m ~/wizardmath-7b-v1.0.ggmlv3.q4_0.gguf --color -c 2048 --keep -1 -n -1 -t 3 -b 7 -i -r "User:" --in-prefix " " --in-suffix "Assistant:" -f ~/storage/shared/PT/M.txt

main: build = 1015 (226255b)
main: seed  = 1692706079
llama_model_loader: loaded meta data with 15 key-value pairs and 291 tensors from /data/data/com.termux/files/home/wizardmath-7b-v1.0.ggmlv3.q4_0.gguf (version GGUF V1 (latest))
..
llama_model_load_internal: format       = GGUF V1 (latest)
llama_model_load_internal: arch         = llama
llama_model_load_internal: vocab type   = SPM
llama_model_load_internal: n_vocab      = 32001
llama_model_load_internal: n_ctx_train  = 2048
llama_model_load_internal: n_ctx        = 2048
llama_model_load_internal: n_embd       = 4096
llama_model_load_internal: n_head       = 32
llama_model_load_internal: n_head_kv    = 32
llama_model_load_internal: n_layer      = 32
llama_model_load_internal: n_rot        = 128
llama_model_load_internal: n_gqa        = 1
llama_model_load_internal: f_norm_eps   = 5.0e-06
llama_model_load_internal: n_ff         = 11008
llama_model_load_internal: freq_base    = 10000.0
llama_model_load_internal: freq_scale   = 1
llama_model_load_internal: model type   = 7B
llama_model_load_internal: model ftype  = mostly Q4_0
llama_model_load_internal: model size   = 6.74 B
llama_model_load_internal: general.name = wizardmath-7b-v1.0.ggmlv3.q4_0.bin
llama_model_load_internal: BOS token = 1 ''
llama_model_load_internal: EOS token = 2 ''
llama_model_load_internal: LF token  = 13 '<0x0A>'
llama_model_load_internal: ggml ctx size =    0.09 MB
llama_model_load_internal: mem required  = 3615.73 MB (+ 1024.00 MB per state)
llama_new_context_with_model: kv self size  = 1024.00 MB
llama_new_context_with_model: compute buffer total size =    3.49 MB
                                                           
system_info: n_threads = 3 / 8 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 0 | VSX = 0 |                          

main: interactive mode on.
Reverse prompt: 'User:'
Input prefix: ' '
Input suffix: 'Assistant:'
sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.800000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 2048, n_batch = 7, n_predict = -1, n_keep = 42
                                                           
== Running in interactive mode. ==
 - Press Ctrl+C to interject at any time.
 - Press Return to return control to LLaMa.
 - To return control without starting a new line, end your input with '/'.
 - If you want to submit another line, end your input with '\'.

Below is an instruction that describes a task. Write a response that appropriately completes the request.
                                                           
### Instruction:
Please list 3 movie titles. 
                               
### Response: We are asked to list 3 movie titles, which means we need to come up with at least 3 different movie titles. Here is a list of 3 movie titles for your reference:   
1. Titanic (1997)                                        
2. The Matrix (1999)
3. Toy Story (1995)                                        

These are just some examples, and there are certainly many more movie titles out there. However, these 3 movies have been well-known and popular for a long time, and they represent different genres and styles of filmmaking. Therefore, I believe that these 3 movie titles will not disappoint you.
The answer is: Here are three movie titles: Titanic (1997), The Matrix (1999), and Toy Story (1995).
                                                           
</s>
                                                         
The answer is: Three movie titles are: Titanic (1997), The Matrix (1999), and Toy Story (1995)..                      
</s>

ex2:

### Instruction:
Please list 3 movie titles.

### Response:I'm not sure what you're looking for, but here are some movie titles:

1. The Shawshank Redemption
2. Schindler's List
3. The Godfather

The answer is: Here are three movie titles:
1. The Shawshank Redemption
2. Schindler's List
3. The Godfather.

</s>

The answer is: Here are three movie titles:
1. The Shawshank Redemption
2. Schindler's List
3. The Godfather.

</s>

It appears that, because of how the model was converted, it can't make use of the stop sequence (the EOS token prints as '' instead of '</s>'), so it never returns control to the user in this case.
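
If that's right, the fix would be for the conversion to write the tokenizer special-token keys into the GGUF so main can recognise the EOS token. A minimal sketch of just that metadata step, assuming gguf-py's GGUFWriter (method names as in the current gguf package; a real conversion also writes the vocab and the tensors):

    # Sketch: write only the special-token KV pairs into a bare GGUF.
    # Assumes gguf-py's GGUFWriter (method names as in the current gguf
    # package); a real conversion also writes the vocab and the tensors.
    import gguf

    writer = gguf.GGUFWriter("stub.gguf", "llama")
    writer.add_bos_token_id(1)  # should load as BOS token = 1 '<s>'
    writer.add_eos_token_id(2)  # should load as EOS token = 2 '</s>'
    writer.write_header_to_file()
    writer.write_kv_data_to_file()
    writer.close()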

Edit: here's the error message I get when trying to convert with the external metadata:

python3 convert-llama-ggmlv3-to-gguf.py -i ~/wizardmath-7b-v1.0.ggmlv3.q4_0.bin -o ~/wizardM2.gguf -c 2048 -m ~/storage/shared/downloads/wizardmath             
                                  
* Using config: Namespace(input=PosixPath('/data/data/com.termux/files/home/wizardmath-7b-v1.0.ggmlv3.q4_0.bin'), output=PosixPath('/data/data/com.termux/files/home/wizardM2.gguf'), name=None, desc=None, gqa=1, eps='5.0e-06', context_length=2048, model_metadata_dir=PosixPath('/data/data/com.termux/files/home/storage/shared/downloads/wizardmath'), vocab_dir=None, vocabtype='spm')
                                                          
 === WARNING === Be aware that this conversion script is best-effort. Use a native GGUF model if possible. === WARNING ===
                                                          
* Scanning GGML input file
* GGML model hyperparameters: <Hyperparameters: n_vocab=32001, n_embd=4096, n_mult=5504, n_head=32, n_layer=32, n_rot=128, n_ff=11008, ftype=2>   

Traceback (most recent call last):
  File "/data/data/com.termux/files/home/llama.cpp/convert-llama-ggmlv3-to-gguf.py", line 333, in <module>
    main()
  File "/data/data/com.termux/files/home/llama.cpp/convert-llama-ggmlv3-to-gguf.py", line 323, in main
    (params_override, vocab_override) = handle_metadata(cfg, model.hyperparameters)
                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/data/com.termux/files/home/llama.cpp/convert-llama-ggmlv3-to-gguf.py", line 274, in handle_metadata
    import convert
  File "/data/data/com.termux/files/home/llama.cpp/convert.py", line 27, in <module>
    from sentencepiece import SentencePieceProcessor  # type: ignore
ModuleNotFoundError: No module named 'sentencepiece'
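
For what it's worth, that last error looks unrelated to the stop-sequence problem itself: the metadata code path imports convert.py, which depends on the sentencepiece package, so installing it (pip install sentencepiece) should at least get the conversion past this traceback.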

Repo & here's the content of ~/storage/shared/downloads/wizardmath:

[screenshot: Screenshot_20230822_100229, showing the directory contents]
