
[User] GGUF conversion, stop sequence Problem #2711

@ghost

Description

Hi <3 llama.cpp

@KerfuffleV2 showed that models converted without metadata load differently. Loading without metadata:

llama_model_load_internal: BOS token = 1 ' '
llama_model_load_internal: EOS token = 2 ' '

Loading a model converted with external metadata:

llama_model_load_internal: BOS token = 1 '<s>'
llama_model_load_internal: EOS token = 2 '</s>'
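
In case it helps anyone reproduce this, here's a quick way to check what special-token metadata a GGUF file actually carries. This is a minimal sketch assuming the gguf-py package from the llama.cpp repo (pip install gguf) and its GGUFReader class; the reader API and field layout may differ between versions:

    # Minimal sketch: dump the special-token metadata from a GGUF file.
    # Assumes the gguf-py package from the llama.cpp repo (pip install gguf);
    # GGUFReader and its field layout may differ between versions.
    from gguf import GGUFReader

    reader = GGUFReader("wizardmath-7b-v1.0.ggmlv3.q4_0.gguf")
    for key in ("tokenizer.ggml.bos_token_id", "tokenizer.ggml.eos_token_id"):
        field = reader.get_field(key)
        if field is None:
            print(key, "-> missing")
        else:
            # scalar fields keep their value in the part indexed by data[0]
            print(key, "->", field.parts[field.data[0]])

On a file converted without metadata I'd expect these keys (or the matching vocab text) to come back empty, matching the '' in the logs above.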

I converted WizardMath-7B-V1.0 to GGUF; here are a couple of runs:
ex1:

~/l/b/bin (master) [SIGINT]> ./main -m ~/wizardmath-7b-v1.0.ggmlv3.q4_0.gguf --color -c 2048 --keep -1 -n -1 -t 3 -b 7 -i -r "User:" --in-prefix " " --in-suffix "Assistant:" -f ~/storage/shared/PT/M.txt

main: build = 1015 (226255b)
main: seed  = 1692706079
llama_model_loader: loaded meta data with 15 key-value pairs and 291 tensors from /data/data/com.termux/files/home/wizardmath-7b-v1.0.ggmlv3.q4_0.gguf (version GGUF V1 (latest))
..
llama_model_load_internal: format       = GGUF V1 (latest)
llama_model_load_internal: arch         = llama
llama_model_load_internal: vocab type   = SPM
llama_model_load_internal: n_vocab      = 32001
llama_model_load_internal: n_ctx_train  = 2048
llama_model_load_internal: n_ctx        = 2048
llama_model_load_internal: n_embd       = 4096
llama_model_load_internal: n_head       = 32
llama_model_load_internal: n_head_kv    = 32
llama_model_load_internal: n_layer      = 32
llama_model_load_internal: n_rot        = 128
llama_model_load_internal: n_gqa        = 1
llama_model_load_internal: f_norm_eps   = 5.0e-06
llama_model_load_internal: n_ff         = 11008
llama_model_load_internal: freq_base    = 10000.0
llama_model_load_internal: freq_scale   = 1
llama_model_load_internal: model type   = 7B
llama_model_load_internal: model ftype  = mostly Q4_0
llama_model_load_internal: model size   = 6.74 B
llama_model_load_internal: general.name = wizardmath-7b-v1.0.ggmlv3.q4_0.bin
llama_model_load_internal: BOS token = 1 ''
llama_model_load_internal: EOS token = 2 ''
llama_model_load_internal: LF token  = 13 '<0x0A>'
llama_model_load_internal: ggml ctx size =    0.09 MB
llama_model_load_internal: mem required  = 3615.73 MB (+ 1024.00 MB per state)
llama_new_context_with_model: kv self size  = 1024.00 MB
llama_new_context_with_model: compute buffer total size =    3.49 MB
                                                           
system_info: n_threads = 3 / 8 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 0 | VSX = 0 |                          

main: interactive mode on.
Reverse prompt: 'User:'
Input prefix: ' '
Input suffix: 'Assistant:'
sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.800000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 2048, n_batch = 7, n_predict = -1, n_keep = 42
                                                           
== Running in interactive mode. ==
 - Press Ctrl+C to interject at any time.
 - Press Return to return control to LLaMa.
 - To return control without starting a new line, end your input with '/'.
 - If you want to submit another line, end your input with '\'.

Below is an instruction that describes a task. Write a response that appropriately completes the request.
                                                           
### Instruction:
Please list 3 movie titles. 
                               
### Response: We are asked to list 3 movie titles, which means we need to come up with at least 3 different movie titles. Here is a list of 3 movie titles for your reference:   
1. Titanic (1997)                                        
2. The Matrix (1999)
3. Toy Story (1995)                                        

These are just some examples, and there are certainly many more movie titles out there. However, these 3 movies have been well-known and popular for a long time, and they represent different genres and styles of filmmaking. Therefore, I believe that these 3 movie titles will not disappoint you.
The answer is: Here are three movie titles: Titanic (1997), The Matrix (1999), and Toy Story (1995).
                                                           
</s>
                                                         
The answer is: Three movie titles are: Titanic (1997), The Matrix (1999), and Toy Story (1995)..                      
</s>

ex2:

### Instruction:
Please list 3 movie titles.

### Response:I'm not sure what you're looking for, but here are some movie titles:

1. The Shawshank Redemption
2. Schindler's List
3. The Godfather

The answer is: Here are three movie titles:
1. The Shawshank Redemption
2. Schindler's List
3. The Godfather.

</s>

The answer is: Here are three movie titles:
1. The Shawshank Redemption
2. Schindler's List
3. The Godfather.

</s>

It appears that, because of how the model was converted, it can't make use of the stop sequence (the EOS token prints as '' instead of '</s>'), so it never returns control to the user in this case.
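
If that's right, the fix would be for the conversion to write the tokenizer special-token keys into the GGUF so main can recognise the EOS token. A minimal sketch of just that metadata step, assuming gguf-py's GGUFWriter (method names as in the current gguf package; a real conversion also writes the vocab and the tensors):

    # Sketch: write only the special-token KV pairs into a bare GGUF.
    # Assumes gguf-py's GGUFWriter (method names as in the current gguf
    # package); a real conversion also writes the vocab and the tensors.
    import gguf

    writer = gguf.GGUFWriter("stub.gguf", "llama")
    writer.add_bos_token_id(1)  # should load as BOS token = 1 '<s>'
    writer.add_eos_token_id(2)  # should load as EOS token = 2 '</s>'
    writer.write_header_to_file()
    writer.write_kv_data_to_file()
    writer.close()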

Edit: here's the error message I get when trying to convert with the external metadata:

python3 convert-llama-ggmlv3-to-gguf.py -i ~/wizardmath-7b-v1.0.ggmlv3.q4_0.bin -o ~/wizardM2.gguf -c 2048 -m ~/storage/shared/downloads/wizardmath             
                                  
* Using config: Namespace(input=PosixPath('/data/data/com.termux/files/home/wizardmath-7b-v1.0.ggmlv3.q4_0.bin'), output=PosixPath('/data/data/com.termux/files/home/wizardM2.gguf'), name=None, desc=None, gqa=1, eps='5.0e-06', context_length=2048, model_metadata_dir=PosixPath('/data/data/com.termux/files/home/storage/shared/downloads/wizardmath'), vocab_dir=None, vocabtype='spm')
                                                          
 === WARNING === Be aware that this conversion script is best-effort. Use a native GGUF model if possible. === WARNING ===
                                                          
* Scanning GGML input file
* GGML model hyperparameters: <Hyperparameters: n_vocab=32001, n_embd=4096, n_mult=5504, n_head=32, n_layer=32, n_rot=128, n_ff=11008, ftype=2>   

Traceback (most recent call last):
  File "/data/data/com.termux/files/home/llama.cpp/convert-llama-ggmlv3-to-gguf.py", line 333, in <module>
    main()
  File "/data/data/com.termux/files/home/llama.cpp/convert-llama-ggmlv3-to-gguf.py", line 323, in main
    (params_override, vocab_override) = handle_metadata(cfg, model.hyperparameters)
                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/data/com.termux/files/home/llama.cpp/convert-llama-ggmlv3-to-gguf.py", line 274, in handle_metadata
    import convert
  File "/data/data/com.termux/files/home/llama.cpp/convert.py", line 27, in <module>
    from sentencepiece import SentencePieceProcessor  # type: ignore
ModuleNotFoundError: No module named 'sentencepiece'
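
For what it's worth, that last error looks unrelated to the stop-sequence problem itself: the metadata code path imports convert.py, which depends on the sentencepiece package, so installing it (pip install sentencepiece) should at least get the conversion past this traceback.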

Repo & here's the content of ~/storage/shared/downloads/wizardmath:

[screenshot: Screenshot_20230822_100229, showing the directory contents]
