add phi3 support #6852

Merged · 9 commits · Apr 24, 2024

Conversation

liuwei-git
Contributor

Make phi3 an explicitly supported model in llama.cpp.

@ggerganov
Member

Might have to add <|end|> as an EOT token:

diff --git a/llama.cpp b/llama.cpp
index 63483b9a..698ad236 100644
--- a/llama.cpp
+++ b/llama.cpp
@@ -4381,6 +4381,7 @@ static void llm_load_vocab(
                         //vocab.id_to_token[t.second].type == LLAMA_TOKEN_TYPE_CONTROL &&
                         (t.first == "<|eot_id|>" ||
                          t.first == "<|im_end|>" ||
+                         t.first == "<|end|>" ||
                          t.first == "<end_of_turn>"
                         )
                    ) {

This seems to be producing better results than #6851.
For example, I don't see the <|calc|> and <|/data|> tokens being generated. I wonder where the difference comes from?

ggerganov added a commit that referenced this pull request Apr 23, 2024
@ggerganov
Member

So the difference was in the tokenization - in the other PR the <|system|> token was being incorrectly mapped to 32034, while it should have been 32006. I applied the changes from this PR into _set_vocab_sentencepiece() since this implementation seems to be correct: 5dcccb3
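For reference, a quick way to sanity-check the expected id of a special token against the upstream tokenizer (a minimal sketch; it assumes the transformers package is installed and that the Hugging Face repo below is accessible):

# Check the id of a special token against the reference tokenizer.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")
print(tok.convert_tokens_to_ids("<|system|>"))  # expected: 32006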

I wonder if it affects the conversion of some other models too?

Anyway, now the results match except for the other PR having a BOS token added at the start, while this PR does not:

https://github.com/ggerganov/llama.cpp/pull/6852/files#diff-ecca4c14f9a354b5557247cafd79409b332bfa1e9c12594f83282af1fde4743eR2064

Just double-checking if this is the intent?

There is also a minor issue because of this: the tokenizer.ggml.add_bos_token KV is written twice in the header, once as true and once as false:

python3 gguf-py/scripts/gguf-dump.py models/phi-3-4k-instruct/ggml-model-f16.gguf
* Loading: models/phi-3-4k-instruct/ggml-model-f16-new.gguf
Traceback (most recent call last):
KeyError: 'Duplicate tokenizer.ggml.add_bos_token already in list at offset 725511'

This is because we already write this field automatically here:

https://github.com/ggerganov/llama.cpp/blob/171a73890ec0948293c675a8ab1779e01aac906f/gguf-py/gguf/vocab.py#L61-L71
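Not the actual fix, just a sketch of the kind of guard that would avoid writing the same KV twice on the conversion side (the written_keys set and add_bool_once helper are hypothetical, and a gguf.GGUFWriter-style add_bool(key, value) method is assumed):

# Hypothetical guard against emitting a duplicate KV such as
# tokenizer.ggml.add_bos_token; not part of gguf-py.
from gguf import GGUFWriter

written_keys: set[str] = set()

def add_bool_once(writer: GGUFWriter, key: str, value: bool) -> None:
    # Skip the write if another code path already emitted this key.
    if key in written_keys:
        return
    written_keys.add(key)
    writer.add_bool(key, value)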

@bartowski1182
Contributor

So is the implementation in #6851 preferred or are both needed for official support?

@ggerganov
Member

Thank you for the nice implementation.

I decided to set the "add BOS" KV to True based on this configuration:

https://huggingface.co/microsoft/Phi-3-mini-4k-instruct/blob/3a811845d89f3c1b3f41b341d0f9f05104769f35/tokenizer_config.json#L2
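A minimal sketch of reading that flag from the tokenizer config (the local path is an assumption; it should point at a downloaded copy of the repo):

# Read add_bos_token from the model's tokenizer_config.json (local copy assumed).
import json
from pathlib import Path

cfg = json.loads(Path("Phi-3-mini-4k-instruct/tokenizer_config.json").read_text())
print(cfg.get("add_bos_token"))  # True for Phi-3 mini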

@x4080 commented Apr 25, 2024

Hi, when using the server, "<|eot_id|>" is still printed at the end of the conversation, and I can't find the stop token in /examples/server/utils.hpp anymore. How can I avoid this "<|eot_id|>" in the server?

Thanks

@ggerganov
Member

Most likely you are using the base model instead of the instruct model. See #6916 for a clear explanation and a way to add stop tokens from the client side.
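For the client-side approach, a minimal sketch of passing stop strings to the server's /completion endpoint (it assumes the server is listening on localhost:8080 and that the requests package is installed; the prompt follows the Phi-3 chat template):

# Ask the server to stop generation at the chat-end tokens (client-side stops).
import requests

resp = requests.post(
    "http://localhost:8080/completion",
    json={
        "prompt": "<|user|>\nWrite a haiku about llamas.<|end|>\n<|assistant|>\n",
        "n_predict": 128,
        "stop": ["<|end|>", "<|eot_id|>"],
    },
)
print(resp.json()["content"])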

@x4080 commented Apr 26, 2024

@ggerganov Hi, no, I was using Phi-3-mini-128k-instruct.Q4_K_M.gguf. Forget it, I think this only affects the server; for non-server use it already works fine.
