add phi3 support #6852

Merged · 9 commits · Apr 24, 2024

Conversation

liuwei-git
Contributor

Make phi3 an explicitly supported model in llama.cpp.

@ggerganov
Member

Might have to add <|end|> as an EOT token:

diff --git a/llama.cpp b/llama.cpp
index 63483b9a..698ad236 100644
--- a/llama.cpp
+++ b/llama.cpp
@@ -4381,6 +4381,7 @@ static void llm_load_vocab(
                         //vocab.id_to_token[t.second].type == LLAMA_TOKEN_TYPE_CONTROL &&
                         (t.first == "<|eot_id|>" ||
                          t.first == "<|im_end|>" ||
+                         t.first == "<|end|>" ||
                          t.first == "<end_of_turn>"
                         )
                    ) {

This seems to be producing better results than #6851.
For example, I don't see the <|calc|> and <|/data|> tokens being generated. I wonder where the difference comes from?

ggerganov added a commit that referenced this pull request Apr 23, 2024
@ggerganov
Member

So the difference was in the tokenization - in the other PR the <|system|> token was being incorrectly mapped to 32034, while it should have been 32006. I applied the changes from this PR into _set_vocab_sentencepiece() since this implementation seems to be correct: 5dcccb3
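For reference, a quick way to sanity-check the expected id of a special token against the upstream tokenizer (a minimal sketch; it assumes the transformers package is installed and that the Hugging Face repo below is accessible):

# Check the id of a special token against the reference tokenizer.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")
print(tok.convert_tokens_to_ids("<|system|>"))  # expected: 32006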

I wonder if it affects the conversion of some other models too?

Anyway, now the results match except for the other PR having a BOS token added at the start, while this PR does not:

https://github.com/ggerganov/llama.cpp/pull/6852/files#diff-ecca4c14f9a354b5557247cafd79409b332bfa1e9c12594f83282af1fde4743eR2064

Just double-checking if this is the intent?

There is also a minor issue because of this: the tokenizer.ggml.add_bos_token KV is written twice in the header, once as true and once as false:

python3 gguf-py/scripts/gguf-dump.py models/phi-3-4k-instruct/ggml-model-f16.gguf
* Loading: models/phi-3-4k-instruct/ggml-model-f16-new.gguf
Traceback (most recent call last):
KeyError: 'Duplicate tokenizer.ggml.add_bos_token already in list at offset 725511'

This is because we already write this field automatically here:

https://github.com/ggerganov/llama.cpp/blob/171a73890ec0948293c675a8ab1779e01aac906f/gguf-py/gguf/vocab.py#L61-L71
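Not the actual fix, just a sketch of the kind of guard that would avoid writing the same KV twice on the conversion side (the written_keys set and add_bool_once helper are hypothetical, and a gguf.GGUFWriter-style add_bool(key, value) method is assumed):

# Hypothetical guard against emitting a duplicate KV such as
# tokenizer.ggml.add_bos_token; not part of gguf-py.
from gguf import GGUFWriter

written_keys: set[str] = set()

def add_bool_once(writer: GGUFWriter, key: str, value: bool) -> None:
    # Skip the write if another code path already emitted this key.
    if key in written_keys:
        return
    written_keys.add(key)
    writer.add_bool(key, value)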

@bartowski1182
Contributor

So is the implementation in #6851 preferred or are both needed for official support?

@ggerganov
Member

Thank you for the nice implementation.

I decided to set the "add BOS" KV to True based on this configuration:

https://huggingface.co/microsoft/Phi-3-mini-4k-instruct/blob/3a811845d89f3c1b3f41b341d0f9f05104769f35/tokenizer_config.json#L2
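A minimal sketch of reading that flag from the tokenizer config (the local path is an assumption; it should point at a downloaded copy of the repo):

# Read add_bos_token from the model's tokenizer_config.json (local copy assumed).
import json
from pathlib import Path

cfg = json.loads(Path("Phi-3-mini-4k-instruct/tokenizer_config.json").read_text())
print(cfg.get("add_bos_token"))  # True for Phi-3 mini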

@x4080 commented Apr 25, 2024

Hi, when using the server, "<|eot_id|>" is still printed at the end of the conversation, and I can't find the stop token in /examples/server/utils.hpp anymore. How can I avoid this "<|eot_id|>" in the server?

Thanks

@ggerganov
Member

Most likely you are using the base model instead of the instruct model. See #6916 for a clear explanation and a way to add stop tokens from the client side.
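For the client-side approach, a minimal sketch of passing stop strings to the server's /completion endpoint (it assumes the server is listening on localhost:8080 and that the requests package is installed; the prompt follows the Phi-3 chat template):

# Ask the server to stop generation at the chat-end tokens (client-side stops).
import requests

resp = requests.post(
    "http://localhost:8080/completion",
    json={
        "prompt": "<|user|>\nWrite a haiku about llamas.<|end|>\n<|assistant|>\n",
        "n_predict": 128,
        "stop": ["<|end|>", "<|eot_id|>"],
    },
)
print(resp.json()["content"])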

@x4080 commented Apr 26, 2024

@ggerganov Hi, no, I was using Phi-3-mini-128k-instruct.Q4_K_M.gguf. Forget it, I think this only affects the server; for non-server use it already works fine.
