add phi3 support #6852
Conversation
Might have to add:

```diff
diff --git a/llama.cpp b/llama.cpp
index 63483b9a..698ad236 100644
--- a/llama.cpp
+++ b/llama.cpp
@@ -4381,6 +4381,7 @@ static void llm_load_vocab(
                 //vocab.id_to_token[t.second].type == LLAMA_TOKEN_TYPE_CONTROL &&
                 (t.first == "<|eot_id|>" ||
                  t.first == "<|im_end|>" ||
+                 t.first == "<|end|>" ||
                  t.first == "<end_of_turn>"
                 )
             ) {
```

This seems to be producing better results than #6851.
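For context, the Phi-3 chat format closes each turn with `<|end|>`, which is why it needs to be recognized as an end-of-generation token alongside `<|eot_id|>` and `<|im_end|>`. A minimal sketch for inspecting the tokenizer's special tokens (assuming the Hugging Face `transformers` package and the `microsoft/Phi-3-mini-4k-instruct` checkpoint):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")

# Phi-3 terminates every chat turn with <|end|>, so generation should stop there.
print(tok.special_tokens_map)
print(tok.convert_tokens_to_ids("<|end|>"))
```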
So the difference was in the tokenization in the other PR. I wonder if it affects the conversion of some other models too? Anyway, now the results match, except that the other PR adds a BOS token at the start while this PR does not. Just double-checking if this is the intent?

There is also a minor issue because of this:

```
python3 gguf-py/scripts/gguf-dump.py models/phi-3-4k-instruct/ggml-model-f16.gguf
* Loading: models/phi-3-4k-instruct/ggml-model-f16-new.gguf
Traceback (most recent call last):
KeyError: 'Duplicate tokenizer.ggml.add_bos_token already in list at offset 725511'
```

This is because we already write this field automatically here:
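For reference, a sketch of how the duplicate key can arise in a conversion script using gguf-py (the paths and arch string here are illustrative, not taken from the PR): `gguf.SpecialVocab` already writes `tokenizer.ggml.add_bos_token` when it finds the setting in the tokenizer config, so writing the same KV again by hand produces the duplicate entry that gguf-dump.py rejects:

```python
import gguf

# illustrative output path and architecture string
writer = gguf.GGUFWriter("ggml-model-f16.gguf", arch="phi3")

# SpecialVocab picks up add_bos_token from tokenizer_config.json and
# writes tokenizer.ggml.add_bos_token into the GGUF KV store.
special_vocab = gguf.SpecialVocab("models/phi-3-4k-instruct", load_merges=False)
special_vocab.add_to_gguf(writer)

# Writing the same KV a second time creates the duplicate entry that
# gguf-dump.py later reports as a KeyError.
writer.add_add_bos_token(True)
```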
So is the implementation in #6851 preferred, or are both needed for official support?
Thank you for the nice implementation. I decided to set the "add BOS" KV to True based on this configuration:
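The referenced configuration is not shown here, but it presumably corresponds to the `add_bos_token` setting in the upstream tokenizer_config.json. A hedged sketch of checking that setting (the checkpoint path is hypothetical):

```python
import json

# hypothetical path to the downloaded Phi-3 checkpoint directory
with open("models/phi-3-4k-instruct/tokenizer_config.json") as f:
    cfg = json.load(f)

# The GGUF KV tokenizer.ggml.add_bos_token is meant to mirror this value.
print(cfg.get("add_bos_token"))
```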
Hi, when using the server, "<|eot_id|>" is still printed at the end of the conversation, and I can't find the stop token now in /examples/server/utils.hpp. How can I avoid this "<|eot_id|>" in the server? Thanks.
Most likely you are using the base model instead of the instruct model. See #6916 for a clear explanation and a way to add stop tokens from the client side, as in the sketch below.
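For example, a sketch of passing stop strings from the client against a locally running llama.cpp server (the host, port, and prompt format are assumptions; the /completion endpoint accepts a "stop" array of strings):

```python
import requests

# assumes a llama.cpp server listening on the default local port
resp = requests.post(
    "http://127.0.0.1:8080/completion",
    json={
        "prompt": "<|user|>\nWhat is the capital of France?<|end|>\n<|assistant|>\n",
        "n_predict": 128,
        # client-side stop strings so generation halts at end-of-turn markers
        "stop": ["<|end|>", "<|eot_id|>"],
    },
)
print(resp.json()["content"])
```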
@ggerganov Hi, no, I was using Phi-3-mini-128k-instruct.Q4_K_M.gguf. Forget it, I think this was for the server; for non-server use it already works fine.
Make phi3 an explicit model to support in llama.cpp.