`vllm/entrypoints/openai/api_server.py`
fastchat is a hack. The real definition of the conversation template lives in the Hugging Face tokenizer:
https://huggingface.co/docs/transformers/main/chat_templating
Please use the Hugging Face tokenizer's chat template instead of fastchat:
```python
>>> from transformers import AutoTokenizer
>>> tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
>>> chat = [
...     {"role": "user", "content": "Hello, how are you?"},
...     {"role": "assistant", "content": "I'm doing great. How can I help you today?"},
...     {"role": "user", "content": "I'd like to show off how chat templating works!"},
... ]
>>> tokenizer.use_default_system_prompt = False
>>> tokenizer.apply_chat_template(chat, tokenize=False)
"<s>[INST] Hello, how are you? [/INST] I'm doing great. How can I help you today? </s><s>[INST] I'd like to show off how chat templating works! [/INST]"
```