`vllm/entrypoints/openai/api_server.py`
fastchat is a hack. The real definition of the conversation template lives in the Hugging Face tokenizer:
https://huggingface.co/docs/transformers/main/chat_templating
Please use the Hugging Face tokenizer's chat template instead of fastchat:
```python
>>> from transformers import AutoTokenizer
>>> tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
>>> chat = [
...     {"role": "user", "content": "Hello, how are you?"},
...     {"role": "assistant", "content": "I'm doing great. How can I help you today?"},
...     {"role": "user", "content": "I'd like to show off how chat templating works!"},
... ]
>>> tokenizer.use_default_system_prompt = False
>>> tokenizer.apply_chat_template(chat, tokenize=False)
"<s>[INST] Hello, how are you? [/INST] I'm doing great. How can I help you today? </s><s>[INST] I'd like to show off how chat templating works! [/INST]"
```