I use the chat-ui-db Docker image as the frontend, text-generation-inference as the inference backend, and meta-llama/Llama-2-70b-chat-hf as the model.
In the MODELS field of my .env.local file, I have the following settings:
MODELS=`[
  {
    "name": "meta-llama/Llama-2-70b-chat-hf",
    "endpoints": [{
      "type": "tgi",
      "url": "http://textgen:80"
    }],
    "preprompt": " ",
    "chatPromptTemplate": "<s>[INST] <<SYS>>\n{{preprompt}}\n<</SYS>>\n\n{{#each messages}}{{#ifUser}}{{content}} [/INST] {{/ifUser}}{{#ifAssistant}}{{content}} </s><s>[INST] {{/ifAssistant}}{{/each}}",
    "promptExamples": [
      {
        "title": "Write an email from bullet list",
        "prompt": "As a restaurant owner, write a professional email to the supplier to get these products every week: \n\n- Wine (x10)\n- Eggs (x24)\n- Bread (x12)"
      },
      {
        "title": "Code a snake game",
        "prompt": "Code a basic snake game in python, give explanations for each step."
      },
      {
        "title": "Assist in a task",
        "prompt": "How do I make a delicious lemon cheesecake?"
      }
    ],
    "parameters": {
      "temperature": 0.1,
      "top_p": 0.95,
      "repetition_penalty": 1.2,
      "top_k": 50,
      "truncate": 3072,
      "max_new_tokens": 1024,
      "stop": ["</s>", "</s><s>[INST]"]
    }
  }
]`
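For reference, for a single user turn this chatPromptTemplate expands to the usual Llama-2 chat format (my own expansion of the template, not taken from any log; the preprompt here is a single space):

    <s>[INST] <<SYS>>
     
    <</SYS>>
    
    How do I make a delicious lemon cheesecake? [/INST]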
This configuration matches the Llama-2-70b-chat-hf entry in the .env.template file of the chat-ui repository.
When I then type a question into the input box, an error occurs.
The logs contain the following:
textgen | 2024-03-05T20:00:38.883413Z ERROR compat_generate{default_return_full_text=false compute_type=Extension(ComputeType("8-nvidia-a100-sxm4-40gb"))}:generate_stream{parameters=GenerateParameters { best_of: None, temperature: Some(0.1), repetition_penalty: Some(1.2), frequency_penalty: None, top_k: Some(50), top_p: Some(0.95), typical_p: None, do_sample: false, max_new_tokens: Some(1024), return_full_text: Some(false), stop: ["</s>", "</s><s>[INST]"], truncate: Some(3072), watermark: false, details: false, decoder_input_details: false, seed: None, top_n_tokens: None, grammar: None }}:async_stream:generate_stream: text_generation_router::infer: router/src/infer.rs:123: `truncate` must be strictly positive and less than 1024. Given: 3072
chat-ui | Error: Input validation error: `truncate` must be strictly positive and less than 1024. Given: 3072
chat-ui | at streamingRequest (file:///app/node_modules/@huggingface/inference/dist/index.mjs:323:19)
chat-ui | at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
chat-ui | at async textGenerationStream (file:///app/node_modules/@huggingface/inference/dist/index.mjs:673:3)
chat-ui | at async generateFromDefaultEndpoint (file:///app/.svelte-kit/output/server/entries/endpoints/conversation/_id_/_server.ts.js:39:20)
chat-ui | at async summarize (file:///app/.svelte-kit/output/server/entries/endpoints/conversation/_id_/_server.ts.js:287:10)
chat-ui | at async file:///app/.svelte-kit/output/server/entries/endpoints/conversation/_id_/_server.ts.js:607:26
textgen | 2024-03-05T20:00:38.910266Z ERROR compat_generate{default_return_full_text=false compute_type=Extension(ComputeType("8-nvidia-a100-sxm4-40gb"))}:generate_stream{parameters=GenerateParameters { best_of: None, temperature: Some(0.1), repetition_penalty: Some(1.2), frequency_penalty: None, top_k: Some(50), top_p: Some(0.95), typical_p: None, do_sample: false, max_new_tokens: Some(1024), return_full_text: Some(false), stop: ["</s>", "</s><s>[INST]"], truncate: Some(3072), watermark: false, details: false, decoder_input_details: false, seed: None, top_n_tokens: None, grammar: None }}:async_stream:generate_stream: text_generation_router::infer: router/src/infer.rs:123: `truncate` must be strictly positive and less than 1024. Given: 3072
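The same validation error can be reproduced directly against the backend, bypassing chat-ui entirely (a sketch using TGI's standard /generate route; the exact response body may differ by version):

    curl -s http://textgen:80/generate \
      -X POST \
      -H 'Content-Type: application/json' \
      -d '{"inputs": "Hello", "parameters": {"truncate": 3072, "max_new_tokens": 16}}'
    # expected: the same '`truncate` must be strictly positive and less than 1024' validation error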
If I set "truncate" to 1000, everything works fine.
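Querying the backend's configured limits (assuming TGI's standard /info route) shows where the 1024 comes from:

    curl -s http://textgen:80/info
    # the response includes "max_input_length" and "max_total_tokens";
    # chat-ui's "truncate" has to stay within max_input_length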
"truncate" for Llama-2-70b-chat-hf in the .env.template file in the chat-ui repository is 3072. I think the 3072 should work fine. I don't know how webpage https://huggingface.co/chat/ sets this parameter.