Bug: Llama-2-70b-chat-hf error: `truncate` must be strictly positive and less than 1024. Given: 3072 #899

@majestichou

Description

I use the docker image chat-ui-db as the frontend, text-generation-inference as the inference backend, and meta-llama/Llama-2-70b-chat-hf as the model.
In the MODELS field of the .env.local file, I have the following settings:

MODELS=`[
     {
      "name": "meta-llama/Llama-2-70b-chat-hf",
      "endpoints": [{
        "type" : "tgi",
        "url": "http://textgen:80",
        }],
      "preprompt": " ",
      "chatPromptTemplate" : "<s>[INST] <<SYS>>\n{{preprompt}}\n<</SYS>>\n\n{{#each messages}}{{#ifUser}}{{content}} [/INST] {{/ifUser}}{{#ifAssistant}}{{content}} </s><s>[INST] {{/ifAssistant}}{{/each}}",
      "promptExamples": [
        {
          "title": "Write an email from bullet list",
          "prompt": "As a restaurant owner, write a professional email to the supplier to get these products every week: \n\n- Wine (x10)\n- Eggs (x24)\n- Bread (x12)"
        }, {
          "title": "Code a snake game",
          "prompt": "Code a basic snake game in python, give explanations for each step."
        }, {
          "title": "Assist in a task",
          "prompt": "How do I make a delicious lemon cheesecake?"
        }
      ],
      "parameters": {
        "temperature": 0.1,
        "top_p": 0.95,
        "repetition_penalty": 1.2,
        "top_k": 50,
        "truncate": 3072,
        "max_new_tokens": 1024,
        "stop" : ["</s>", "</s><s>[INST]"]
      }
    }
]`
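For reference, the rejection comes from the TGI side rather than from chat-ui: the router refuses any `truncate` that is not strictly below TGI's configured maximum input length. If the `textgen` container was started with default limits, raising them should allow `truncate: 3072`. A docker-compose sketch of that change (the service name matches the setup above; the flag values are assumptions, and the flag was renamed from `--max-input-length` to `--max-input-tokens` in later TGI releases, so check your version):

```yaml
# Illustrative docker-compose fragment: raise TGI's input limits so that
# truncate: 3072 passes validation. Limits must satisfy
# max-input-length > truncate and max-total-tokens >= input + max_new_tokens.
services:
  textgen:
    image: ghcr.io/huggingface/text-generation-inference:latest
    command: >
      --model-id meta-llama/Llama-2-70b-chat-hf
      --max-input-length 4095
      --max-total-tokens 5120
```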

This configuration matches the one for Llama-2-70b-chat-hf in the .env.template file of the chat-ui repository.
Then I type a question in the input box and an error occurs.
The log contains the following error:

textgen  | 2024-03-05T20:00:38.883413Z ERROR compat_generate{default_return_full_text=false compute_type=Extension(ComputeType("8-nvidia-a100-sxm4-40gb"))}:generate_stream{parameters=GenerateParameters { best_of: None, temperature: Some(0.1), repetition_penalty: Some(1.2), frequency_penalty: None, top_k: Some(50), top_p: Some(0.95), typical_p: None, do_sample: false, max_new_tokens: Some(1024), return_full_text: Some(false), stop: ["</s>", "</s><s>[INST]"], truncate: Some(3072), watermark: false, details: false, decoder_input_details: false, seed: None, top_n_tokens: None, grammar: None }}:async_stream:generate_stream: text_generation_router::infer: router/src/infer.rs:123: `truncate` must be strictly positive and less than 1024. Given: 3072
chat-ui  | Error: Input validation error: `truncate` must be strictly positive and less than 1024. Given: 3072
chat-ui  |     at streamingRequest (file:///app/node_modules/@huggingface/inference/dist/index.mjs:323:19)
chat-ui  |     at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
chat-ui  |     at async textGenerationStream (file:///app/node_modules/@huggingface/inference/dist/index.mjs:673:3)
chat-ui  |     at async generateFromDefaultEndpoint (file:///app/.svelte-kit/output/server/entries/endpoints/conversation/_id_/_server.ts.js:39:20)
chat-ui  |     at async summarize (file:///app/.svelte-kit/output/server/entries/endpoints/conversation/_id_/_server.ts.js:287:10)
chat-ui  |     at async file:///app/.svelte-kit/output/server/entries/endpoints/conversation/_id_/_server.ts.js:607:26
textgen  | 2024-03-05T20:00:38.910266Z ERROR compat_generate{default_return_full_text=false compute_type=Extension(ComputeType("8-nvidia-a100-sxm4-40gb"))}:generate_stream{parameters=GenerateParameters { best_of: None, temperature: Some(0.1), repetition_penalty: Some(1.2), frequency_penalty: None, top_k: Some(50), top_p: Some(0.95), typical_p: None, do_sample: false, max_new_tokens: Some(1024), return_full_text: Some(false), stop: ["</s>", "</s><s>[INST]"], truncate: Some(3072), watermark: false, details: false, decoder_input_details: false, seed: None, top_n_tokens: None, grammar: None }}:async_stream:generate_stream: text_generation_router::infer: router/src/infer.rs:123: `truncate` must be strictly positive and less than 1024. Given: 3072

If I set `truncate` to 1000, everything works fine.
However, the `truncate` value for Llama-2-70b-chat-hf in the .env.template file of the chat-ui repository is 3072, so I would expect 3072 to work. I don't know how the page https://huggingface.co/chat/ sets this parameter.

Labels: models (This issue is related to model performance/reliability), support (A request for help setting things up)
