Labels: back (This issue is related to the Svelte backend or the DB), question (Further information is requested)
Description
Hi,
I am running a model with a 4096-token context length behind the TGI interface. My primary use case is summarization, where some of my requests can be quite large.
I have set `truncate` to 4000, which leaves `max_new_tokens` at most 4096 - 4000 = 96.
So even if my input is not 4000 tokens long, say only 1024 tokens, I can still only generate a 96-token response. In that case, `max_new_tokens` could be 4096 - 1024 = 3072.
Is it possible for `chat-ui` to dynamically adjust `max_new_tokens` this way?
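The requested behavior could be sketched roughly as follows; this is a minimal illustration, not actual `chat-ui` code, and the names `MODEL_CONTEXT_LENGTH`, `TRUNCATE`, and `dynamic_max_new_tokens` are hypothetical:

```python
# Sketch of the requested behavior: instead of a fixed max_new_tokens,
# cap it by the context budget left after the tokenized prompt.
# All names here are illustrative assumptions, not chat-ui's API.

MODEL_CONTEXT_LENGTH = 4096  # total context window of the TGI-served model
TRUNCATE = 4000              # maximum prompt length allowed by the truncate setting

def dynamic_max_new_tokens(prompt_token_count: int) -> int:
    """Return the largest max_new_tokens that still fits in the context window."""
    prompt_tokens = min(prompt_token_count, TRUNCATE)
    return MODEL_CONTEXT_LENGTH - prompt_tokens

# A 1024-token prompt leaves room for 3072 new tokens,
# while a full 4000-token prompt leaves only 96.
```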
Thanks for the great work!