Dynamically adjust max_new_tokens #397

@abhinavkulkarni

Hi,

I am running a model with a 4096-token context length behind a TGI interface. My primary use case is summarization, where some of my requests can be quite large.

I have set truncate to 4000, which leaves max_new_tokens at no more than 4096 - 4000 = 96.

So even if my input is much shorter than 4000 tokens, say only 1024 tokens, I can still only generate a 96-token response. In that case, max_new_tokens could be as large as 4096 - 1024 = 3072.

Is it possible for chat-ui to dynamically adjust the max_new_tokens this way?
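To illustrate the adjustment I have in mind, here is a minimal sketch of the arithmetic. The function name and its parameters are hypothetical and are not part of chat-ui or TGI. It also assumes the input token count has already been obtained from the model's tokenizer.

```python
def dynamic_max_new_tokens(
    input_tokens: int,
    context_length: int = 4096,
    truncate: int = 4000,
) -> int:
    """Return the largest max_new_tokens that still fits in the context window.

    Hypothetical helper: `input_tokens` is assumed to be the tokenized
    prompt length, computed elsewhere with the model's own tokenizer.
    """
    if input_tokens < 0:
        raise ValueError("input_tokens must be non-negative")
    # The prompt is cut down to at most `truncate` tokens before generation.
    effective_input = min(input_tokens, truncate)
    # Whatever remains of the context window is available for new tokens.
    return max(context_length - effective_input, 0)


print(dynamic_max_new_tokens(1024))  # 4096 - 1024 = 3072
print(dynamic_max_new_tokens(5000))  # truncated to 4000, so 4096 - 4000 = 96
```

With a fixed setting, every request is limited to the worst case of 96 new tokens. Computing the value per request lets short inputs use the rest of the window.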

Thanks for the great work!

Labels: back (this issue is related to the Svelte backend or the DB), question (further information is requested)
