Hello, is there any particular reason why LLAMA_MAX_SEQ is set to 64? Would, say, 128 simply not work, or would it degrade performance in a nonlinear way? I am thinking of a specific scenario where I would like to max out a single 80 GB GPU with as many slots as possible; with a quantized model and quantized cache, around 96 slots could be feasible, but that is currently not possible because LLAMA_MAX_SEQ is 64. Thanks a lot! Cheers
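As a rough sanity check on whether 96 slots could fit, here is a back-of-envelope KV-cache estimate. This is only a sketch: the model geometry (layers, KV heads, head dimension), the per-slot context, and the q8_0 cache overhead used below are illustrative assumptions, not values taken from llama.cpp or from this thread.

```cpp
// Back-of-envelope KV-cache sizing per slot.
// All model numbers below are illustrative (an 8B-class GQA model);
// substitute the real geometry of the model you plan to serve.
#include <cstdio>

int main() {
    const double n_layer    = 32;    // transformer layers (assumed)
    const double n_kv_head  = 8;     // KV heads with GQA (assumed)
    const double head_dim   = 128;   // per-head dimension (assumed)
    const double n_ctx_slot = 4096;  // context tokens reserved per slot (assumed)

    // q8_0 cache: 8-bit values plus a per-block scale, ~34 bytes per 32 values.
    const double bytes_per_elem = 34.0 / 32.0;

    // K and V caches per token, summed over all layers.
    const double bytes_per_token = 2.0 * n_layer * n_kv_head * head_dim * bytes_per_elem;
    const double bytes_per_slot  = bytes_per_token * n_ctx_slot;
    const double gib             = 1024.0 * 1024.0 * 1024.0;

    std::printf("KV cache per slot : %.2f GiB\n", bytes_per_slot / gib);
    std::printf("96 slots          : %.2f GiB\n", 96.0 * bytes_per_slot / gib);
    return 0;
}
```

With these assumed numbers the KV cache for 96 slots lands well under 80 GB, but the quantized weights and compute buffers still need to fit alongside it, so the achievable slot count depends on the actual model and per-slot context.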
Answered by ggerganov (Sep 5, 2025):
See 4f81b33#commitcomment-159972791