Hello, is there any particular reason why LLAMA_MAX_SEQ is set to 64? Would, say, 128 simply not work, or would it degrade performance in a nonlinear way? I am thinking of a specific scenario where I would like to max out a single 80 GB GPU with as many slots as possible; with a quantized model and quantized cache, around 96 slots could be feasible, but that is currently not possible because LLAMA_MAX_SEQ is 64. Thanks a lot! Cheers
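As a rough sanity check on whether 96 slots could fit, here is a back-of-envelope KV-cache estimate. This is only a sketch: the model geometry (layers, KV heads, head dimension), the per-slot context, and the q8_0 cache overhead used below are illustrative assumptions, not values taken from llama.cpp or from this thread.

```cpp
// Back-of-envelope KV-cache sizing per slot.
// All model numbers below are illustrative (an 8B-class GQA model);
// substitute the real geometry of the model you plan to serve.
#include <cstdio>

int main() {
    const double n_layer    = 32;    // transformer layers (assumed)
    const double n_kv_head  = 8;     // KV heads with GQA (assumed)
    const double head_dim   = 128;   // per-head dimension (assumed)
    const double n_ctx_slot = 4096;  // context tokens reserved per slot (assumed)

    // q8_0 cache: 8-bit values plus a per-block scale, ~34 bytes per 32 values.
    const double bytes_per_elem = 34.0 / 32.0;

    // K and V caches per token, summed over all layers.
    const double bytes_per_token = 2.0 * n_layer * n_kv_head * head_dim * bytes_per_elem;
    const double bytes_per_slot  = bytes_per_token * n_ctx_slot;
    const double gib             = 1024.0 * 1024.0 * 1024.0;

    std::printf("KV cache per slot : %.2f GiB\n", bytes_per_slot / gib);
    std::printf("96 slots          : %.2f GiB\n", 96.0 * bytes_per_slot / gib);
    return 0;
}
```

With these assumed numbers the KV cache for 96 slots lands well under 80 GB, but the quantized weights and compute buffers still need to fit alongside it, so the achievable slot count depends on the actual model and per-slot context.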
Answered by ggerganov (Sep 5, 2025):
See 4f81b33#commitcomment-159972791