Hey,
why does llama.cpp only allow 64 parallel requests?
If I set `--parallel` to e.g. 256, I get the error:
`llama_init_from_model: failed to initialize the context: n_seq_max must be <= 64`
What is the reason for this limit?
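For context, here is a minimal way to reproduce it (the model path is just an illustration; any GGUF model shows the same behavior):

```sh
# Start llama-server with 256 parallel sequences (model path is a placeholder)
./llama-server -m model.gguf --parallel 256

# Context initialization then fails with:
# llama_init_from_model: failed to initialize the context: n_seq_max must be <= 64
```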