[Feature] Add support to concurrently calculate prefills of multiple requests #8764

@kelliaao

Description

Motivation

We have a problem where a high volume of small-prompt requests is usually processed smoothly but quickly piles up into a giant queue once a small number of large-prompt requests are submitted. We tried the --enable-mixed-chunk parameter so that sglang would handle prefill and decode concurrently; however, sglang still processes only a single sequence during prefill, so short-prompt requests remain blocked behind long-prompt requests.
We also observed that vLLM allows multiple requests to be processed concurrently in the prefill phase via its --max-num-partial-prefills parameter. Nonetheless, sglang demonstrably outperforms vLLM on DeepSeek-series models, which makes us reluctant to abandon sglang. We hope sglang can support this feature as well.
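For context, the two behaviors compared above are controlled at server launch. A minimal sketch of the two launch commands, assuming current flag spellings (the model path is illustrative, and vLLM's `--max-num-partial-prefills` also requires chunked prefill to be enabled):

```shell
# sglang: --enable-mixed-chunk lets decode run alongside prefill,
# but prefill itself still processes one sequence at a time
# (the limitation this issue describes)
python -m sglang.launch_server \
    --model-path deepseek-ai/DeepSeek-V2 \
    --enable-mixed-chunk

# vLLM: chunked prefill plus up to 4 requests concurrently
# in the prefill phase, so short prompts are not stuck behind long ones
vllm serve deepseek-ai/DeepSeek-V2 \
    --enable-chunked-prefill \
    --max-num-partial-prefills 4
```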

Related resources

vllm-project/vllm#10235
