Checklist
- 1. If the issue you raised is not a feature but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose Otherwise, it will be closed.
- 2. Please use English, otherwise it will be closed.
Motivation
We have a problem where high volumes of small-prompt requests are normally processed smoothly, but they quickly pile up into a huge queue as soon as a few large-prompt requests are submitted. We tried the --enable-mixed-chunk parameter so that sglang schedules prefill and decode together, but sglang still prefills only a single sequence at a time, so short-prompt requests remain blocked behind long-prompt ones.
We also observed that vLLM can prefill multiple requests concurrently via its --max-num-partial-prefills parameter. However, sglang clearly outperforms vLLM on the DeepSeek series models in our tests, so we are reluctant to abandon sglang. We hope sglang can also support this feature.
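For reference, a rough sketch of the two launch configurations we compared; the model path and the flag value are placeholders, only the flags themselves come from the respective projects:

```bash
# sglang: mixed chunked prefill lets prefill and decode share a batch,
# but only one sequence is prefilled at a time
python -m sglang.launch_server --model-path deepseek-ai/DeepSeek-V3 --enable-mixed-chunk

# vLLM: several requests can be in the prefill phase concurrently
vllm serve deepseek-ai/DeepSeek-V3 --max-num-partial-prefills 2
```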