[Core][KVConnector] Propagate all tokens on resumed preemptions #24926
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a limited subset runs automatically, and you can ask your reviewers to trigger select CI tests on top of it. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. If you have any questions, please reach out to us on Slack at https://slack.vllm.ai. 🚀
Code Review
This pull request correctly adds functionality to propagate all token IDs for resumed preempted requests when using a KVConnector, which is a valuable improvement for external persistent KV cache implementations. The changes are well-contained and follow the intended logic. However, I've identified a critical issue where a missing `else` block can lead to an `IndexError` in common configurations where neither pipeline parallelism nor a KV connector is used. I've provided a suggestion to fix this and make the logic more robust.
Is this fixing a bug you ran into?
This added signal, emitted only for resumed preemptions, is a prerequisite for supporting external prefix caching efficiently.
vllm/v1/core/sched/output.py (Outdated)
# If resumed_from_preemption is True, propagate the token ids to the
# connector, otherwise it will be empty.
token_ids: list[list[int]]
- Could you elaborate more on why the KV connector needs this information?
- Can you rename it to resumed_req_token_ids or something like that?
- Empty lists still cost serialization and GC. Can you use None instead?
Sure. The token_ids (and their hashes), aligned with the block_ids, are required for the build_connector_meta() -> ... -> save_kv_layer() workflow of the external prefix-caching KVConnector that I've been working on.
For normal requests, token_ids can be extracted from scheduler_output.scheduled_new_reqs[idx].prompt_token_ids.
But for resumed preempted requests, the tokens are currently not propagated. Compared to this simple propagation, persisting the tokens in advance on the KVConnector side is not preferred and is error-prone, since the prefill tokens of such requests are [prompt token ids] + [decoded token ids before preemption]. A rough sketch of this flow is shown below.
Changes 2 & 3 applied : ).
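For illustration, here is a minimal sketch (not the connector from this PR) of how a connector's build_connector_meta() path could recover the full token ids for every scheduled request. The field names scheduled_new_reqs, prompt_token_ids, scheduled_cached_reqs, req_ids, resumed_from_preemption, and resumed_req_token_ids come from the snippets in this conversation; the helper name, the per-request req_id attribute on new requests, and the returned dict are assumptions made for the sketch.

```python
# Illustrative sketch only; stands in for the metadata-building step of a
# prefix-caching KVConnector, not for the actual implementation.
def collect_full_token_ids(scheduler_output) -> dict[str, list[int]]:
    tokens_per_req: dict[str, list[int]] = {}

    # Newly scheduled requests: the prompt tokens are available directly.
    for new_req in scheduler_output.scheduled_new_reqs:
        tokens_per_req[new_req.req_id] = list(new_req.prompt_token_ids)

    # Cached requests: only those resumed from preemption carry their full
    # prefill tokens, i.e. prompt tokens + tokens decoded before preemption.
    cached = scheduler_output.scheduled_cached_reqs
    for i, req_id in enumerate(cached.req_ids):
        if cached.resumed_from_preemption[i]:
            tokens_per_req[req_id] = list(cached.resumed_req_token_ids[i])

    return tokens_per_req
```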
@QierLi could you sign off your commits for the DCO? Would be good to get @WoosukKwon's final approval for this change.
Force-pushed from d3e3e40 to 2b2ec15
req_ids=[],
resumed_from_preemption=[],
new_token_ids=[],
resumed_req_token_ids=[],
can you add a proper test case to make sure this is populated correctly?
Sure. Created a new test case covering preemption -> resumption, including this field:
pytest tests/v1/core/test_scheduler.py::test_priority_scheduling_preemption_and_resumption_when_out_of_kv
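The real test lives in tests/v1/core/test_scheduler.py and builds on that file's scheduler fixtures, which are not reproduced here. As a hedged sketch, the core assertion on the new field might look like the following; the per-entry None convention for non-resumed requests and the helper name are assumptions, not the exact test code.

```python
# Hypothetical assertion sketch; scheduler_output and prompt_token_ids would
# come from the scheduler test fixtures in test_scheduler.py.
def check_resumed_req_token_ids(scheduler_output, prompt_token_ids):
    cached = scheduler_output.scheduled_cached_reqs
    for i, resumed in enumerate(cached.resumed_from_preemption):
        if resumed:
            # A resumed request should carry its complete token list:
            # prompt tokens followed by tokens decoded before preemption.
            full_tokens = cached.resumed_req_token_ids[i]
            assert full_tokens is not None
            assert full_tokens[: len(prompt_token_ids)] == prompt_token_ids
        else:
            # Non-resumed requests carry no extra tokens (assumed None entry).
            assert cached.resumed_req_token_ids[i] is None
```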
This pull request has merge conflicts that must be resolved before it can be merged.
Force-pushed from 6e7ceb2 to 519785a
Force-pushed from ac9f292 to b796205
Signed-off-by: Qier Li <[email protected]>
Force-pushed from b796205 to ed38059
Squashed and rebased. @WoosukKwon, could you review when you get a chance? Thanks : )
This pull request has merge conflicts that must be resolved before it can be merged.
Signed-off-by: Qier Li <[email protected]>
…-project#24926) Signed-off-by: Qier Li <[email protected]> Co-authored-by: Qier Li <[email protected]> Signed-off-by: xuebwang-amd <[email protected]>
…-project#24926) Signed-off-by: Qier Li <[email protected]> Co-authored-by: Qier Li <[email protected]> Signed-off-by: Dhruvil Bhatt <[email protected]>
Purpose
Propagate the complete token ids to the KVConnector when a request resumes after being preempted; the prefill token ids of such a request are actually [prompt token ids] + [decoded token ids before it was preempted].
I found it necessary to propagate these tokens, aligned with the newly scheduled requests, while working on external prefix caching via the V1 KVConnector.
This PR adds an additional resumed_req_token_ids field, which is None (a no-op) in all other cases.
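For readers unfamiliar with the scheduler output, a rough sketch of how the new field might sit in CachedRequestData follows. Only the fields mentioned in this conversation are shown, and the exact annotations in vllm/v1/core/sched/output.py may differ; the Optional-entry shape is an assumption based on the description above.

```python
# Partial, illustrative sketch of CachedRequestData; not the real definition.
from dataclasses import dataclass
from typing import Optional


@dataclass
class CachedRequestData:
    req_ids: list[str]
    resumed_from_preemption: list[bool]
    new_token_ids: list[list[int]]
    # For requests resumed from preemption: the full token list
    # (prompt tokens + tokens decoded before preemption).
    # None entries for every other request, so the common case stays cheap.
    resumed_req_token_ids: list[Optional[list[int]]]
```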
Test Plan
Added a test case, test_priority_scheduling_preemption_and_resumption_when_out_of_kv, covering preemption -> resumption in test_scheduler.py.
Test Result
Passed existing and newly added tests.