[Model][6/N] Improve all pooling task | Support chunked prefill with ALL pooling #27145

noooop · 2025-10-18T07:47:44Z

Improve all pooling task

[Frontend][Doc][5/N] Improve all pooling task | Polish encode (pooling) api & Document. #25524
[Model] Add num_cached_tokens for PoolingRequestOutput #27378
[Model] Allow users to control skip reading cache per request. #28194
Improve enable chunked_prefill & prefix_caching logic. #26623
[Model][6/N] Improve all pooling task | Support chunked prefill with ALL pooling #27145
[Model][8/N] Improve all pooling task | Deprecation as_reward_model. Extract hidden states prefer using new multi-vector retrieval API #26686
[Doc][Last/N] Improve all pooling task | Refactor pooling-related documentation #27963

These PRs mostly conflict with one another, so grouping them into a series should give reviewers a clearer picture of what has changed and what remains to be done.

Purpose

Address:

Support chunked prefill with ALL pooling.

This moves us further toward supporting generation and returning prompt hidden states simultaneously.

Test Plan

tests/models/language/pooling/test_all_pooling_plus_chunked_prefill.py

Test Result

pass

Signed-off-by: wang.yuqi <[email protected]>

gemini-code-assist

Code Review

This pull request enables chunked prefill for ALL pooling tasks, which is a great step towards improving performance for long prompts. The changes involve modifying the AllPool layer to cache hidden states during chunked prefill and updating related metadata and data structures. A new test is also added to validate this functionality.

I've found a couple of critical issues that need to be addressed. One is a mutable default argument in PoolingParams which can lead to shared state and incorrect behavior. The other is an incorrect return type in is_partial_prefill which will cause runtime errors. Please see my detailed comments for suggestions on how to fix these.
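To illustrate the mutable-default concern in general terms, here is a minimal sketch. It does not reproduce vLLM's PoolingParams; `make_state_buggy` and `RequestState` are hypothetical names chosen for this example, and plain Python lists stand in for tensors.

```python
from dataclasses import dataclass, field


def make_state_buggy(cache=[]):
    # The default list is created once, when the function is defined,
    # so every call that omits `cache` appends to the same list.
    cache.append("chunk")
    return cache


@dataclass
class RequestState:
    # Hypothetical stand-in, not vLLM's PoolingParams.
    # default_factory builds a fresh list for every instance.
    hidden_states_cache: list = field(default_factory=list)


first = make_state_buggy()
second = make_state_buggy()
print(first is second, len(second))  # the two calls share one list

a, b = RequestState(), RequestState()
a.hidden_states_cache.append("chunk")
print(len(a.hidden_states_cache), len(b.hidden_states_cache))
```

With a per-instance factory, state written for one request cannot show up in another request's cache.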

vllm/pooling_params.py

vllm/v1/pool/metadata.py

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

vllm/v1/pool/metadata.py

vllm/pooling_params.py

noooop · 2025-10-18T07:51:23Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces support for chunked prefill with ALL pooling, which is a valuable enhancement. The core logic in AllPool is updated to cache hidden states across chunks and process them upon completion of prefill. A corresponding test case has been added to validate this new functionality. While the overall approach is sound, I've identified a few critical issues. Notably, a change to is_partial_prefill appears to break other pooling methods. There's also a potential memory leak and correctness bug in AllPool due to the hidden states cache not being cleared. Lastly, a minor typing issue in a method could lead to unexpected behavior. Please see my detailed comments for suggestions on how to address these points.
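The cache-clearing concern can be sketched as follows. This is an illustration under assumptions, not the AllPool implementation: `LeakyCache` and `ClearingCache` are hypothetical classes, the cache is keyed by a request id, and lists of floats stand in for hidden-state tensors.

```python
class LeakyCache:
    """Hypothetical cache that keeps entries after a request finishes."""

    def __init__(self):
        self.chunks = {}

    def step(self, req_id, chunk, finished):
        self.chunks.setdefault(req_id, []).extend(chunk)
        if finished:
            # The entry stays in the dict: memory grows, and a later
            # request that reuses the id sees stale data.
            return list(self.chunks[req_id])
        return None


class ClearingCache(LeakyCache):
    """Same interface, but the entry is dropped on completion."""

    def step(self, req_id, chunk, finished):
        self.chunks.setdefault(req_id, []).extend(chunk)
        if finished:
            return self.chunks.pop(req_id)
        return None


leaky, clearing = LeakyCache(), ClearingCache()
for cache in (leaky, clearing):
    cache.step("r0", [1.0, 2.0], finished=True)

# A second request happens to reuse the same id.
leaky_second = leaky.step("r0", [3.0], finished=True)
clearing_second = clearing.step("r0", [3.0], finished=True)
print(leaky_second)     # includes the first request's states
print(clearing_second)  # only the second request's states
```

Popping the entry when the request completes addresses both symptoms at once: the memory is released, and stale states cannot be prepended to a later request.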

vllm/model_executor/layers/pooler.py

vllm/v1/pool/metadata.py

noooop · 2025-10-18T07:59:03Z

/gemini review

gemini-code-assist

Code Review

This pull request enables chunked prefill for ALL pooling tasks, which was previously unsupported. The changes involve modifying the AllPool layer to cache hidden states across multiple prefill steps and return the complete hidden states only when a request is finished. This is a good step towards more flexible pooling operations. The related metadata and worker logic have been updated accordingly, and a new test case is added to verify the functionality. My review focuses on improving code clarity and maintainability.
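A minimal sketch of the accumulate-then-return behavior described in this summary, assuming one hidden-state vector per prompt token. `run_chunked` is a hypothetical helper, not a vLLM function, and nested lists stand in for tensors.

```python
def run_chunked(hidden_states, chunk_size):
    """Feed per-token states in chunks; emit output only on the last chunk."""
    cache = []
    outputs = []
    for start in range(0, len(hidden_states), chunk_size):
        chunk = hidden_states[start:start + chunk_size]
        cache.extend(chunk)
        # The prefill is still partial while prompt tokens remain unprocessed.
        is_partial = start + len(chunk) < len(hidden_states)
        outputs.append(None if is_partial else list(cache))
    return outputs


tokens = [[float(i), float(i) + 0.5] for i in range(7)]
steps = run_chunked(tokens, chunk_size=3)

# Intermediate steps yield nothing; the final step yields every token's state.
print([out is None for out in steps])
print(steps[-1] == tokens)
```

The final comparison mirrors the kind of check a chunked-versus-unchunked test would make: the stitched output should equal what a single-pass prefill returns.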

vllm/model_executor/layers/pooler.py

noooop · 2025-10-18T08:02:16Z

/gemini review

noooop · 2025-10-18T08:04:09Z

cc @DarkLight1337 @maxdebayser

Ready for review

I came up with a simple method to support chunked prefill with ALL pooling, without modifying too much core code.

gemini-code-assist

Code Review

This pull request introduces support for chunked prefill with ALL pooling tasks, which was previously unsupported. The changes are well-structured and involve modifying the AllPool layer to cache hidden states across prefill chunks, updating PoolingParams and PoolingMetadata to support this, and adjusting the GPUModelRunner to handle the new logic. A new test case is also added to validate the implementation.

My main feedback is on a design choice in PoolingParams. I've left a comment with a suggestion for a more robust implementation that better separates data transfer objects from runtime state. Overall, this is a good step towards more flexible and efficient pooling operations.
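One way to picture the suggested separation is sketched below. The class names and fields (`PoolingRequestParams`, `PoolingRuntimeState`, `task`, `normalize`) are hypothetical and are not taken from vLLM; the point is only that user-supplied parameters stay immutable while mutable per-request state lives in a separate object.

```python
from dataclasses import FrozenInstanceError, dataclass, field


@dataclass(frozen=True)
class PoolingRequestParams:
    # Hypothetical data transfer object: what the user asked for.
    task: str = "token_embed"
    normalize: bool = True


@dataclass
class PoolingRuntimeState:
    # Hypothetical runtime state: owned by the worker, one per request.
    hidden_states_cache: list = field(default_factory=list)


# One params object can safely be shared by several requests.
params = PoolingRequestParams()
states = {req_id: PoolingRuntimeState() for req_id in ("r0", "r1")}
states["r0"].hidden_states_cache.append([0.1, 0.2])

try:
    params.task = "embed"  # frozen dataclasses reject attribute assignment
except FrozenInstanceError:
    mutated = False
else:
    mutated = True

print(mutated, len(states["r0"].hidden_states_cache),
      len(states["r1"].hidden_states_cache))
```

Because the parameters carry no cache, copying or serializing them never drags runtime data along.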

vllm/pooling_params.py

tests/models/language/pooling/test_all_pooling_plus_chunked_prefill.py

noooop · 2025-10-18T11:46:37Z

/gemini review

gemini-code-assist

Code Review

This pull request successfully enables chunked prefill for ALL pooling tasks, which was previously unsupported. This is a valuable enhancement for models utilizing token_embed or token_classify. The implementation, including the stateful handling of hidden states in AllPool and updates to related data structures, appears correct and well-integrated. The tests have been appropriately updated and new ones added to validate this new capability. I've identified one minor issue in a new test case where an assertion doesn't seem to match its intended purpose. Other than that, this is a solid contribution.
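Why ALL pooling needs state across chunks, while a last-token pooler does not, can be sketched as follows. This assumes ALL pooling returns one vector per prompt token; `last_pool` and `all_pool` are hypothetical helpers written for this example, and nested lists stand in for tensors.

```python
def last_pool(hidden_states):
    # Only the final token's state is needed.
    return hidden_states[-1]


def all_pool(hidden_states):
    # Every token's state is part of the output.
    return list(hidden_states)


prompt = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0], [6.0, 7.0]]
chunks = [prompt[:2], prompt[2:]]  # prefill split into two chunks

# LAST pooling can be computed from the final chunk alone.
last_ok = last_pool(chunks[-1]) == last_pool(prompt)

# ALL pooling on the final chunk alone drops the earlier tokens,
# so the per-chunk results have to be stitched together.
final_only = all_pool(chunks[-1])
stitched = [row for chunk in chunks for row in all_pool(chunk)]

print(last_ok, len(final_only), stitched == all_pool(prompt))
```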

tests/models/language/pooling/test_extract_hidden_states.py

mergify · 2025-10-27T14:38:47Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @noooop.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mergify · 2025-11-11T02:06:56Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @noooop.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mergify · 2025-11-28T06:33:06Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @noooop.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

Support chunked prefill with ALL pooling

86c0f38

Signed-off-by: wang.yuqi <[email protected]>

mergify bot added the v1 label Oct 18, 2025

gemini-code-assist bot reviewed Oct 18, 2025

vllm/pooling_params.py (outdated, resolved)

vllm/v1/pool/metadata.py (outdated, resolved)

fix

6bd49f2

Signed-off-by: wang.yuqi <[email protected]>

chatgpt-codex-connector bot reviewed Oct 18, 2025

vllm/v1/pool/metadata.py (resolved)

vllm/pooling_params.py (resolved)

fix

44c6ee1

Signed-off-by: wang.yuqi <[email protected]>

gemini-code-assist bot reviewed Oct 18, 2025

vllm/model_executor/layers/pooler.py (outdated, resolved)

vllm/model_executor/layers/pooler.py (outdated, resolved)

vllm/v1/pool/metadata.py (outdated, resolved)

fix

86f0868

Signed-off-by: wang.yuqi <[email protected]>

noooop changed the title ~~[Model][5/N] Improve all pooling task | Support chunked prefill with ALL pooling~~ [Model][3/N] Improve all pooling task | Support chunked prefill with ALL pooling Oct 18, 2025

noooop and others added 2 commits October 18, 2025 15:58

Update vllm/model_executor/layers/pooler.py

7c1d68d

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: wang.yuqi <[email protected]>

Update vllm/model_executor/layers/pooler.py

f903415

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: wang.yuqi <[email protected]>

gemini-code-assist bot reviewed Oct 18, 2025

vllm/model_executor/layers/pooler.py (outdated, resolved)

Update vllm/model_executor/layers/pooler.py

72df85d

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: wang.yuqi <[email protected]>

gemini-code-assist bot reviewed Oct 18, 2025

vllm/pooling_params.py (outdated, resolved)

noooop commented Oct 18, 2025

tests/models/language/pooling/test_all_pooling_plus_chunked_prefill.py (outdated, resolved)

fix deep copy

6b6e7a8

Signed-off-by: wang.yuqi <[email protected]>

noooop added 2 commits October 18, 2025 16:35

fix

9aef354

Signed-off-by: wang.yuqi <[email protected]>

+ tests

d574b6c

Signed-off-by: wang.yuqi <[email protected]>

Merge branch 'main' into all_pooling_plus_chunked_prefill2

26351d7

gemini-code-assist bot reviewed Oct 18, 2025

tests/models/language/pooling/test_extract_hidden_states.py (outdated, resolved)

noooop requested review from ApostaC, WoosukKwon, alexm-redhat, comaniac, heheda12345, njhill, robertgshaw2-redhat and ywang96 as code owners October 21, 2025 01:36

Merge branch 'main' into all_pooling_plus_chunked_prefill2

08a0739

noooop changed the title ~~[Model][4/N] Improve all pooling task | Support chunked prefill with ALL pooling~~ [Model][5/N] Improve all pooling task | Support chunked prefill with ALL pooling Oct 22, 2025

mergify bot added the needs-rebase label Oct 27, 2025

Merge branch 'main' into all_pooling_plus_chunked_prefill2

29b3d1d

Signed-off-by: wang.yuqi <[email protected]>

mergify bot removed the needs-rebase label Oct 28, 2025

noooop added 2 commits October 28, 2025 13:49

update

eea5f6c

Signed-off-by: wang.yuqi <[email protected]>

update

41ff486

Signed-off-by: wang.yuqi <[email protected]>

noooop changed the title ~~[Model][5/N] Improve all pooling task | Support chunked prefill with ALL pooling~~ [Model][-/N] Improve all pooling task | Support chunked prefill with ALL pooling Oct 28, 2025

This was referenced Nov 3, 2025

[Doc][Last/N] Improve all pooling task | Refactor pooling-related documentation #27963

Draft

[Model] Allow users to control skip reading cache per request. #28194

Merged

mergify bot added the needs-rebase label Nov 11, 2025

noooop changed the title ~~[Model][-/N] Improve all pooling task | Support chunked prefill with ALL pooling~~ [Model][6/N] Improve all pooling task | Support chunked prefill with ALL pooling Nov 11, 2025

Merge branch 'main' into all_pooling_plus_chunked_prefill2

70c0965

Signed-off-by: wang.yuqi <[email protected]>

mergify bot removed the needs-rebase label Nov 17, 2025

noooop added 3 commits November 17, 2025 11:39

update

e8f222e

Signed-off-by: wang.yuqi <[email protected]>

Merge branch 'main' into all_pooling_plus_chunked_prefill2

d78b2cf

update

fb8197b

Signed-off-by: wang.yuqi <[email protected]>

noooop mentioned this pull request Nov 21, 2025

Improve enable chunked_prefill & prefix_caching logic. #26623

Merged

5 tasks

mergify bot added the needs-rebase label Nov 28, 2025


3 participants
