-
-
Notifications
You must be signed in to change notification settings - Fork 11.6k
[Model][6/N] Improve all pooling task | Support chunked prefill with ALL pooling #27145
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
[Model][6/N] Improve all pooling task | Support chunked prefill with ALL pooling #27145
Conversation
Signed-off-by: wang.yuqi <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request enables chunked prefill for ALL pooling tasks, which is a great step towards improving performance for long prompts. The changes involve modifying the AllPool layer to cache hidden states during chunked prefill and updating related metadata and data structures. A new test is also added to validate this functionality.
I've found a couple of critical issues that need to be addressed. One is a mutable default argument in PoolingParams which can lead to shared state and incorrect behavior. The other is an incorrect return type in is_partial_prefill which will cause runtime errors. Please see my detailed comments for suggestions on how to fix these.
Signed-off-by: wang.yuqi <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
|
/gemini review |
Signed-off-by: wang.yuqi <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces support for chunked prefill with ALL pooling, which is a valuable enhancement. The core logic in AllPool is updated to cache hidden states across chunks and process them upon completion of prefill. A corresponding test case has been added to validate this new functionality. While the overall approach is sound, I've identified a few critical issues. Notably, a change to is_partial_prefill appears to break other pooling methods. There's also a potential memory leak and correctness bug in AllPool due to the hidden states cache not being cleared. Lastly, a minor typing issue in a method could lead to unexpected behavior. Please see my detailed comments for suggestions on how to address these points.
Signed-off-by: wang.yuqi <[email protected]>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: wang.yuqi <[email protected]>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: wang.yuqi <[email protected]>
|
/gemini review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request enables chunked prefill for ALL pooling tasks, which was previously unsupported. The changes involve modifying the AllPool layer to cache hidden states across multiple prefill steps and return the complete hidden states only when a request is finished. This is a good step towards more flexible pooling operations. The related metadata and worker logic have been updated accordingly, and a new test case is added to verify the functionality. My review focuses on improving code clarity and maintainability.
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: wang.yuqi <[email protected]>
|
/gemini review |
|
cc @DarkLight1337 @maxdebayser Ready for review I came up with a simple method to support chunked prefill with ALL pooling, without modifying too much core code. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces support for chunked prefill with ALL pooling tasks, which was previously unsupported. The changes are well-structured and involve modifying the AllPool layer to cache hidden states across prefill chunks, updating PoolingParams and PoolingMetadata to support this, and adjusting the GPUModelRunner to handle the new logic. A new test case is also added to validate the implementation.
My main feedback is on a design choice in PoolingParams. I've left a comment with a suggestion for a more robust implementation that better separates data transfer objects from runtime state. Overall, this is a good step towards more flexible and efficient pooling operations.
tests/models/language/pooling/test_all_pooling_plus_chunked_prefill.py
Outdated
Show resolved
Hide resolved
Signed-off-by: wang.yuqi <[email protected]>
Signed-off-by: wang.yuqi <[email protected]>
Signed-off-by: wang.yuqi <[email protected]>
|
/gemini review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request successfully enables chunked prefill for ALL pooling tasks, which was previously unsupported. This is a valuable enhancement for models utilizing token_embed or token_classify. The implementation, including the stateful handling of hidden states in AllPool and updates to related data structures, appears correct and well-integrated. The tests have been appropriately updated and new ones added to validate this new capability. I've identified one minor issue in a new test case where an assertion doesn't seem to match its intended purpose. Other than that, this is a solid contribution.
|
This pull request has merge conflicts that must be resolved before it can be |
Signed-off-by: wang.yuqi <[email protected]>
Signed-off-by: wang.yuqi <[email protected]>
Signed-off-by: wang.yuqi <[email protected]>
|
This pull request has merge conflicts that must be resolved before it can be |
Signed-off-by: wang.yuqi <[email protected]>
Signed-off-by: wang.yuqi <[email protected]>
Signed-off-by: wang.yuqi <[email protected]>
|
This pull request has merge conflicts that must be resolved before it can be |
Improve all pooling task
These PRs are mostly conflicting with each other, so combining them into a series would better inform reviewers about what happened. And what else needs to be done after that?
Purpose
Address:
Let's support chunked prefill with ALL pooling
We are moving further towards supporting both generation and Returning Prompt Hidden States simultaneously
Test Plan
tests/models/language/pooling/test_all_pooling_plus_chunked_prefill.py
Test Result
pass
Essential Elements of an Effective PR Description Checklist
supported_models.mdandexamplesfor a new model.