[Bugfix] Fix workspace buffer None issue for Flashinfer TRTLLM Backend #21525
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run […]. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add 🚀
Code Review
This pull request fixes a potential None-pointer issue with the workspace buffer in the FlashInfer TRTLLM backend by removing a redundant `workspace_buffer` attribute from `FlashInferMetadata` and instead sourcing the buffer directly from the `decode_wrapper`. The change is logical and improves robustness. However, I've identified a critical potential issue: the code now uses `_float_workspace_buffer` while the older, more established backend implementation uses `_int_workspace_buffer` for the same function call. This discrepancy needs to be addressed to prevent potential runtime errors or incorrect computations.
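The pattern the review describes, reading the buffer through the wrapper that owns it instead of caching a possibly-None copy on the metadata object, can be sketched roughly as follows. `Wrapper` and `Metadata` are hypothetical stand-ins for illustration, not the actual vLLM or FlashInfer classes:

```python
class Wrapper:
    """Hypothetical stand-in for FlashInfer's decode wrapper."""

    def __init__(self):
        # The wrapper allocates its workspace lazily, inside plan().
        self._float_workspace_buffer = None

    def plan(self):
        if self._float_workspace_buffer is None:
            # Placeholder allocation; the real wrapper allocates GPU memory.
            self._float_workspace_buffer = bytearray(1024)


class Metadata:
    """Hypothetical stand-in for FlashInferMetadata after the fix."""

    def __init__(self, decode_wrapper):
        self.decode_wrapper = decode_wrapper

    @property
    def workspace_buffer(self):
        # Always reflects the wrapper's current buffer rather than a
        # snapshot taken at construction time, so it can never go stale.
        return self.decode_wrapper._float_workspace_buffer


w = Wrapper()
m = Metadata(w)
print(m.workspace_buffer)  # None until the wrapper plans
w.plan()
print(m.workspace_buffer is not None)
```

Because the property defers to the wrapper on every access, the metadata never holds a stale None reference even if it is built before the first `plan()` call.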
CC @mgoin for review. Thanks.
mgoin left a comment:
LGTM and verified locally, thank you!
Failures in CI look related.
Signed-off-by: elvischenv <[email protected]>
Head branch was pushed to by a user without write access
Thanks @mgoin, fixed that in the latest commit. There is still one failure, but it seems to be unrelated?
Retrying that test now.
Essential Elements of an Effective PR Description Checklist
Update `supported_models.md` and `examples` for a new model.

Purpose
There is an always-valid `workspace_buffer` initialized in `attn_metadata.decode_wrapper._float_workspace_buffer`, so there is no need to store the extra `attn_metadata.workspace_buffer`; they should be the same.

The original PR (#19825) initialized `attn_metadata` in a way where `self._workspace_buffer` is None at the beginning and is only allocated after one round of `self._plan()`. So it only receives the correct assignment on the second round of `build()`. One PR (#21137) likely fixed the issue a while ago, but I think it is better to just remove this duplicated attribute.
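The bug described above can be reproduced with a minimal sketch; the class and function names here are illustrative stand-ins, not the actual vLLM code. The metadata snapshots the wrapper's workspace buffer before `plan()` has run, so the first `build()` caches None and only the second one sees a valid buffer:

```python
class Wrapper:
    """Hypothetical stand-in for the decode wrapper."""

    def __init__(self):
        # Buffer is only allocated lazily, inside plan().
        self._workspace_buffer = None

    def plan(self):
        if self._workspace_buffer is None:
            # Placeholder allocation standing in for real GPU memory.
            self._workspace_buffer = bytearray(1024)


def build(wrapper):
    """Hypothetical build() that snapshots the buffer BEFORE planning."""
    cached = wrapper._workspace_buffer  # None on the very first round
    wrapper.plan()                      # allocation happens too late
    return cached


w = Wrapper()
first = build(w)   # None: the buffer did not exist when it was snapshotted
second = build(w)  # valid: the first round's plan() already allocated it
print(first, second is not None)
```

Reading the buffer through the wrapper at use time (as this PR does) avoids the stale snapshot entirely, since the wrapper's attribute is up to date by the time the kernel call needs it.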
Also fixed the unit test broken by the API change in #21485.
Test Plan
`tests/kernels/attention/test_flashinfer_trtllm_decode_attention.py`
`lm_eval`

Test Result
(Optional) Documentation Update