[BugFix] Make FlashInferMetadataBuilder non-blocking #25040
Signed-off-by: Julien Lin <[email protected]>
@benchislett Please check if this fix is correct. Thanks!
Looks reasonable to me, but @LucasWilkinson or @benchislett should validate before merge
LGTM! Thanks for the contribution!
Signed-off-by: Julien Lin <[email protected]> Co-authored-by: Michael Goin <[email protected]>
Signed-off-by: Julien Lin <[email protected]> Co-authored-by: Michael Goin <[email protected]> Signed-off-by: charlifu <[email protected]>
Signed-off-by: Julien Lin <[email protected]> Co-authored-by: Michael Goin <[email protected]> Signed-off-by: yewentao256 <[email protected]>
Signed-off-by: Julien Lin <[email protected]> Co-authored-by: Michael Goin <[email protected]> Signed-off-by: xuebwang-amd <[email protected]>
Purpose
The blocking H2D (host-to-device) memcpys in FlashInferMetadataBuilder break the overlap scheduler (#23569); making them non-blocking fixes it.
Correctness is ensured by vllm/v1/worker/gpu_model_runner.py:2112.
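
For readers unfamiliar with the pattern, here is a minimal sketch of a non-blocking H2D copy in PyTorch. It is not the code from this PR: the names `host_buf` and `dev_buf` are hypothetical, and it assumes the caller synchronizes the stream before the host buffer is reused or the result is read back. It falls back to CPU when no GPU is available.

```python
import torch

use_cuda = torch.cuda.is_available()
device = torch.device("cuda" if use_cuda else "cpu")

# Hypothetical staging buffer. Pinned (page-locked) host memory is what
# allows the copy to run asynchronously; pin only when a GPU is present.
host_buf = torch.empty(8, dtype=torch.int32, pin_memory=use_cuda)
host_buf.copy_(torch.arange(8, dtype=torch.int32))

# non_blocking=True enqueues the copy on the current stream and returns
# without waiting for it, so the host can keep scheduling work.
dev_buf = host_buf.to(device, non_blocking=True)

if use_cuda:
    # Assumption: the caller synchronizes before overwriting host_buf
    # or consuming the data on the host side.
    torch.cuda.current_stream().synchronize()

result = dev_buf.cpu().tolist()
print(result)
```

Without the synchronization point, overwriting `host_buf` while the copy is still in flight could send stale or partially updated data to the device, which is why a non-blocking copy needs a guarantee like the one referenced above.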
Test Plan
Test Result