[BugFix] Fix async scheduling + request preemption #26385
This is an attempt at a minimally invasive quick fix. I'm working on a better fix which will also address the penalties sampling parameter incompatibility.
Signed-off-by: Nick Hill <[email protected]>
24457e0 to 4ce9ae4
@WoosukKwon so I think we can piggy-back on #24926 for the scheduler side of this? Then the remaining changes here would be even smaller...
This pull request has merge conflicts that must be resolved before it can be |
# Conflicts: # vllm/v1/core/sched/output.py
Signed-off-by: Nick Hill <[email protected]>
@WoosukKwon the change becomes even smaller after rebasing on #24926. It includes another simplification I noticed that I've opened a separate PR for: #26508. I will aim to add a test next.
LGTM. Thanks so much for the fix! It'd be nice if we can have a unit test to prevent this happening again.
…c-preempt # Conflicts: # vllm/v1/worker/gpu_model_runner.py
Signed-off-by: Nick Hill <[email protected]>
bee9aa2 to c3eb64b
Signed-off-by: Nick Hill <[email protected]> Signed-off-by: Dhruvil Bhatt <[email protected]>
Signed-off-by: Nick Hill <[email protected]> Signed-off-by: bbartels <[email protected]>
Ensure model runner is refreshed with all request token ids following preemption, so that correct input ids are used.
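The idea in the description above can be illustrated with a toy sketch. This is not the code from this PR or vLLM's API: `refresh_after_preemption`, `runner_cache`, and `scheduler_token_ids` are hypothetical names, and the sketch assumes only that the scheduler holds the full token id list for each request while the model runner's cached copy may be stale after preemption.

```python
# Toy sketch only: these names are hypothetical and are not vLLM's API.

def refresh_after_preemption(runner_cache: dict[str, list[int]],
                             req_id: str,
                             all_token_ids: list[int]) -> list[int]:
    """Replace the runner's cached ids for a resumed request with the full list."""
    # Overwrite rather than append: anything cached before preemption may be
    # stale or missing, so the scheduler's full list is taken as the truth.
    runner_cache[req_id] = list(all_token_ids)
    return runner_cache[req_id]


# The scheduler's view: prompt ids followed by every token generated so far.
scheduler_token_ids = {"req-0": [101, 7, 8, 9, 42, 43]}

# The runner's view after preemption: here, one generated token behind.
runner_cache = {"req-0": [101, 7, 8, 9, 42]}

input_ids = refresh_after_preemption(
    runner_cache, "req-0", scheduler_token_ids["req-0"])
assert input_ids == scheduler_token_ids["req-0"]
```

Overwriting the cached list, instead of trying to patch it, keeps the resume path independent of whatever state the runner was left in when the request was preempted.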