Skip to content

Conversation

@wallashss
Copy link
Collaborator

@wallashss wallashss commented Apr 29, 2025

This PR primarily integrate the model runner with the new spyre input batch for continuous batching.

Summary:

  • Removed some hardcoded code to related to sampling parameters. Now, it should work fine requests with sampling parameters with CB.
  • Update the logic in warm up to properly clean the data of the model runner, so it can have a consistent state to receive new requests.
  • Now we have two input batches: one for the decoding and other for the prefill. Currently, we can only prefill one request at a time, and the batch is reset for every new prefill. This is similar to how the static batching works. This solution also ease to split the requests, because we only have to swap the batch to process without the need to activate/deactivate requests that are in decoding step.

For #76

@github-actions
Copy link

👋 Hi! Thank you for contributing to vLLM support on Spyre.
Just a reminder: Make sure that your code passes all the linting checks, otherwise your PR won't be able to be merged. To do so, first install the linting requirements, then run format.sh and commit the changes. This can be done with uv directly:

uv sync --frozen --group lint

Or this can be done with pip:

uv pip compile --group lint > requirements-lint.txt
pip install -r requirements-lint.txt
bash format.sh

Now you are good to go 🚀

@wallashss wallashss marked this pull request as draft April 29, 2025 23:06
@wallashss
Copy link
Collaborator Author

wallashss commented Apr 29, 2025

This should be follow up for #126 and #107

@wallashss
Copy link
Collaborator Author

Currently, it pass in the tests of #79

wallashss and others added 14 commits April 30, 2025 09:49
Signed-off-by: Wallas Santos <[email protected]>
Signed-off-by: Wallas Santos <[email protected]>
@wallashss wallashss force-pushed the wallas-refact-cb-2 branch from 2bd358f to 1a143d1 Compare May 9, 2025 13:52
Signed-off-by: Wallas Santos <[email protected]>
Comment on lines +365 to 368
self.execute_model(scheduler_output)

self.model_runner.tkv = 0 # type: ignore[union-attr]

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yannicks1, considering that the previous execute_model should correctly clean up the request of the model runner do we still need to reset the tkv?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

considering that the previous execute_model should correctly clean up the request of the model runner

does it reset the tkv too? @wallashss

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, no. I kept the reset by the worker. I should be fine.

@wallashss wallashss changed the title [CB] WIP: Update spyre model runner for new spyre input batch [CB] Update spyre model runner for new spyre input batch May 12, 2025
@wallashss wallashss marked this pull request as ready for review May 12, 2025 15:02
wallashss added 4 commits May 13, 2025 09:58
fix: tests for cb
feat: assert to prevent activate cb with batch_size=1

Signed-off-by: Wallas Santos <[email protected]>
Signed-off-by: Prashant Gupta <[email protected]>
wallashss added 2 commits May 13, 2025 13:37
Signed-off-by: Wallas Santos <[email protected]>
Copy link
Collaborator

@prashantgupta24 prashantgupta24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm!

it seems like this is fixed, it wasn't working earlier with engine core's step function

Signed-off-by: Prashant Gupta <[email protected]>
@wallashss wallashss merged commit 604149d into main May 13, 2025
17 checks passed
@wallashss wallashss deleted the wallas-refact-cb-2 branch May 13, 2025 17:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants