merge has capacity filter with sheddable filter. #809
The has-capacity filter was only used for sheddable requests (critical requests pass through), so the two filters are merged into one. Signed-off-by: Nir Rozenbaum <[email protected]>
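A minimal sketch of the merged behavior, assuming simplified stand-in types. `PodMetrics`, its fields, and the `Filter` method signature are hypothetical and are not the repository's API; only `HasCapacityFilter`, `queueThreshold`, `kvCacheThreshold`, and `Critical` come from the test snippet in this PR. The comparison operators (`<=` for the queue, `<` for the KV cache) are guesses based on the test-case name.

```go
package main

import "fmt"

// PodMetrics is a hypothetical, simplified view of the per-pod metrics the
// filter inspects; the real types in the repository differ.
type PodMetrics struct {
	Name                string
	WaitingQueueSize    int
	KVCacheUsagePercent float64 // fraction in [0, 1]
}

// LLMRequest models only the Critical field used in the test snippet.
type LLMRequest struct {
	Critical bool
}

// HasCapacityFilter carries the two thresholds shown in the test case.
type HasCapacityFilter struct {
	queueThreshold   int
	kvCacheThreshold float64
}

// Filter passes every pod through for critical requests. For sheddable
// (non-critical) requests it keeps only pods whose queue is at or below
// queueThreshold and whose KV-cache usage is strictly below kvCacheThreshold.
// (Assumed operators; hypothetical signature.)
func (f *HasCapacityFilter) Filter(req *LLMRequest, pods []PodMetrics) []PodMetrics {
	if req.Critical {
		return pods
	}
	var out []PodMetrics
	for _, p := range pods {
		if p.WaitingQueueSize <= f.queueThreshold && p.KVCacheUsagePercent < f.kvCacheThreshold {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	f := &HasCapacityFilter{queueThreshold: 0, kvCacheThreshold: 0.8}
	pods := []PodMetrics{
		{Name: "pod1", WaitingQueueSize: 0, KVCacheUsagePercent: 0.2},
		{Name: "pod2", WaitingQueueSize: 0, KVCacheUsagePercent: 0.9},
		{Name: "pod3", WaitingQueueSize: 3, KVCacheUsagePercent: 0.1},
	}
	// Sheddable request: only pod1 is below both thresholds.
	fmt.Println(len(f.Filter(&LLMRequest{Critical: false}, pods)))
	// Critical request: all three pods pass through unchanged.
	fmt.Println(len(f.Filter(&LLMRequest{Critical: true}, pods)))
}
```

An empty result for a sheddable request is presumably where that request would be shed; critical requests never reach that branch.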
✅ Deploy Preview for gateway-api-inference-extension ready!
    {
        name:   "lowQueueAndLessThanKVCacheThresholdPredicate",
        filter: &HasCapacityFilter{queueThreshold: 0, kvCacheThreshold: 0.8},
        req:    &types.LLMRequest{Critical: false},
nit: Can you also add a test case for Critical: true?
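A sketch of the requested extra case, written as a plain `go run` program instead of a `testing.T` test so it is self-contained. The `pod`, `request`, and `hasCapacityFilter` types and the lowercase `filter` method are hypothetical stand-ins, and the thresholds are copied from the test snippet. The interesting case is a cluster where every pod is saturated: the sheddable request finds no pod, while the critical one sees all of them.

```go
package main

import "fmt"

// Hypothetical minimal stand-ins for the repository's pod-metrics and
// request types; only the fields the filter reads are modeled.
type pod struct {
	name    string
	queue   int     // waiting-queue length
	kvCache float64 // KV-cache usage as a fraction in [0, 1]
}

type request struct{ Critical bool }

type hasCapacityFilter struct {
	queueThreshold   int
	kvCacheThreshold float64
}

// filter passes all pods for critical requests and, otherwise, keeps only
// pods with capacity (queue <= queueThreshold, kvCache < kvCacheThreshold).
func (f *hasCapacityFilter) filter(req request, pods []pod) []pod {
	if req.Critical {
		return pods
	}
	var out []pod
	for _, p := range pods {
		if p.queue <= f.queueThreshold && p.kvCache < f.kvCacheThreshold {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	// Every pod is saturated: pod1 exceeds the KV-cache threshold, pod2 the queue threshold.
	saturated := []pod{
		{name: "pod1", queue: 0, kvCache: 0.9},
		{name: "pod2", queue: 5, kvCache: 0.1},
	}
	f := &hasCapacityFilter{queueThreshold: 0, kvCacheThreshold: 0.8}
	tests := []struct {
		name string
		req  request
		want int // number of pods expected to survive the filter
	}{
		{"sheddable request, no pod has capacity", request{Critical: false}, 0},
		{"critical request passes through", request{Critical: true}, 2},
	}
	for _, tc := range tests {
		got := len(f.filter(tc.req, saturated))
		status := "PASS"
		if got != tc.want {
			status = "FAIL"
		}
		fmt.Printf("%s: %s (got %d, want %d)\n", status, tc.name, got, tc.want)
	}
}
```

In `filter_test.go` the same table would normally drive `t.Run` subtests.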
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: kfswain, nirrozenbaum.
Just a heads up: I'm deleting these files anyway in #805. Per the architecture proposal, capacity decisions should not be the scheduler's responsibility. Admission control (and criticality-based service differentiation) should happen outside the scheduler (in the long term, in the flow controller). The scheduler should then decide the optimal pod to route approved requests to. No reason not to submit this, though.
@LukeAVanDrie yeah, sounds good.
This is unrelated to this PR, but I guess in the long term we also need to decide whether this separation of responsibilities (specifically, request shedding) is a hard rule or applies only to the reference implementation. I can see cases where implementers have custom scheduling plugins that may still want to drop requests.
Co-authored-by: Cong Liu <[email protected]>
/lgtm
* merge has capacity filter with sheddable filter.

  has capacity only use was for sheddable requests (passthrough for critical ones).

  Signed-off-by: Nir Rozenbaum <[email protected]>

* Update pkg/epp/scheduling/plugins/filter/filter_test.go

  Co-authored-by: Cong Liu <[email protected]>

---------

Signed-off-by: Nir Rozenbaum <[email protected]>
Co-authored-by: Cong Liu <[email protected]>