Skip to content

Conversation

@nirrozenbaum
Copy link
Contributor

has capacity only use was for sheddable requests (passthrough for critical ones).

has capacity only use was for sheddable requests (passthrough for critical ones).

Signed-off-by: Nir Rozenbaum <[email protected]>
@netlify
Copy link

netlify bot commented May 9, 2025

Deploy Preview for gateway-api-inference-extension ready!

Name Link
🔨 Latest commit 17b1c81
🔍 Latest deploy log https://app.netlify.com/sites/gateway-api-inference-extension/deploys/681e5a0577a27100087ba26e
😎 Deploy Preview https://deploy-preview-809--gateway-api-inference-extension.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label May 9, 2025
@k8s-ci-robot k8s-ci-robot requested review from Jeffwan and robscott May 9, 2025 05:48
@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label May 9, 2025
@nirrozenbaum
Copy link
Contributor Author

cc @liu-cong @ahg-g

{
name: "lowQueueAndLessThanKVCacheThresholdPredicate",
filter: &HasCapacityFilter{queueThreshold: 0, kvCacheThreshold: 0.8},
req: &types.LLMRequest{Critical: false},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Can you also add a test case on Critical: true?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is maybe not worth the effort considering #808 --> #805.

@kfswain
Copy link
Collaborator

kfswain commented May 9, 2025

/approve
will leave lgtm to others

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: kfswain, nirrozenbaum

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 9, 2025
@LukeAVanDrie
Copy link
Contributor

LukeAVanDrie commented May 9, 2025

Just a heads up, I delete these files anyways in #805. Capacity decisions should not be a responsibility of the scheduler per the architecture proposal. Admission control (and criticality based service differentiation) should happen outside the scheduler (long term in the flow controller). The scheduler should then decide the optimal pod to route approved requests to. No reason not to submit this though.

@nirrozenbaum
Copy link
Contributor Author

Just a heads up, I delete these files anyways in #805. Capacity decisions should not be a responsibility of the scheduler per the architecture proposal. Admission control (and criticality based service differentiation) should happen outside the scheduler (long term in the flow controller). The scheduler should then decide the optimal pod to route approved requests to. No reason not to submit this though.

@LukeAVanDrie yeah, sounds good.
Current PR doesn’t change anything but merging two filters that are actually the same (one is calling the other). once criticality handling is done in flow control this can be removed

@LukeAVanDrie
Copy link
Contributor

LukeAVanDrie commented May 9, 2025

once criticality handling is done in flow control this can be removed

This is unrelated to this PR, but I guess long term, we need to also decide if this separation of responsibilities (specifically, request shedding) is a hard rule or just for the reference implementation. I can see instances where implementers would have custom scheduling plugins that may want to drop requests still.

@liu-cong
Copy link
Contributor

liu-cong commented May 9, 2025

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 9, 2025
@k8s-ci-robot k8s-ci-robot merged commit 2dce3ea into kubernetes-sigs:main May 9, 2025
8 checks passed
@nirrozenbaum nirrozenbaum deleted the sheddable-filter branch May 9, 2025 20:28
nayihz pushed a commit to nayihz/gateway-api-inference-extension that referenced this pull request May 14, 2025
* merge has capacity filter with sheddable filter.

has capacity only use was for sheddable requests (passthrough for critical ones).

Signed-off-by: Nir Rozenbaum <[email protected]>

* Update pkg/epp/scheduling/plugins/filter/filter_test.go

Co-authored-by: Cong Liu <[email protected]>

---------

Signed-off-by: Nir Rozenbaum <[email protected]>
Co-authored-by: Cong Liu <[email protected]>
kaushikmitr pushed a commit to kaushikmitr/llm-instance-gateway that referenced this pull request May 15, 2025
* merge has capacity filter with sheddable filter.

has capacity only use was for sheddable requests (passthrough for critical ones).

Signed-off-by: Nir Rozenbaum <[email protected]>

* Update pkg/epp/scheduling/plugins/filter/filter_test.go

Co-authored-by: Cong Liu <[email protected]>

---------

Signed-off-by: Nir Rozenbaum <[email protected]>
Co-authored-by: Cong Liu <[email protected]>
rlakhtakia pushed a commit to rlakhtakia/gateway-api-inference-extension that referenced this pull request Jun 11, 2025
* merge has capacity filter with sheddable filter.

has capacity only use was for sheddable requests (passthrough for critical ones).

Signed-off-by: Nir Rozenbaum <[email protected]>

* Update pkg/epp/scheduling/plugins/filter/filter_test.go

Co-authored-by: Cong Liu <[email protected]>

---------

Signed-off-by: Nir Rozenbaum <[email protected]>
Co-authored-by: Cong Liu <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants