@nirrozenbaum nirrozenbaum commented May 22, 2025

This is an initial PR that lays the groundwork for a multi-cycle scheduler.
We have ongoing open discussions in #845 and on Google Docs, and this PR is not intended to replace those discussions.

This PR does not cover all the planned changes that were discussed, but it takes a big step towards the end goal.

In scope for this PR:

  • Restructuring and moving some parts (plugins, etc.) under /scheduling/framework.
  • Renaming Pre/Post Schedule to Pre/Post Cycle to avoid confusion when doing multi-cycle scheduling.
  • Renaming the former SchedulerConfig to SchedulerProfile. The profile includes the extensions.
  • Introducing a profile-picker extension. Upon receiving a new schedule request, the profile-picker iteratively picks which profiles to run; it may pick all of them or just a subset. This PR includes the simplest form of profile picking: a plugin that always selects all profiles.
  • Introducing a new SchedulerConfig that now includes the registered profiles (a map from profile name (string) to the profile config) and the profile-picker.
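
To make the shape of the list above concrete, here is a minimal sketch of how profiles, a profile-picker, and the new config could fit together. This is not the code from this PR: every type, field, and function name below (Request, Result, ProfilePicker, AllProfilesPicker, Schedule, the "prefill"/"decode" profile names) is an assumption made for illustration, and the plugin chain inside a profile is reduced to a single function.

```go
package main

import (
	"fmt"
	"sort"
)

// NOTE: all names in this file are hypothetical; they only illustrate the
// structure described in the PR text, not the actual implementation.

// Request stands in for the incoming scheduling request.
type Request struct {
	Model string
}

// Result stands in for the outcome of running a single profile.
type Result struct {
	TargetPod string
}

// SchedulerProfile bundles the extensions that one scheduling cycle runs.
type SchedulerProfile struct {
	// pick is a placeholder for the real filter/score/pick plugin chain.
	pick func(req Request) Result
}

// Run executes one scheduling cycle for this profile.
func (p *SchedulerProfile) Run(req Request) Result { return p.pick(req) }

// ProfilePicker decides which profiles to run next, given the results so far.
// Returning an empty map ends the iteration.
type ProfilePicker interface {
	Pick(req Request, profiles map[string]*SchedulerProfile, done map[string]Result) map[string]*SchedulerProfile
}

// AllProfilesPicker selects every registered profile that has not run yet.
type AllProfilesPicker struct{}

func (AllProfilesPicker) Pick(_ Request, profiles map[string]*SchedulerProfile, done map[string]Result) map[string]*SchedulerProfile {
	next := map[string]*SchedulerProfile{}
	for name, p := range profiles {
		if _, ran := done[name]; !ran {
			next[name] = p
		}
	}
	return next
}

// SchedulerConfig holds the registered profiles (by name) and the picker.
type SchedulerConfig struct {
	Profiles map[string]*SchedulerProfile
	Picker   ProfilePicker
}

// Schedule keeps asking the picker for profiles until it returns none,
// collecting one result per profile name.
func Schedule(cfg SchedulerConfig, req Request) map[string]Result {
	results := map[string]Result{}
	for {
		picked := cfg.Picker.Pick(req, cfg.Profiles, results)
		if len(picked) == 0 {
			return results
		}
		for name, p := range picked {
			results[name] = p.Run(req)
		}
	}
}

func main() {
	cfg := SchedulerConfig{
		Profiles: map[string]*SchedulerProfile{
			"prefill": {pick: func(Request) Result { return Result{TargetPod: "pod-a"} }},
			"decode":  {pick: func(Request) Result { return Result{TargetPod: "pod-b"} }},
		},
		Picker: AllProfilesPicker{},
	}
	results := Schedule(cfg, Request{Model: "example"})

	// Sort the names so the printed order is stable.
	names := make([]string, 0, len(results))
	for name := range results {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		fmt.Printf("%s -> %s\n", name, results[name].TargetPod)
	}
}
```

A conditional picker would implement the same interface but inspect the request or earlier results before choosing the next profiles.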

Out of scope:

  • Any struct rename other than Pre/Post Cycle.
  • Removing Pre/Post Cycle, as agreed in other threads. At the moment the only plugin that uses them is prefix. The prefix implementation should first be changed not to use these extension points, and then they can be removed.
  • This PR doesn't handle the PostResponse (or, my preferred name, PostDispatch) part. It is therefore still part of the scheduler, but it should absolutely be removed and moved to a higher layer.
  • Results aggregation is not implemented. I think where this should be handled (inside or outside of the scheduler) is an open discussion. The current PR returns a map from profile -> result and therefore keeps the existing behavior. This can be addressed in follow-up PRs.

Summary:

Additional changes may be needed as we continue to shape what we want the scheduler to look like.
As a first step, this implements many of the points we already agree on, enables the removal of the P/D scheduler from llm-d, and allows configuring all llm-d scheduler-related behavior (including conditional profile picking) through plugins only.

Signed-off-by: Nir Rozenbaum <[email protected]>

netlify bot commented May 22, 2025

Deploy Preview for gateway-api-inference-extension ready!

Name Link
🔨 Latest commit d658430
🔍 Latest deploy log https://app.netlify.com/projects/gateway-api-inference-extension/deploys/682ee20a9f28620008adec17
😎 Deploy Preview https://deploy-preview-862--gateway-api-inference-extension.netlify.app

@k8s-ci-robot k8s-ci-robot requested review from Jeffwan and kfswain May 22, 2025 06:47
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels May 22, 2025
@nirrozenbaum
Contributor Author

cc @ahg-g @kfswain

Signed-off-by: Nir Rozenbaum <[email protected]>
@kfswain
Collaborator

kfswain commented May 22, 2025

/hold

I was tempted to close this. But I will hold for now. Scheduler refactors should not move forward until: https://github.com/kubernetes-sigs/gateway-api-inference-extension/pull/845/files is merged

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 22, 2025
targetPod := results[0].TargetPod.GetPod()
var targetPod *backend.Pod
// TODO should handle multi cycle results, this should be pluggable logic
for _, result := range results {
Collaborator

This will just set the final targetPod. Can we instead just index on the key and get the result that way? We can leave the TODO to indicate this is a transitory state.

Contributor Author

you mean doing this?

targetPod = results[key].TargetPod.GetPod()

instead of:

targetPod = result.TargetPod.GetPod()

Not sure I got the intention. We have here a map from profile-name -> result.
This is a transitional stage where only one profile is used, so there is only one result.
But since it's a map, I must use the range loop.

Collaborator

Yeah, just do targetPod = results[key].TargetPod.GetPod() and drop the for loop.

Since we only end up with one result as is, even if there are multiple profiles, it reads more clearly if we just use the key, I think.

Contributor Author

@nirrozenbaum nirrozenbaum May 22, 2025

So now the next question: what is the value of key in this line?

targetPod = results[key].TargetPod.GetPod()

Which key value should we use?
If we used the default configuration it's just "default", but if we used schedulerv2, it's a different profile.
Since we don't know the key, I'm using the loop...
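
For readers following the thread, one way to keep the "key is unknown, but there is exactly one entry" assumption explicit is a small helper that takes the only entry from the map and fails if that assumption ever stops holding. This is a sketch only: Result and onlyResult are hypothetical names and not part of this PR.

```go
package main

import (
	"errors"
	"fmt"
)

// Result is a hypothetical stand-in for a per-profile scheduling result.
type Result struct {
	TargetPod string
}

// onlyResult returns the single entry of results without knowing its key.
// It returns an error when the map does not hold exactly one entry, so a
// silent "last one wins" outcome cannot happen once more profiles are added.
func onlyResult(results map[string]Result) (string, Result, error) {
	if len(results) != 1 {
		return "", Result{}, fmt.Errorf("expected exactly 1 result, got %d", len(results))
	}
	for name, r := range results {
		return name, r, nil
	}
	// Not reachable: the length check above guarantees one iteration.
	return "", Result{}, errors.New("no result found")
}

func main() {
	name, r, err := onlyResult(map[string]Result{"schedulerv2": {TargetPod: "pod-a"}})
	if err != nil {
		panic(err)
	}
	fmt.Println(name, r.TargetPod)
}
```

The range loop is still there, but it is confined to a helper whose name states the intent, which may address the concern that the iteration looks out of place at the call site.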

Contributor Author

@nirrozenbaum nirrozenbaum May 22, 2025

I expect this to be solved as soon as we implement the extension point

Collaborator

The only one we set is schedulerv2, right?
See line 215 in the main file of this PR.

Collaborator

@kfswain kfswain May 23, 2025

I agree that this may be a nit, but the for loop would look out of place since the iteration isn't quite being used

Contributor Author

@nirrozenbaum nirrozenbaum May 23, 2025

If we used the env var to enable "schedulerv2", then the key is "schedulerv2".
If not, and we use the default configuration, the key is "default" (in the NewScheduler func):
https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/pkg/epp/scheduling/scheduler.go#L73

This is why I didn't put a const here.
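
If indexing by key were preferred, one option would be to derive the key from the same switch that selects the configuration, so the caller never hard-codes it. The sketch below assumes this; the env var name ENABLE_SCHEDULER_V2, the constants, and activeProfileName are all hypothetical and are not taken from the repository.

```go
package main

import (
	"fmt"
	"os"
)

// Hypothetical names, for illustration only; the real env var and
// profile-name constants in the repository may differ.
const (
	schedulerV2EnvVar  = "ENABLE_SCHEDULER_V2"
	defaultProfileName = "default"
	v2ProfileName      = "schedulerv2"
)

// Result is a hypothetical stand-in for a per-profile scheduling result.
type Result struct {
	TargetPod string
}

// activeProfileName reports which profile key is registered, using the
// same switch that chooses between the default and the v2 configuration.
// getenv is injected so the function can be tested without touching the
// process environment.
func activeProfileName(getenv func(string) string) string {
	if getenv(schedulerV2EnvVar) == "true" {
		return v2ProfileName
	}
	return defaultProfileName
}

func main() {
	key := activeProfileName(os.Getenv)
	results := map[string]Result{key: {TargetPod: "pod-a"}}

	// The caller can now index directly instead of ranging over the map.
	fmt.Println(key, results[key].TargetPod)
}
```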

@kfswain
Collaborator

kfswain commented May 22, 2025

mostly I think this is fine, a lot of this will be replaced as we most closely align with: #845

small comment to keep the director clean for now.

@nirrozenbaum
Contributor Author

mostly I think this is fine, a lot of this will be replaced as we most closely align with: #845

small comment to keep the director clean for now.

yes, agreed.
this PR takes a step towards the end goal, but is subject to change based on the discussion
(hopefully it would be mostly interface changes and adaptations).

@kfswain
Collaborator

kfswain commented May 23, 2025

/lgtm
/approve
/hold

Holding submission for author to choose to address comments

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 23, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: kfswain, nirrozenbaum

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 23, 2025
@nirrozenbaum
Contributor Author

unholding to avoid conflicts with other PRs since this PR touches 30 files.
we can iterate over the last nit in a tightly scoped follow up PR.

/unhold

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 23, 2025
@k8s-ci-robot k8s-ci-robot merged commit 5bc7425 into kubernetes-sigs:main May 23, 2025
7 of 8 checks passed
@nirrozenbaum nirrozenbaum deleted the multi-cycle-scheduler branch May 23, 2025 09:07
irar2 pushed a commit to irar2/gateway-api-inference-extension that referenced this pull request Jun 3, 2025
* code review

Signed-off-by: Nir Rozenbaum <[email protected]>

* minor change

Signed-off-by: Nir Rozenbaum <[email protected]>

* add support for multi cycle scheduling

Signed-off-by: Nir Rozenbaum <[email protected]>

* minor change

Signed-off-by: Nir Rozenbaum <[email protected]>

* moved plugins under plugins dir

Signed-off-by: Nir Rozenbaum <[email protected]>

* few more changes

Signed-off-by: Nir Rozenbaum <[email protected]>

* moved RunCycle logic into SchedulerProfile

Signed-off-by: Nir Rozenbaum <[email protected]>

* minor changes

Signed-off-by: Nir Rozenbaum <[email protected]>

* linter

Signed-off-by: Nir Rozenbaum <[email protected]>

* minor change in unit-test

Signed-off-by: Nir Rozenbaum <[email protected]>

---------

Signed-off-by: Nir Rozenbaum <[email protected]>
rlakhtakia pushed a commit to rlakhtakia/gateway-api-inference-extension that referenced this pull request Jun 11, 2025

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
