
nirrozenbaum
Contributor

@nirrozenbaum nirrozenbaum commented Jun 3, 2025

This PR moves PostResponse plugins out of the scheduler and into the requestcontrol layer (more specifically, into the director).

Several code changes were made as part of this change:

  • Moved the creation of the director to main.go to allow easy setup of plugins, as we already have with the scheduler. A plugin should be able to register as both a scheduler plugin and a PostResponse plugin in main.
  • Added a PostResponsePlugins slice to the director, and added the WithPostResponsePlugins function to allow easy setup of the PostResponsePlugins.
  • The PostResponse plugin now gets both the request and the response. This is very useful for the prefix plugin, which needs to concatenate the prompt with the response and save the result in the prefix cache.
  • Unit and integration tests were updated accordingly.
  • An example of how to register PostResponsePlugins can be seen here:

In a follow-up PR, we should update the prefix plugin to use the PostResponse extension point instead of PostCycle, and then remove PostCycle completely.

netlify bot commented Jun 3, 2025

Deploy Preview for gateway-api-inference-extension ready!

Name Link
🔨 Latest commit 19db87d
🔍 Latest deploy log https://app.netlify.com/projects/gateway-api-inference-extension/deploys/6840808a43d6960007985a5e
😎 Deploy Preview https://deploy-preview-914--gateway-api-inference-extension.netlify.app

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jun 3, 2025
@k8s-ci-robot k8s-ci-robot requested review from danehans and liu-cong June 3, 2025 12:15
@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jun 3, 2025
Signed-off-by: Nir Rozenbaum <[email protected]>
@nirrozenbaum
Contributor Author

cc: @ahg-g @kfswain

func NewDirector(datastore datastore.Datastore, scheduler Scheduler, saturationDetector SaturationDetector) *Director {
return &Director{datastore, scheduler, saturationDetector}
// WithPostResponsePlugins sets the given plugins as the PostResponse plugins.
// If the Director has PostResponse plugins already, this call replaces the existing plugins with the given ones.
Contributor

Would adding it to the argument list be more appropriate?

Contributor Author

@nirrozenbaum nirrozenbaum Jun 3, 2025

I intentionally left it out of the argument list, since this is an optional field.
I would like to avoid creating an empty slice (or passing nil) when the caller doesn't need any PostResponsePlugins.
If I add it as an argument, the code will look like:

director := requestcontrol.NewDirector(datastore, scheduler, detector, []PostResponsePlugin{})

OR

director := requestcontrol.NewDirector(datastore, scheduler, detector, nil)

On the other hand, since this field is optional, it is possible to initialize the director with or without it, like this:
without:

director := requestcontrol.NewDirector(datastore, scheduler, detector)

with:

director := requestcontrol.NewDirector(datastore, scheduler, detector).
    WithPostResponsePlugins(plugin1, plugin2, ...)

The latter also has the same feel as the Scheduler plugins setup.
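A minimal sketch of the chained-setter pattern, assuming a stripped-down, hypothetical `Director` that omits the datastore, scheduler, and saturation detector. Following the doc comment on `WithPostResponsePlugins`, a second call replaces the existing plugins rather than appending to them.

```go
package main

import "fmt"

// PostResponse is a hypothetical, trimmed-down plugin interface.
type PostResponse interface {
	Name() string
}

type namedPlugin string

func (n namedPlugin) Name() string { return string(n) }

// Director holds only the field needed for this illustration.
type Director struct {
	postResponsePlugins []PostResponse
}

// NewDirector would take the required dependencies (omitted here).
func NewDirector() *Director {
	return &Director{}
}

// WithPostResponsePlugins sets the given plugins as the PostResponse plugins,
// replacing any previously set plugins, and returns the Director for chaining.
func (d *Director) WithPostResponsePlugins(plugins ...PostResponse) *Director {
	d.postResponsePlugins = plugins
	return d
}

func main() {
	plain := NewDirector() // no empty slice or nil argument needed
	withPlugins := NewDirector().WithPostResponsePlugins(namedPlugin("a"), namedPlugin("b"))
	fmt.Println(len(plain.postResponsePlugins), len(withPlugins.postResponsePlugins)) // 0 2
}
```

Callers that need no PostResponse plugins simply skip the call.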

// Headers is a map of the response headers. Nil during body processing
Headers map[string]string
// Body is the body of the response, or nil during header processing
Body map[string]string
Contributor

Q: Why is Body a map and not a string (or an array of strings, when streaming)?
Is it the parsed JSON?
If the post response plugin needs to know the model server (target Pod) selected, how is it communicated? Should it be stored in a previous callback and used here (e.g., map of request-id to selected pod)?

Contributor Author

Good catch. This was originally a string; I tested something locally and forgot to change it back to a string.
Fixed it.

Contributor Author

If the post response plugin needs to know the model server (target Pod) selected, how is it communicated? Should it be stored in a previous callback and used here (e.g., map of request-id to selected pod)?

This is stored in a struct called RequestContext that gets filled during the lifecycle of a request.
https://github.com/nirrozenbaum/gateway-api-inference-extension/blob/post-response/pkg/epp/requestcontrol/director.go#L206
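A minimal sketch of that flow, assuming a trimmed-down, hypothetical `RequestContext` (field names such as `TargetPod` and the helpers `handleRequest`/`handleResponse` are illustrative, not the project's actual code). The director records the scheduling decision on the per-request context and reads it back when the response arrives, so no separate request-ID-to-pod map is required.

```go
package main

import "fmt"

// RequestContext is a hypothetical, trimmed-down per-request state holder.
type RequestContext struct {
	RequestID string
	Prompt    string
	TargetPod string // filled in once the scheduler picks an endpoint
	Response  string // filled in when the response arrives
}

// scheduleFunc stands in for the scheduler: it returns the selected pod.
type scheduleFunc func(prompt string) string

// handleRequest records the scheduling decision on the request context.
func handleRequest(reqCtx *RequestContext, schedule scheduleFunc) {
	reqCtx.TargetPod = schedule(reqCtx.Prompt)
}

// handleResponse runs post-response logic using state stored earlier.
func handleResponse(reqCtx *RequestContext, body string, postResponse func(pod, prompt, body string)) {
	reqCtx.Response = body
	postResponse(reqCtx.TargetPod, reqCtx.Prompt, body)
}

func main() {
	reqCtx := &RequestContext{RequestID: "r1", Prompt: "Hi"}
	handleRequest(reqCtx, func(string) string { return "pod-b" })
	handleResponse(reqCtx, " there", func(pod, prompt, body string) {
		fmt.Println(pod, prompt+body) // pod-b Hi there
	})
}
```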

scorers: []*WeightedScorer{},
postCyclePlugins: []PostCycle{},
PostResponsePlugins: []PostResponse{},
filters: []Filter{},
Contributor

nit: I realize this was copied over, but we should be consistent in naming (e.g., filters vs. filterPlugins on the one hand, and postCyclePlugins on the other).

Contributor Author

Once PostResponse is out of the scheduler, the next PR will remove PostCycle completely (there is no use case for it). We should be left with only filters, scorers, and a picker here, so this is a temporary state that should be resolved very soon.

package requestcontrol

// Response contains information from the response received to be passed to PostResponse plugins
type Response struct {
Contributor

How is this different from the one defined in pkg/epp/scheduling/types/types.go?
Should we converge?

Contributor Author

Currently it is not different.
I'm not sure whether we should converge them or keep them separate so that each can evolve according to its package's requirements.
In this PR, the focus was solely on moving PostResponse out of the Scheduler and into the requestcontrol layer. If we think it's better to converge the Plugin interface, I suggest doing it in a follow-up PR.

Contributor

We are mixing schedulingtypes.LLMRequest and requestcontrol.Response; I think we should at least be consistent.

I don't see a good reason why they need to differ. I think we can just consolidate them to a shared Request/Response objects, perhaps at the top level. If for any reason we need scheduler/director specific metadata they can be added to scheduler/director specific structs.

Contributor Author

Here is the main point:
once PostResponse is out of the Scheduler (I think we all agree it should be), the scheduler shouldn't care about anything related to the response, and thus the response is in the scope of requestcontrol.

On the other hand, in order to make a schedule call, one needs to provide the scheduler's representation of the request
(this is the current scheduling.LLMRequest).

In PostResponse we need a representation of the request that was scheduled (ideally after the Prompt and other request properties have been unmarshaled), as well as the response.

The unmarshaled object we have for the request is the Scheduler request. I could also switch to using requestcontrol.Request, but that would introduce another duplication of fields, so I preferred to avoid it.

Contributor

It seems to me that multiple packages now need access to some common request/response structs, so it makes sense to create a common types package containing the request and response. If a specific package needs to extend the common types, e.g., in scheduling, you can do:

package scheduling

type Request struct {
    commontypes.Request
    // additional scheduling-specific fields
}

But I am OK with deferring this.
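A runnable illustration of the embedding approach suggested in this comment, with both types declared in one file for self-containment. `Request` stands in for the proposed `commontypes.Request`, and the field names, including the scheduler-only `Critical` flag, are hypothetical.

```go
package main

import "fmt"

// Request stands in for a shared type in a common types package
// (e.g., commontypes.Request); declared locally to stay self-contained.
type Request struct {
	RequestID   string
	TargetModel string
	Prompt      string
}

// SchedulingRequest extends the shared Request with scheduler-only fields.
type SchedulingRequest struct {
	Request       // embedded: RequestID, TargetModel, Prompt are promoted
	Critical bool // hypothetical scheduler-only field
}

// describe only understands the shared type.
func describe(r Request) { fmt.Println("shared request:", r.RequestID) }

func main() {
	sr := SchedulingRequest{
		Request:  Request{RequestID: "r1", TargetModel: "llama", Prompt: "hi"},
		Critical: true,
	}
	fmt.Println(sr.TargetModel, sr.Request.RequestID, sr.Critical) // llama r1 true
	describe(sr.Request)                                           // shared request: r1
}
```

Because the shared fields are promoted, scheduler code can keep reading `sr.TargetModel` directly, while request-control code works with the embedded `Request` value.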

@k8s-ci-robot k8s-ci-robot added the do-not-merge/invalid-commit-message Indicates that a PR should not merge because it has an invalid commit message. label Jun 3, 2025
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/invalid-commit-message Indicates that a PR should not merge because it has an invalid commit message. label Jun 3, 2025
@liu-cong
Contributor

liu-cong commented Jun 3, 2025

in a follow up PR, we should update the prefix plugin to use PostResponse instead of PostCycle extension point and then remove completely the PostCycle.

Can you raise an issue to discuss this? I will comment on some use cases where PostCycle/PreRequest is still needed.

@nirrozenbaum
Contributor Author

nirrozenbaum commented Jun 3, 2025

in a follow up PR, we should update the prefix plugin to use PostResponse instead of PostCycle extension point and then remove completely the PostCycle.

Can you raise an issue to discuss this? I will comment on some use cases that PostCycle/PreRequest is still needed.

@liu-cong Sure. We do have a PR where we discuss requirements: #905.
The discussions started on #845, and none of us could find a use case for Pre/Post Cycle, which led to the conclusion that we might remove it if there is no use case.
Feel free to add your comments, and of course, if there is a use case for Pre/Post Cycle, then we should keep it.

This PR handles only moving PostResponse out of the scheduler and into the requestcontrol layer,
so the discussion on whether we should keep Pre/Post Cycle in the scheduler is a separate issue and can be captured in #905.

Contributor

@liu-cong liu-cong left a comment

Just a couple of nits; otherwise LGTM.


saturationDetector := saturationdetector.NewDetector(sdConfig, datastore, ctrl.Log)

director := requestcontrol.NewDirector(datastore, scheduler, saturationDetector) // can call "director.WithPostResponsePlugins" to add post response plugins
Contributor

nit: I prefer making WithPostResponsePlugins an "Option" object and adding it as an optional argument to NewDirector(). This is more discoverable and removes the need for this comment.

Contributor Author

We don't really need this comment :)
It was added only until we add the WithPostResponsePlugins usage in main.go.

I implemented it this way to keep it consistent with how scheduler plugins are defined.
In general, both patterns are commonly used in Go, and more specifically in Kubernetes, but personally I prefer the With... approach, which reads more clearly to me and also allows adding it only when it's used.

Contributor

The option pattern is optional as well; you only use it when you need it. But it's more discoverable in the function signature.
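For comparison, a minimal sketch of the functional-options alternative the reviewer describes, using hypothetical types; `Option` and the package-level `WithPostResponsePlugins` function are illustrative, not code from the PR. The optional configuration is visible in `NewDirector`'s signature, which is the discoverability argument made here. Unlike the setter in the PR, this option appends, so repeated options accumulate.

```go
package main

import "fmt"

// PostResponse is a hypothetical, trimmed-down plugin interface.
type PostResponse interface{ Name() string }

type namedPlugin string

func (n namedPlugin) Name() string { return string(n) }

type Director struct {
	postResponsePlugins []PostResponse
}

// Option configures optional Director settings.
type Option func(*Director)

// WithPostResponsePlugins returns an Option that appends PostResponse plugins.
func WithPostResponsePlugins(plugins ...PostResponse) Option {
	return func(d *Director) {
		d.postResponsePlugins = append(d.postResponsePlugins, plugins...)
	}
}

// NewDirector advertises its optional configuration in its signature.
func NewDirector(opts ...Option) *Director {
	d := &Director{}
	for _, opt := range opts {
		opt(d)
	}
	return d
}

func main() {
	plain := NewDirector()
	configured := NewDirector(WithPostResponsePlugins(namedPlugin("prefix")))
	fmt.Println(len(plain.postResponsePlugins), len(configured.postResponsePlugins)) // 0 1
}
```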

Contributor Author

The option pattern is optional as well, you only use when you need it.

Right, I was not trying to say otherwise :).

I was just making the point that both are common patterns widely used in the community, and personally I prefer the With... approach, which is also aligned with what was done in the Scheduler, so it keeps the plugin setup consistent across the layers.

RequestRunning bool
Request *Request

SchedulingRequest *schedulingtypes.LLMRequest
Contributor

There are many duplicated fields in LLMRequest and RequestContext. Initially, LLMRequest was scoped to the scheduling package only.

Can we move LLMRequest out of the scheduling package now that it has a wider scope, and consolidate duplicated fields such as ResolvedTargetModel?

Contributor Author

This conflicts with some of the comments on #845, where the discussion went in the direction that the scheduler shouldn't rely on structs outside of the scheduling package.

Yes, I agree there are duplicate fields.
We should probably converge so that those fields are kept in the scheduling request only and removed from RequestContext.

If it's not a hard issue from your PoV, I suggest deferring it to a follow-up PR, since this hasn't changed in this PR (this was also the situation before this PR).
I like to keep PRs tightly scoped (the scope of this PR is just moving PostResponse out of the scheduler).

Contributor

We can add the scheduling pkg type as a parameter here so that we don't duplicate the parameters.

Contributor

@liu-cong liu-cong left a comment

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 3, 2025
@ahg-g
Contributor

ahg-g commented Jun 4, 2025

Since PostResponse will likely be implemented by the scheduling plugins (like prefix aware scheduling), I think we should continue to have its definition and execution handled by the scheduling pkg.

RunPostResponsePlugins can be invoked as follows: have the Schedule function return a callback that the director invokes when we get the response. This callback is the OnResponse function call.

PostResponse is part of the scheduling pkg if we think about it as a callback for when the request it scheduled successfully executes.
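A minimal sketch of the callback alternative proposed here, assuming a hypothetical `Schedule` method that returns the selected pod together with an `onResponse` closure. The closure captures the scheduling state it needs, so the hook is defined in the scheduling layer while the director only decides when to invoke it.

```go
package main

import "fmt"

// ResponseCallback is invoked by the director once the scheduled request completes.
type ResponseCallback func(responseBody string)

// scheduler keeps hypothetical scheduling state: texts served per pod.
type scheduler struct {
	served map[string][]string
}

// Schedule picks a pod and returns a callback that updates scheduler state on success.
func (s *scheduler) Schedule(prompt string) (pod string, onResponse ResponseCallback) {
	pod = "pod-a" // placeholder decision; a real scheduler would filter and score
	return pod, func(responseBody string) {
		s.served[pod] = append(s.served[pod], prompt+responseBody)
	}
}

func main() {
	s := &scheduler{served: map[string][]string{}}
	pod, onResponse := s.Schedule("Hello, ")
	// ... the director forwards the request to pod and waits for the response ...
	onResponse("world")
	fmt.Println(pod, s.served[pod]) // pod-a [Hello, world]
}
```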

log.FromContext(ctx).V(logutil.DEBUG).Info("Running post-response plugin", "plugin", plugin.Name())
before := time.Now()
plugin.PostResponse(ctx, request, response, targetPod)
metrics.RecordRequestControlPluginProcessingLatency(PostResponsePluginType, plugin.Name(), time.Since(before))
Contributor

This metric tracks PostResponse only; why is it called request_control_plugin_duration_seconds?

Contributor Author

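This PR moves the PostResponse plugin from the Scheduler to the requestcontrol layer. It is the first plugin in this layer out of those that appear in the north-star doc.

More request control plugins are expected to be added; since the plugin type is passed as a label (PostResponsePluginType), they can all report through this same metric.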

// WORKAROUND until PostResponse is out of Scheduler
profileExecutionResults := map[string]*types.Result{}
profiles := s.profilePicker.Pick(nil, s.profiles, profileExecutionResults) // all profiles
for _, profile := range profiles {
Contributor

We shouldn't be running profiles on PostResponse; the profiles are defined for the plugins that run in the schedule call only.

Contributor Author

This code was removed (as the comment states, it was a workaround until we get PostResponse out of the scheduler).

@nirrozenbaum
Contributor Author

Since PostResponse will likely be implemented by the scheduling plugins (like prefix aware scheduling), I think we should continue to have its definition and execution handled by the scheduling pkg.

RunPostResponsePlugins can be invoked as follows: have the Schedule function return a callback that the director invokes when we get the response. This callback is the OnResponse function call.

PostResponse is part of the scheduling pkg if we think about it as a callback for when the request it scheduled successfully executes.

@ahg-g hard disagree on the above.
On the one hand, yes, it could be that scheduler plugins will also implement PostResponse.
But handling the response is completely out of the Scheduler's scope. If I had to define the Scheduler's scope, it would be something like:
“Scheduler is responsible for selecting the best endpoint(s) for serving, given a request and the endpoints state.”

I think there are two valid approaches:

  • Allow a plugin to register in both the scheduler and requestcontrol layers.
  • Make a design decision that, in such a case, we should have two separate plugins that share some struct (e.g., in prefix, two plugins pointing to the same indexer): one does the scoring, and the other, registered in requestcontrol only, updates the cache upon a successful response.
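A minimal sketch of the second approach, assuming a hypothetical in-memory `indexer` shared by two separate plugins: `prefixScorer` would be registered with the scheduler and only reads the index, while `prefixUpdater` would be registered with requestcontrol and only writes to it after a successful response. Plain string-prefix matching is a simplification used here for brevity.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// indexer is a hypothetical shared prefix index: pod -> texts it has served.
type indexer struct {
	mu     sync.Mutex
	served map[string][]string
}

func newIndexer() *indexer { return &indexer{served: map[string][]string{}} }

func (ix *indexer) add(pod, text string) {
	ix.mu.Lock()
	defer ix.mu.Unlock()
	ix.served[pod] = append(ix.served[pod], text)
}

// matches reports whether any text served by pod is a prefix of prompt.
func (ix *indexer) matches(pod, prompt string) bool {
	ix.mu.Lock()
	defer ix.mu.Unlock()
	for _, text := range ix.served[pod] {
		if strings.HasPrefix(prompt, text) {
			return true
		}
	}
	return false
}

// prefixScorer is registered with the scheduler and only reads the index.
type prefixScorer struct{ ix *indexer }

func (s *prefixScorer) Score(prompt string, pods []string) map[string]float64 {
	scores := make(map[string]float64, len(pods))
	for _, pod := range pods {
		if s.ix.matches(pod, prompt) {
			scores[pod] = 1
		} else {
			scores[pod] = 0
		}
	}
	return scores
}

// prefixUpdater is registered with requestcontrol and only writes the index.
type prefixUpdater struct{ ix *indexer }

func (u *prefixUpdater) PostResponse(pod, prompt, body string) { u.ix.add(pod, prompt+body) }

func main() {
	ix := newIndexer()
	scorer, updater := &prefixScorer{ix}, &prefixUpdater{ix}
	updater.PostResponse("pod-a", "Hello, ", "world")
	fmt.Println(scorer.Score("Hello, world! And then?", []string{"pod-a", "pod-b"})) // map[pod-a:1 pod-b:0]
}
```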

@ahg-g
Contributor

ahg-g commented Jun 4, 2025

Since PostResponse will likely be implemented by the scheduling plugins (like prefix aware scheduling), I think we should continue to have its definition and execution handled by the scheduling pkg.
RunPostResponsePlugins can be invoked as follows: have the Schedule function return a callback that the director invokes when we get the response. This callback is the OnResponse function call.
PostResponse is part of the scheduling pkg if we think about it as a callback for when the request it scheduled successfully executes.

@ahg-g hard disagree on the above. from one hand, yes, it could be that scheduler plugins will also implement PostResponse. but handling the response is completely out of scope of the Scheduler. If I had to define Scheduler scope it would something like - “Scheduler is responsible for selecting the best endpoint(s) for serving, given a request and the endpoints state.”

I am not sure how one can assert that it is completely out of scope when there is a concrete use case: prefix-aware scheduling. In this case, I am not thinking about PostResponse in the general sense of "handling response", but as a callback on the successful execution of the scheduled request to handle scheduling state maintained by the scheduling layer.

Callbacks are a well-established design pattern, and in cases like this, it makes sense to have the definition of the callback in the layer that expects it.

I think there are two valid approaches:

  • allow a plugin to register in both scheduler and requestcontrol layers.
  • make a design decision that in such a case, we should have two separate plugins that are sharing some struct (e.g., in prefix two plugins pointing to the same indexer), and then one is scoring and the other one which is registered in requestcontrol only updates the cache upon successful response.

Those two bullets seem to be one approach, which is what we should be doing in all cases; meaning all extension points that a plugin implements should be executable irrespective of which layer executes them.
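A minimal sketch of that principle, with hypothetical `Scorer` and `PostResponse` interfaces: one plugin implements both, and a hypothetical `register` helper hands it to every layer whose extension point it satisfies, using Go type assertions.

```go
package main

import "fmt"

type Plugin interface{ Name() string }

// Scorer is a scheduling-layer extension point (hypothetical signature).
type Scorer interface {
	Plugin
	Score(prompt, pod string) float64
}

// PostResponse is a request-control extension point (hypothetical signature).
type PostResponse interface {
	Plugin
	PostResponse(prompt, pod, body string)
}

// prefixPlugin implements both extension points over one shared state.
type prefixPlugin struct{ warm map[string]bool }

func (p *prefixPlugin) Name() string { return "prefix" }

func (p *prefixPlugin) Score(_ string, pod string) float64 {
	if p.warm[pod] {
		return 1
	}
	return 0
}

func (p *prefixPlugin) PostResponse(_ string, pod string, _ string) { p.warm[pod] = true }

// register hands each plugin to every layer whose extension point it implements.
func register(plugins ...Plugin) (scorers []Scorer, postResponses []PostResponse) {
	for _, p := range plugins {
		if s, ok := p.(Scorer); ok {
			scorers = append(scorers, s)
		}
		if pr, ok := p.(PostResponse); ok {
			postResponses = append(postResponses, pr)
		}
	}
	return scorers, postResponses
}

func main() {
	p := &prefixPlugin{warm: map[string]bool{}}
	scorers, postResponses := register(p)
	postResponses[0].PostResponse("Hello", "pod-a", "world")
	fmt.Println(len(scorers), len(postResponses), scorers[0].Score("Hello", "pod-a")) // 1 1 1
}
```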

The main thing I was trying to address is ensuring that scheduling plugins remain encapsulated within the scheduling package. The question now is: is this actually important to maintain?

@nirrozenbaum
Contributor Author

nirrozenbaum commented Jun 4, 2025

I am not sure how one can assert that it is completely out of scope when there is a concrete use case: prefix-aware scheduling.

@ahg-g I think the above statement is mixing things.
The concrete use case we have is that we need to access the prefix-aware cache in two extension points: Score and PostResponse. This use case has nothing to do with defining a callback in the scheduling package to call PostResponse plugins.

In this case, I am not thinking about PostResponse in the general sense of "handling response", but as a callback on the successful execution of the scheduled request to handle scheduling state maintained by the scheduling layer.
Callbacks are a well established design pattern, and in cases like this, it make sense to have the definition of the callback in the layer that expects it.

This is an implementation detail. One implementation option is using a callback; the other is to put the extension point in the requestcontrol layer. I strongly prefer PostResponse in the requestcontrol layer for the reasons I specified above: response handling has nothing to do with the scheduling package. This is also aligned with the north-star document and with the previous discussions on #845, where we wrote (more than once) that PostResponse should NOT be part of the scheduler.

The main thing I was trying to address is ensuring that scheduling plugins remain encapsulated within the scheduling package. The question now is this actually important to maintain?

I'm not sure I get the point here. As of today, plugins are under pkg/scheduling, but that's an arbitrary decision.
If our design choice is that a plugin can implement any extension point from multiple layers, then by design the plugins should NOT reside inside the scheduling package. An alternative would be to put the plugins under pkg/plugins, where each concrete plugin can reference its extension points (e.g., the prefix plugin may reference the Score and PostResponse extension points).

@ahg-g
Contributor

ahg-g commented Jun 4, 2025

I am not sure how one can assert that it is completely out of scope when there is a concrete use case: prefix-aware scheduling.

@ahg-g I think the above statement is mixing things. the concrete use case we have is that we should access prefix aware cache in two extension points - Score and PostResponse. This use case has nothing to do with defining a callback in scheduling package to call PostResponse plugins.

I am not looking at extension points in isolation, I am looking at it from the perspective of an end-to-end scheduling feature. So the use case has to do with scheduling state management that some scheduling features (like prefix-aware scheduling) require.

In this case, I am not thinking about PostResponse in the general sense of "handling response", but as a callback on the successful execution of the scheduled request to handle scheduling state maintained by the scheduling layer.
Callbacks are a well established design pattern, and in cases like this, it make sense to have the definition of the callback in the layer that expects it.

This is an implementation detail. One option to implement is using callback. the other option is to put extension point in the requestcontrol layer. I strongly prefer PostResponse in requestcontrol layer due to the reasons I specified above - response handling has nothing to do with scheduling package. this is also aligned with the northstar document and also with the previous discussions on #845 where we wrote (more than once) that PostResponse should NOT be part of the scheduler.

It makes sense to have PostResponse in the request control pkg, and I am not disagreeing with that, but what I am trying to propose is that it also makes sense to think of that extension point as a callback on a successful scheduling decision made by the scheduling layer. My thinking is that the latter interpretation of this extension point may be more relevant to us if most plugins that implement this extension point are primarily scheduling plugins.

Another factor driving this other interpretation in my mind is to ensure the scheduling pkg and its plugins continue to be self contained to the extent that some other project could use it as is. I understand that this is not a priority shared by others though.

The main thing I was trying to address is ensuring that scheduling plugins remain encapsulated within the scheduling package. The question now is this actually important to maintain?

I'm not sure I get the point here. as of today plugins are under pkg/scheduling. but that's an arbitrary decision. if our design choice is that a plugin can implement any extension point from multiple layers, that means that by design the plugins should NOT reside inside scheduling package. an alternative choice would be to put the plugins under pkg/plugins and each of the concrete plugins can reference it's extension points (e.g., prefix plugin may reference Score and PostResponse extension points).

Placing the plugins in a separate pkg is indeed another option I was also thinking about, I held back from proposing it because I was hoping we could maintain the scheduling pkg as a self contained "library" as mentioned above.

I don't want to hold this PR, so I am fine with this change as long as we have a principled approach to hosting plugins that implement extension points across layers.

type Plugin interface {
// Name returns the name of the plugin.
Name() string
}
Contributor

Instead of creating a new base Plugin interface, the one in pkg/epp/scheduling/framework/plugin.go should be moved to perhaps pkg/epp/common/plugin.go.

This will help future configuration efforts.

Contributor

+1 to this, I am now thinking that we should have a separate plugins interface "pkg".

Contributor Author

Sure thing. I converged on a single Plugin interface under the plugins pkg.

@nirrozenbaum
Contributor Author

Placing the plugins in a separate pkg is indeed another option I was also thinking about, I held back from proposing it because I was hoping we could maintain the scheduling pkg as a self contained "library" as mentioned above.

@ahg-g I think this is the best option.
Not specifically for this PR, but in general, plugins may implement extension points from multiple layers.

Scheduling remains a self-contained “library”; having plugins implemented in a different package (e.g., in pkg/plugins) doesn't break that. It's similar to having out-of-tree plugins:
the scheduling package just gets the plugin implementations from outside.

I'm also happy to converge on a single plugin interface that all layers will use, but I strongly prefer to defer it to a follow-up PR, since I'd like to keep this one tightly scoped to moving PostResponse outside of the scheduler, while that change would include moving all plugins to a different package.

The above change will also address @shmuelk's concern, which @elevran also mentioned in one of his review comments.

@shmuelk
Contributor

shmuelk commented Jun 4, 2025

scheduling remains a self contained ”library”. having plugins implemented in a different package (e.g., in pkg/plugins) doesn’t break it. it’s kinda similar to having out of tree plugins.
the scheduling package just gets the plugins implementations from outside.

I was only talking about the base interface, not all of the other scheduler plugins.

@elevran
Contributor

elevran commented Jun 4, 2025

... if most plugins that implement this extension point are primarily scheduling plugins.

Referencing other networking systems with extension points (e.g., web proxies such as nginx or httpd) - they define a set of hook points and a module can implement/register for any that they need. There is no attempt to group them into a scoped collection by "phase".
I believe that we're looking at this as a point in time, where the system is mostly/only scheduler plugins. When/If the other layers in the EPP are added, we would have much more diversity. For example "routing layer" plugins like semantic cache would be interested in PostResponse as well. A semantic cache would need to register to both request and response arrival events. Neither would be in the context of a scheduling decision, IMO.

@ahg-g
Contributor

ahg-g commented Jun 4, 2025

There is no attempt to group them into a scoped collection by "phase".

Right, but let me reiterate one point that influenced this line of thinking: our mental model was to try to have a self-contained scheduling library that can be used by other projects. This was my attempt to ensure that we continue to have this optionality.

@ahg-g
Contributor

ahg-g commented Jun 4, 2025

not specifically to this PR but in general, plugins may implement extensions from multiple layers.

I am on board with this, but we need to find a reasonable structure to ensure it is easy to navigate the plugins, I would still consider grouping them somehow (for example: scheduling, flow control and data).

Also, if we go in this direction, one thing to consider is pulling all definitions of the extension interfaces, including the input/output types, into a common "plugins interface pkg". With this, the plugin implementations would have a single dependency: the plugins interface pkg only. WDYT? Perhaps we can have this discussion in the proposal PR.

@elevran
Contributor

elevran commented Jun 4, 2025

Understood.
That could indeed interfere with having a standalone scheduling library that has all the hooks it needs self-contained.
The downside is that future plugins lacking hooks would push more hook points into "scheduling"...
I now better understand the proposal for having a scheduler callback run by request-control. Thanks for clarifying!

I do wonder whether that line of thought will start pulling "pre-scheduling" (e.g., routing, flow control) into the scheduling lib as well (the same arguments can be made).

@elevran
Copy link
Contributor

elevran commented Jun 4, 2025

I am on board with this, but we need to find a reasonable structure to ensure it is easy to navigate the plugins, I would still consider grouping them somehow (for example: scheduling, flow control and data).

This could be solved via directories (plugins/scheduling/..., plugins/routing/..., ...) or by using a directory per plugin (plugins/prefix, plugins/semanticcache, ...) and a README file under plugins directory describing each plugin (purpose, configuration, hook points, ...).
A hybrid is also possible: single layer plugins (scheduling, etc.) can be under the layer directory and those that span multiple layers under their own directory.

@nirrozenbaum
Copy link
Contributor Author

Also, if we are going in this direction, one thing to consider is to pull all definitions of the extension interfaces, including the input/output types, into a common "plugins interface pkg". With this, the plugin implementations should have a single dependency: the plugins interface pkg only. WDYT? Perhaps we can have this discussion in the proposal PR.

This is indeed a possible direction. As @elevran mentioned, at this point in time all plugins are scheduling-related, but going forward that will not be the case, and we need to start thinking now about how to organize the plugins.
A hybrid pkg/plugins structure like the one @elevran suggested sounds good to me (with a README that explains the rationale):

A hybrid is also possible: single layer plugins (scheduling, etc.) can be under the layer directory and those that span multiple layers under their own directory.

Of course, any structure we agree on is fine as long as it is documented.

I also agree with @elevran's following statement:

that line of thought won't start pulling "pre-scheduling" (e.g., routing, flow control) into the scheduling lib as well (the same arguments can be made).

I think the scope of responsibility of each layer should be very well defined. As I stated in one of the previous comments, when I try to define the scope of the scheduler, my thinking is something like: “Scheduler is responsible for selecting the best endpoint(s) for serving, given a request and the endpoints state.”

I'm happy to converge on this topic in the proposal PR (although it's not strictly a scheduler topic, but more about general pluggability structure). It sounds like we're all aligned on having a pkg/plugins dir that holds the various plugins in one structure or another, so I think we're very close to converging.
Let's keep this discussion in the proposal PR.

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 4, 2025
@ahg-g
Copy link
Contributor

ahg-g commented Jun 4, 2025

/approve
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 4, 2025
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ahg-g, nirrozenbaum

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 4, 2025
@k8s-ci-robot k8s-ci-robot merged commit 82105a6 into kubernetes-sigs:main Jun 4, 2025
8 checks passed
@nirrozenbaum nirrozenbaum deleted the post-response branch June 4, 2025 18:00
shmuelk pushed a commit to shmuelk/gateway-api-inference-extension that referenced this pull request Jun 9, 2025
…ernetes-sigs#914)

* move PostResponse plugins to requestcontrol instead of scheduler

Signed-off-by: Nir Rozenbaum <[email protected]>

* typo

Signed-off-by: Nir Rozenbaum <[email protected]>

* fixed typo raised by elevran in code review

Signed-off-by: Nir Rozenbaum <[email protected]>

* added general Plugin interface in requestcontrol layer

Signed-off-by: Nir Rozenbaum <[email protected]>

* removed LLMResponse from scheduler

Signed-off-by: Nir Rozenbaum <[email protected]>

* added metric for request-control plugin and fixed a copy paste typo when recording plugin time

Signed-off-by: Nir Rozenbaum <[email protected]>

* converged to a single plugins interface based on code review

Signed-off-by: Nir Rozenbaum <[email protected]>

---------

Signed-off-by: Nir Rozenbaum <[email protected]>
rlakhtakia pushed a commit to rlakhtakia/gateway-api-inference-extension that referenced this pull request Jun 11, 2025
…ernetes-sigs#914)

* move PostResponse plugins to requestcontrol instead of scheduler

Signed-off-by: Nir Rozenbaum <[email protected]>

* typo

Signed-off-by: Nir Rozenbaum <[email protected]>

* fixed typo raised by elevran in code review

Signed-off-by: Nir Rozenbaum <[email protected]>

* added general Plugin interface in requestcontrol layer

Signed-off-by: Nir Rozenbaum <[email protected]>

* removed LLMResponse from scheduler

Signed-off-by: Nir Rozenbaum <[email protected]>

* added metric for request-control plugin and fixed a copy paste typo when recording plugin time

Signed-off-by: Nir Rozenbaum <[email protected]>

* converged to a single plugins interface based on code review

Signed-off-by: Nir Rozenbaum <[email protected]>

---------

Signed-off-by: Nir Rozenbaum <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants