-
Couldn't load subscription status.
- Fork 184
move PostResponse plugins to requestcontrol instead of scheduler #914
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 6 commits
8cbb0e0
ce5bae2
9316b51
044df22
2bbd2b5
d9ab99b
19db87d
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -32,6 +32,7 @@ import ( | |
| "sigs.k8s.io/gateway-api-inference-extension/api/v1alpha2" | ||
| "sigs.k8s.io/gateway-api-inference-extension/pkg/epp/backend" | ||
| "sigs.k8s.io/gateway-api-inference-extension/pkg/epp/metrics" | ||
| schedulingtypes "sigs.k8s.io/gateway-api-inference-extension/pkg/epp/scheduling/types" | ||
| errutil "sigs.k8s.io/gateway-api-inference-extension/pkg/epp/util/error" | ||
| logutil "sigs.k8s.io/gateway-api-inference-extension/pkg/epp/util/logging" | ||
| requtil "sigs.k8s.io/gateway-api-inference-extension/pkg/epp/util/request" | ||
|
|
@@ -79,7 +80,7 @@ type StreamingServer struct { | |
| // Specifically, there are fields related to the ext-proc protocol, and then fields related to the lifecycle of the request. | ||
| // We should split these apart as this monolithic object exposes too much data to too many layers. | ||
| type RequestContext struct { | ||
| TargetPod string | ||
| TargetPod *backend.Pod | ||
| TargetEndpoint string | ||
| Model string | ||
| ResolvedTargetModel string | ||
|
|
@@ -93,6 +94,8 @@ type RequestContext struct { | |
| RequestRunning bool | ||
| Request *Request | ||
|
|
||
| SchedulingRequest *schedulingtypes.LLMRequest | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There are many duplicated fields in LLMRequest and RequestContext. Initially the LLMRequest was scoped to the scheduling package only. Can we move LLMRequest out of scheduling package now it has wider scope? And consolidate duplicated fields such as the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. this is in conflict with some of the comments on #845 where the discussion went to the direction that scheduler shouldn't rely on structs outside of the scheduling package. yes, I agree there are duplicate fields. if it's not a hard issue from you PoV, I suggest to defer it to a follow up PR since this hasn't change in this PR (this was the situation also before this PR). There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We can add the scheduling pkg type as a parameter here so that we don't duplicate the parameters. |
||
|
|
||
| RequestState StreamRequestState | ||
| modelServerStreaming bool | ||
|
|
||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -23,13 +23,15 @@ import ( | |
| "fmt" | ||
| "math/rand" | ||
| "strconv" | ||
| "time" | ||
|
|
||
| "github.com/go-logr/logr" | ||
| "sigs.k8s.io/controller-runtime/pkg/log" | ||
| "sigs.k8s.io/gateway-api-inference-extension/api/v1alpha2" | ||
| "sigs.k8s.io/gateway-api-inference-extension/pkg/epp/backend" | ||
| "sigs.k8s.io/gateway-api-inference-extension/pkg/epp/datastore" | ||
| "sigs.k8s.io/gateway-api-inference-extension/pkg/epp/handlers" | ||
| "sigs.k8s.io/gateway-api-inference-extension/pkg/epp/metrics" | ||
| schedulingtypes "sigs.k8s.io/gateway-api-inference-extension/pkg/epp/scheduling/types" | ||
| errutil "sigs.k8s.io/gateway-api-inference-extension/pkg/epp/util/error" | ||
| logutil "sigs.k8s.io/gateway-api-inference-extension/pkg/epp/util/logging" | ||
|
|
@@ -39,24 +41,32 @@ import ( | |
| // Scheduler defines the interface required by the Director for scheduling. | ||
| type Scheduler interface { | ||
| Schedule(ctx context.Context, b *schedulingtypes.LLMRequest) (result map[string]*schedulingtypes.Result, err error) | ||
| OnResponse(ctx context.Context, resp *schedulingtypes.LLMResponse, targetPodName string) | ||
| } | ||
|
|
||
| // SaturationDetector provides a signal indicating whether the backends are considered saturated. | ||
| type SaturationDetector interface { | ||
| IsSaturated(ctx context.Context) bool | ||
| } | ||
|
|
||
| // NewDirector creates a new Director instance with all dependencies. | ||
| // postResponsePlugins remains nil as this is an optional field that can be set using the "WithPostResponsePlugins" function. | ||
| func NewDirector(datastore datastore.Datastore, scheduler Scheduler, saturationDetector SaturationDetector) *Director { | ||
| return &Director{datastore: datastore, scheduler: scheduler, saturationDetector: saturationDetector} | ||
| } | ||
|
|
||
| // Director orchestrates the request handling flow, including scheduling. | ||
| type Director struct { | ||
| datastore datastore.Datastore | ||
| scheduler Scheduler | ||
| saturationDetector SaturationDetector | ||
| datastore datastore.Datastore | ||
| scheduler Scheduler | ||
| saturationDetector SaturationDetector | ||
| postResponsePlugins []PostResponsePlugin | ||
| } | ||
|
|
||
| // NewDirector creates a new Director instance with all dependencies. | ||
| func NewDirector(datastore datastore.Datastore, scheduler Scheduler, saturationDetector SaturationDetector) *Director { | ||
| return &Director{datastore, scheduler, saturationDetector} | ||
| // WithPostResponsePlugins sets the given plugins as the PostResponse plugins. | ||
| // If the Director has PostResponse plugins already, this call replaces the existing plugins with the given ones. | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Adding to the list might be more appropriate? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I intentionally left it out of the list, since this is an optional field. director := requestcontrol.NewDirector(datastore, scheduler, detector, []PostResponsePlugin{})OR director := requestcontrol.NewDirector(datastore, scheduler, detector, nil)on the other hand, since this field is optional, it is possible to initialize detector with or without it like this - director := requestcontrol.NewDirector(datastore, scheduler, detector)with: director := requestcontrol.NewDirector(datastore, scheduler, detector).
WithPostResponsePlugins(plugin1, plugin2, ...)the latter gets also the same feeling of the Scheduler plugins. |
||
| func (d *Director) WithPostResponsePlugins(plugins ...PostResponsePlugin) *Director { | ||
| d.postResponsePlugins = plugins | ||
| return d | ||
| } | ||
|
|
||
| // HandleRequest orchestrates the request lifecycle: | ||
|
|
@@ -104,7 +114,7 @@ func (d *Director) HandleRequest(ctx context.Context, reqCtx *handlers.RequestCo | |
| } | ||
|
|
||
| // Prepare LLMRequest (needed for both saturation detection and Scheduler) | ||
| llmReq := &schedulingtypes.LLMRequest{ | ||
| reqCtx.SchedulingRequest = &schedulingtypes.LLMRequest{ | ||
| TargetModel: reqCtx.ResolvedTargetModel, | ||
| RequestId: reqCtx.Request.Headers[requtil.RequestIdHeaderKey], | ||
| Critical: requestCriticality == v1alpha2.Critical, | ||
|
|
@@ -113,7 +123,7 @@ func (d *Director) HandleRequest(ctx context.Context, reqCtx *handlers.RequestCo | |
| } | ||
| logger = logger.WithValues( | ||
| "model", reqCtx.Model, | ||
| "resolvedTargetModel", llmReq.TargetModel, | ||
| "resolvedTargetModel", reqCtx.ResolvedTargetModel, | ||
| "criticality", requestCriticality, | ||
| ) | ||
| ctx = log.IntoContext(ctx, logger) | ||
|
|
@@ -126,7 +136,7 @@ func (d *Director) HandleRequest(ctx context.Context, reqCtx *handlers.RequestCo | |
| } | ||
|
|
||
| // --- 3. Dispatch (Calls Scheduler) --- | ||
| results, dispatchErr := d.Dispatch(ctx, llmReq) | ||
| results, dispatchErr := d.Dispatch(ctx, reqCtx.SchedulingRequest) | ||
| if dispatchErr != nil { | ||
| return reqCtx, dispatchErr | ||
| } | ||
|
|
@@ -193,22 +203,19 @@ func (d *Director) PostDispatch(ctx context.Context, reqCtx *handlers.RequestCon | |
| endpoint := targetPod.Address + ":" + strconv.Itoa(int(pool.Spec.TargetPortNumber)) | ||
| logger.V(logutil.DEFAULT).Info("Request handled", "model", reqCtx.Model, "targetModel", reqCtx.ResolvedTargetModel, "endpoint", targetPod) | ||
|
|
||
| reqCtx.TargetPod = targetPod.NamespacedName.String() | ||
| reqCtx.TargetPod = targetPod | ||
| reqCtx.TargetEndpoint = endpoint | ||
|
|
||
| return reqCtx, nil | ||
| } | ||
|
|
||
| func (d *Director) HandleResponse(ctx context.Context, reqCtx *handlers.RequestContext) (*handlers.RequestContext, error) { | ||
| logger := log.FromContext(ctx) | ||
|
|
||
| llmResp := &schedulingtypes.LLMResponse{ | ||
| response := &Response{ | ||
| RequestId: reqCtx.Request.Headers[requtil.RequestIdHeaderKey], | ||
| Headers: reqCtx.Response.Headers, | ||
| } | ||
| logger.V(logutil.DEBUG).Info("LLM response assembled", "response", llmResp) | ||
|
|
||
| d.scheduler.OnResponse(ctx, llmResp, reqCtx.TargetPod) | ||
| d.runPostResponsePlugins(ctx, reqCtx.SchedulingRequest, response, reqCtx.TargetPod) | ||
|
|
||
| return reqCtx, nil | ||
| } | ||
|
|
@@ -253,3 +260,12 @@ func RandomWeightedDraw(logger logr.Logger, model *v1alpha2.InferenceModel, seed | |
| } | ||
| return "" | ||
| } | ||
|
|
||
| func (d *Director) runPostResponsePlugins(ctx context.Context, request *schedulingtypes.LLMRequest, response *Response, targetPod *backend.Pod) { | ||
| for _, plugin := range d.postResponsePlugins { | ||
| log.FromContext(ctx).V(logutil.DEBUG).Info("Running post-response plugin", "plugin", plugin.Name()) | ||
| before := time.Now() | ||
| plugin.PostResponse(ctx, request, response, targetPod) | ||
| metrics.RecordRequestControlPluginProcessingLatency(PostResponsePluginType, plugin.Name(), time.Since(before)) | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This metric is tracking post response only, why is it called request_control_plugin_duration_seconds? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This PR moves PostResponse plugin from Scheduler to Requestcontrol layer. it is the first plugin in this layer out of the ones that appear in the northstar doc. more plugins in request control are expected to be added. |
||
| } | ||
| } | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I prefer adding WithPostResponsePlugins as an "Option" object and add that as an optinal argument to the NewDirector(). This is more discoverable and remove the need of this comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we don't really need this comment :)
this comment was added just until we add the WithPostResponse usage in
main.go.I implemented it this way to keep it consistent with how scheduler plugins are defined.
In general, both patterns are commonly used in go and more specifically in Kubernetes, but personally I prefer using the
With...approach which reads clearer to me and also allows adding it only when used.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The option pattern is optional as well, you only use when you need it. But it's more discoverable in the function signature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right. I was not trying to say otherwise :).
was just making the point that both are common patterns that are widely used in the community, and personally I prefer the
With..approach, which is also aligned with what was done in Scheduler. so it keeps the plugins setup consistent across the layers.