Description
What would you like to be added:
EPP currently watches all Pods via a controller-runtime reconciler and then filters them based on the current InferencePool.SpecSelector.
In large clusters, this could lead to high load.
Based on the discussions in #300 and #301 (and especially these comments), it may be worthwhile to consider an EPP-managed Service that is kept in sync with the InferencePool Selector, with EPP watching that Service's EndpointSlices instead of all Pods.
This would delegate endpoint discovery to Kubernetes, which is potentially better optimized for it.
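To illustrate what EPP would consume under this proposal, here is a minimal sketch in Go using simplified stand-in types rather than the real `discovery.k8s.io/v1` API objects. The types `endpoint` and `endpointSlice` and the function `readyAddresses` are hypothetical; the only Kubernetes detail relied on is that EndpointSlices belonging to a Service carry the `kubernetes.io/service-name` label. A real implementation would presumably use a label-filtered watch rather than iterating over an in-memory list.

```go
package main

import "fmt"

// Label that Kubernetes sets on EndpointSlices to name the owning Service.
const serviceNameLabel = "kubernetes.io/service-name"

// endpoint is a simplified, hypothetical stand-in for a discovery/v1 Endpoint.
type endpoint struct {
	Addresses []string
	Ready     bool
}

// endpointSlice is a simplified, hypothetical stand-in for a discovery/v1 EndpointSlice.
type endpointSlice struct {
	Labels    map[string]string
	Endpoints []endpoint
}

// readyAddresses collects the addresses of all ready endpoints in the
// slices that belong to the given Service, preserving input order.
func readyAddresses(slices []endpointSlice, service string) []string {
	var out []string
	for _, s := range slices {
		if s.Labels[serviceNameLabel] != service {
			continue // slice belongs to another Service
		}
		for _, ep := range s.Endpoints {
			if !ep.Ready {
				continue // skip endpoints that are not ready to serve
			}
			out = append(out, ep.Addresses...)
		}
	}
	return out
}

func main() {
	slices := []endpointSlice{
		{
			Labels: map[string]string{serviceNameLabel: "pool-svc"},
			Endpoints: []endpoint{
				{Addresses: []string{"10.0.0.1"}, Ready: true},
				{Addresses: []string{"10.0.0.2"}, Ready: false},
			},
		},
		{
			Labels:    map[string]string{serviceNameLabel: "other-svc"},
			Endpoints: []endpoint{{Addresses: []string{"10.0.1.1"}, Ready: true}},
		},
	}
	fmt.Println(readyAddresses(slices, "pool-svc"))
}
```

The point of the sketch is that EPP would only ever see objects already scoped to the pool's Service, instead of filtering every Pod in the cluster itself.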
A possible downside is the exposure of the Service to users via the API. The name itself can be randomized (e.g., "<inferencepool>-hash(selector)") so collisions are less likely.
Why is this needed:
Using an EndpointSlice could lead to better scaling of EPP, especially on clusters with many Pods (and only a small fraction associated with the InferencePool).