[Feature] Leverage an (EPP managed) Kubernetes Service for InferencePool endpoint discovery? #1100

**What would you like to be added**:
EPP currently watches all Pods via a controller-runtime reconciler and then filters them based on the current `InferencePool.Spec.Selector`.
In large clusters, this could lead to high load on the EPP, because events for every Pod reach the reconciler even when only a few Pods match the selector.
Based on discussions in #300 and #301 (and esp. [these comments](https://github.com/kubernetes-sigs/gateway-api-inference-extension/pull/300#discussion_r1954768129)), it may be worthwhile to consider using an (EPP-managed) Service that is kept in sync with the `InferencePool` selector, and then watching that Service's `EndpointSlices`.
This would delegate endpoint discovery to Kubernetes, whose built-in endpoint tracking is potentially better optimized for this.
A possible downside is that the Service is exposed to users via the API. The name itself can be randomized (e.g., "\<inferencepool\>-hash(selector)") so that collisions with existing Services are less likely.
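
As a rough illustration of the naming idea, here is a minimal Go sketch that derives a Service name from the pool name and a hash of the selector. The function names (`selectorHash`, `serviceNameFor`), the choice of SHA-256, and the 10-character suffix are assumptions made for this example and are not part of EPP. The only hard constraint relied on is that Service names must fit in a 63-character DNS label.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
	"strings"
)

// maxServiceNameLen is the DNS label limit that applies to Service names.
const maxServiceNameLen = 63

// selectorHash (hypothetical helper) returns a short hash of a label selector.
// Keys are sorted first so the result does not depend on map iteration order.
func selectorHash(selector map[string]string) string {
	keys := make([]string, 0, len(selector))
	for k := range selector {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	var b strings.Builder
	for _, k := range keys {
		fmt.Fprintf(&b, "%s=%s,", k, selector[k])
	}
	sum := sha256.Sum256([]byte(b.String()))
	return hex.EncodeToString(sum[:])[:10] // 10 hex chars: an arbitrary choice
}

// serviceNameFor (hypothetical helper) builds "<inferencepool>-<hash>" and
// truncates the pool name so the result stays within the length limit.
func serviceNameFor(poolName string, selector map[string]string) string {
	suffix := "-" + selectorHash(selector)
	if maxPrefix := maxServiceNameLen - len(suffix); len(poolName) > maxPrefix {
		poolName = poolName[:maxPrefix]
	}
	return poolName + suffix
}

func main() {
	name := serviceNameFor("my-pool", map[string]string{"app": "vllm"})
	fmt.Println(name)
}
```

Because the name is derived from the selector, a selector change would yield a new Service name, so an implementation along these lines would also need to clean up the previous Service. A truncated hash reduces, but does not eliminate, the chance of a collision.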

**Why is this needed**:
Watching `EndpointSlices` could help EPP scale better, especially on clusters with many Pods where only a small fraction is associated with the `InferencePool`.
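
To make the scaling argument more concrete, the following Go sketch shows how a consumer could reduce a set of EndpointSlices to the ready addresses of one Service. The `Endpoint` and `EndpointSlice` types here are simplified, hypothetical stand-ins for the `discovery.k8s.io/v1` types so that the example runs with the standard library alone, and `readyAddresses` is an illustrative name, not existing EPP code. The `kubernetes.io/service-name` label is the standard link between a slice and its Service.

```go
package main

import (
	"fmt"
	"sort"
)

// ServiceNameLabel links an EndpointSlice to the Service that owns it.
const ServiceNameLabel = "kubernetes.io/service-name"

// Endpoint is a simplified stand-in for discovery.k8s.io/v1 Endpoint.
type Endpoint struct {
	Addresses []string
	Ready     *bool // nil means unknown, which consumers usually treat as ready
}

// EndpointSlice is a simplified stand-in for discovery.k8s.io/v1 EndpointSlice.
type EndpointSlice struct {
	Labels    map[string]string
	Endpoints []Endpoint
}

// readyAddresses merges all slices that belong to serviceName and returns the
// sorted, de-duplicated addresses of endpoints that are not marked unready.
func readyAddresses(slices []EndpointSlice, serviceName string) []string {
	seen := map[string]struct{}{}
	for _, s := range slices {
		if s.Labels[ServiceNameLabel] != serviceName {
			continue // slice belongs to another Service
		}
		for _, ep := range s.Endpoints {
			if ep.Ready != nil && !*ep.Ready {
				continue
			}
			for _, addr := range ep.Addresses {
				seen[addr] = struct{}{}
			}
		}
	}
	out := make([]string, 0, len(seen))
	for addr := range seen {
		out = append(out, addr)
	}
	sort.Strings(out)
	return out
}

func main() {
	yes, no := true, false
	slices := []EndpointSlice{
		{Labels: map[string]string{ServiceNameLabel: "my-pool-abc"}, Endpoints: []Endpoint{
			{Addresses: []string{"10.0.0.2"}, Ready: &yes},
			{Addresses: []string{"10.0.0.3"}, Ready: &no},
		}},
		{Labels: map[string]string{ServiceNameLabel: "my-pool-abc"}, Endpoints: []Endpoint{
			{Addresses: []string{"10.0.0.1"}},
		}},
		{Labels: map[string]string{ServiceNameLabel: "other"}, Endpoints: []Endpoint{
			{Addresses: []string{"10.0.0.9"}, Ready: &yes},
		}},
	}
	fmt.Println(readyAddresses(slices, "my-pool-abc")) // prints [10.0.0.1 10.0.0.2]
}
```

In a real controller the label filter would more likely be applied at the watch or cache level, so that slices for unrelated Services never reach the EPP at all. That is where the expected reduction in load would come from.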
