[Feature] Leverage an (EPP managed) Kubernetes Service for InferencePool endpoint discovery? #1100

@elevran

What would you like to be added:
EPP currently watches all Pods via a controller-runtime reconciler and then filters them against the current InferencePool.Spec.Selector.
In large clusters, this could put high load on EPP, since it receives events for every Pod and not only for those that belong to the pool.
Based on discussions in #300 and #301 (and esp. these comments), it may be worthwhile to consider the use of an (EPP managed) Service that is kept in sync with the InferencePool selector, and to watch that Service's EndpointSlices instead.
This would delegate endpoint discovery to Kubernetes, whose endpoint tracking is potentially better optimized for this.
A possible downside is the exposure of the Service to users via the API. The name itself can be randomized (e.g., "<inferencepool>-hash(selector)") so collisions are less likely.
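To make the naming idea concrete, here is a minimal sketch of deriving a stable "<inferencepool>-hash(selector)" name. Everything in it is an assumption for illustration: the function name serviceNameFor is hypothetical, the selector is simplified to a plain map[string]string, and FNV-1a is just one possible hash; none of this is existing EPP code.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
	"strings"
)

// serviceNameFor is a hypothetical helper: it derives a deterministic
// Service name from the pool name and its selector labels.
func serviceNameFor(poolName string, selector map[string]string) string {
	// Sort the keys so that map iteration order does not affect the hash.
	keys := make([]string, 0, len(selector))
	for k := range selector {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	h := fnv.New32a()
	for _, k := range keys {
		// A zero byte separates keys and values to avoid ambiguous joins.
		h.Write([]byte(k))
		h.Write([]byte{0})
		h.Write([]byte(selector[k]))
		h.Write([]byte{0})
	}
	suffix := fmt.Sprintf("%08x", h.Sum32())

	// Service names must be valid DNS labels, so at most 63 characters.
	const maxLen = 63
	if len(poolName)+1+len(suffix) > maxLen {
		poolName = strings.TrimRight(poolName[:maxLen-1-len(suffix)], "-")
	}
	return poolName + "-" + suffix
}

func main() {
	// Example labels are made up for the illustration.
	selector := map[string]string{"app": "model-server", "tier": "gpu"}
	fmt.Println(serviceNameFor("my-pool", selector))
}
```

Because the name depends only on the pool name and the selector, EPP could recompute it on every reconcile rather than storing it. One consequence of this choice is that a selector change yields a new name, so the old Service would have to be cleaned up.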

Why is this needed:
Watching EndpointSlices instead of all Pods could lead to better scaling of EPP, especially on clusters with many Pods of which only a small fraction is associated with the InferencePool.

Labels: lifecycle/stale, needs-triage