Skip to content

Conversation

ahg-g
Copy link
Contributor

@ahg-g ahg-g commented Feb 21, 2025

No description provided.

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Feb 21, 2025
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ahg-g

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Feb 21, 2025
Copy link

netlify bot commented Feb 21, 2025

Deploy Preview for gateway-api-inference-extension ready!

Name Link
🔨 Latest commit fa4a77d
🔍 Latest deploy log https://app.netlify.com/sites/gateway-api-inference-extension/deploys/67b8ea6c7853ef00084fb04f
😎 Deploy Preview https://deploy-preview-386--gateway-api-inference-extension.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Feb 21, 2025
@ahg-g ahg-g changed the title [WIP] Polish the epp README.md file Polish the epp README.md file Feb 21, 2025
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 21, 2025
@ahg-g ahg-g changed the title Polish the epp README.md file Add README.md file to the epp pkg Feb 21, 2025
@@ -0,0 +1,25 @@
# The EndPoint Picker (EPP)
This package provides the reference implementation for the Endpoint Picker (EPP). It implements the [extension protocol](../../docs/proposals/003-endpoint-picker-protocol), enabling a proxy or gateway to request endpoint hints from an extension. As it is implemented now, an EPP instance handles a single `InferencePool` (and so for each `InferencePool`, one must create a dedicated EPP deployment).
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Consider dropping 'As it is implemented now'. My thinking is: we can just update this when/if we make an EPP multitenant

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make sense, done.


- Endpoint Selection
- The EPP determines the appropriate Pod endpoint for the load balancer (LB) to route requests.
- It selects from the pool of ready Pods designated by the assigned InferencePool.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- It selects from the pool of ready Pods designated by the assigned InferencePool.
- It selects from the pool of ready Pods designated by the assigned InferencePool's [Selector](https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/7e3cd457cdcd01339b65861c8e472cf27e6b6e80/api/v1alpha1/inferencepool_types.go#L53) field.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is also a nit

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

- It selects from the pool of ready Pods designated by the assigned InferencePool.
- Endpoint selection is contingent on the request's ModelName matching an `InferenceModel` that references the `InferencePool`.
- Requests with unmatched ModelName values trigger an error response to the proxy.
- The endpoint selection algorithm is detailed below.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: consider removing, the headers I think are prominent enough to draw attention.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

- Requests with unmatched ModelName values trigger an error response to the proxy.
- The endpoint selection algorithm is detailed below.
- Traffic Splitting and ModelName Rewriting
- The EPP facilitates controlled rollouts of new adapter versions by implementing traffic splitting between adapters within the same `InferencePool`, as defined by the `InferenceModel`.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More nits: Linking to the InfModel's targetmodel field could be useful

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@kfswain
Copy link
Collaborator

kfswain commented Feb 21, 2025

This all looks great! Left some comments, but these are true nits, feel free to disregard. This is already a major improvement from what we had before. Thanks!

/lgtm
/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 21, 2025
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 21, 2025
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 21, 2025
Copy link
Contributor Author

@ahg-g ahg-g left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the detailed review, addressed all comments

@@ -0,0 +1,25 @@
# The EndPoint Picker (EPP)
This package provides the reference implementation for the Endpoint Picker (EPP). It implements the [extension protocol](../../docs/proposals/003-endpoint-picker-protocol), enabling a proxy or gateway to request endpoint hints from an extension. As it is implemented now, an EPP instance handles a single `InferencePool` (and so for each `InferencePool`, one must create a dedicated EPP deployment).
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make sense, done.


- Endpoint Selection
- The EPP determines the appropriate Pod endpoint for the load balancer (LB) to route requests.
- It selects from the pool of ready Pods designated by the assigned InferencePool.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

- It selects from the pool of ready Pods designated by the assigned InferencePool.
- Endpoint selection is contingent on the request's ModelName matching an `InferenceModel` that references the `InferencePool`.
- Requests with unmatched ModelName values trigger an error response to the proxy.
- The endpoint selection algorithm is detailed below.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

- Requests with unmatched ModelName values trigger an error response to the proxy.
- The endpoint selection algorithm is detailed below.
- Traffic Splitting and ModelName Rewriting
- The EPP facilitates controlled rollouts of new adapter versions by implementing traffic splitting between adapters within the same `InferencePool`, as defined by the `InferenceModel`.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Feb 21, 2025
@kfswain
Copy link
Collaborator

kfswain commented Feb 21, 2025

/lgtm

Thanks!

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 21, 2025
@ahg-g
Copy link
Contributor Author

ahg-g commented Feb 21, 2025

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 21, 2025
@k8s-ci-robot k8s-ci-robot merged commit 9bd136a into kubernetes-sigs:main Feb 21, 2025
6 of 7 checks passed
kaushikmitr pushed a commit to kaushikmitr/llm-instance-gateway that referenced this pull request Feb 27, 2025
* Polish the epp README.md file

* Addressed comments
kfswain pushed a commit to kfswain/llm-instance-gateway that referenced this pull request Apr 29, 2025
* Polish the epp README.md file

* Addressed comments
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants