Our current Envoy integration relies on EnvoyExtensionPolicy and EnvoyPatchPolicy; this is very manual and not sustainable. (See: #18)
We're trying to settle on a single implementation that this project will extend to support LLMServerPool as a Gateway API backend. This will enable us to run e2e tests against these concepts and iterate more quickly. That implementation should be:
- An existing conformant implementation of Gateway API
- Part of CNCF
- Envoy-based for simplicity of extension mechanisms
- Open to contributions from us to support this new type of backend
We propose extending this existing gateway implementation to act as the controller for the LLMServerPool object (see: https://github.com/kubernetes-sigs/llm-instance-gateway/blob/main/docs/proposals/002-api-proposal/proposal.md#llmserverpool), as well as updating HTTPRoute to support an LLMServerPool as a backendRef.
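To make the backendRef change concrete, here is a rough sketch of how an HTTPRoute might reference an LLMServerPool. The API group, gateway name, and pool name are illustrative assumptions, not a settled API surface:

```yaml
# Hypothetical sketch only: the group/kind and names below are assumptions
# based on the API proposal, not a final design.
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: llm-route
spec:
  parentRefs:
    - name: my-gateway          # assumed Gateway name
  rules:
    - backendRefs:
        - group: inference.x-k8s.io   # assumed API group for LLMServerPool
          kind: LLMServerPool
          name: my-llm-server-pool    # assumed pool name
```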
At a high level, we expect this to look like:
- Upon creation of an LLMServerPool, the controller creates:
  - An ext-proc deployment/service.
  - An original_dst cluster.
- Upon creation of an HTTPRoute with an LLMServerPool as a backendRef, the controller creates a Listener that routes requests to the appropriate original_dst cluster (there may be multiple LLMServerPools) and configures ext_proc to operate on requests sent to this cluster (a rough Envoy config sketch follows below).
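To illustrate the Envoy side, here is a minimal sketch of the kind of configuration the controller could generate: an ext_proc HTTP filter pointing at the ext-proc service, and an ORIGINAL_DST cluster. It assumes ext-proc selects a backend pod and steers traffic via the original_dst HTTP header override; cluster names and that routing choice are assumptions, not the confirmed mechanism:

```yaml
# Hypothetical sketch of controller-generated Envoy config; names are illustrative.
http_filters:
  - name: envoy.filters.http.ext_proc
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.filters.http.ext_proc.v3.ExternalProcessor
      grpc_service:
        envoy_grpc:
          cluster_name: ext-proc-service   # assumed cluster for the ext-proc deployment/service
      processing_mode:
        request_header_mode: SEND          # ext-proc sees request headers to pick an endpoint

clusters:
  - name: llm-server-pool-original-dst     # one per LLMServerPool (assumed naming)
    type: ORIGINAL_DST
    lb_policy: CLUSTER_PROVIDED
    connect_timeout: 5s
    original_dst_lb_config:
      use_http_header: true                # honor the original-dst host header set by ext-proc (assumed approach)
```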