
Commit 2272848

SinaChavoshi authored and kfswain committed

chore(conformance): Add timeout configuration (#795)

* Add inferencepool_lifecycle test.
* Resolve setup issues and enable InferencePool test
* Removed todo comment in helper.go
* Add InferencePoolLifecycle test
* Update comments in helper.go
* Remove Conformance.go from log message
* Remove lifecycle test.
* Removed unused helper methods (inference pool must have selector & must be deleted)
* Set timeout values as constants
* Change timeout.go to timing.go
1 parent 8baf74c commit 2272848

File tree

6 files changed: +116 −64 lines changed


README.md

Lines changed: 25 additions & 22 deletions
@@ -2,15 +2,33 @@
 [![Go Reference](https://pkg.go.dev/badge/sigs.k8s.io/gateway-api-inference-extension.svg)](https://pkg.go.dev/sigs.k8s.io/gateway-api-inference-extension)
 [![License](https://img.shields.io/github/license/kubernetes-sigs/gateway-api-inference-extension)](/LICENSE)
 
-# Gateway API Inference Extension (GIE)
+# Gateway API Inference Extension
 
-This project offers tools for AI Inference, enabling developers to build [Inference Gateways].
+Gateway API Inference Extension optimizes self-hosting Generative Models on Kubernetes.
+This is achieved by leveraging Envoy's [External Processing] (ext-proc) to extend any gateway that supports both ext-proc and [Gateway API] into an **[inference gateway]**.
 
-[Inference Gateways]:#concepts-and-definitions
+
+[Inference Gateway]:#concepts-and-definitions
 
 ## Concepts and Definitions
 
-The following are some key industry terms that are important to understand for
+The following are terms specific to this project:
+
+- **Inference Gateway (IGW)**: A proxy/load-balancer which has been coupled with an
+  `Endpoint Picker`. It provides optimized routing and load balancing for
+  serving Kubernetes self-hosted generative Artificial Intelligence (AI)
+  workloads. It simplifies the deployment, management, and observability of AI
+  inference workloads.
+- **Inference Scheduler**: An extendable component that makes decisions about which endpoint is optimal (best cost /
+  best performance) for an inference request based on `Metrics and Capabilities`
+  from [Model Serving](/docs/proposals/003-model-server-protocol/README.md).
+- **Metrics and Capabilities**: Data provided by model serving platforms about
+  performance, availability and capabilities to optimize routing. Includes
+  things like [Prefix Cache] status or [LoRA Adapters] availability.
+- **Endpoint Picker (EPP)**: An implementation of an `Inference Scheduler` with additional Routing, Flow, and Request Control layers to allow for sophisticated routing strategies. Additional info on the architecture of the EPP [here](https://github.com/kubernetes-sigs/gateway-api-inference-extension/tree/main/docs/proposals/0683-epp-architecture-proposal).
+
+The following are key industry terms that are important to understand for
 this project:
 
 - **Model**: A generative AI model that has learned patterns from data and is
@@ -26,22 +44,6 @@ this project:
   (GPUs) that can be attached to Kubernetes nodes to speed up computations,
   particularly for training and inference tasks.
 
-And the following are more specific terms to this project:
-
-- **Scheduler**: Makes decisions about which endpoint is optimal (best cost /
-  best performance) for an inference request based on `Metrics and Capabilities`
-  from [Model Serving](/docs/proposals/003-model-server-protocol/README.md).
-- **Metrics and Capabilities**: Data provided by model serving platforms about
-  performance, availability and capabilities to optimize routing. Includes
-  things like [Prefix Cache] status or [LoRA Adapters] availability.
-- **Endpoint Selector**: A `Scheduler` combined with `Metrics and Capabilities`
-  systems is often referred to together as an [Endpoint Selection Extension]
-  (this is also sometimes referred to as an "endpoint picker", or "EPP").
-- **Inference Gateway**: A proxy/load-balancer which has been coupled with a
-  `Endpoint Selector`. It provides optimized routing and load balancing for
-  serving Kubernetes self-hosted generative Artificial Intelligence (AI)
-  workloads. It simplifies the deployment, management, and observability of AI
-  inference workloads.
 
 For deeper insights and more advanced concepts, refer to our [proposals](/docs/proposals).
 
@@ -50,12 +52,13 @@ For deeper insights and more advanced concepts, refer to our [proposals](/docs/p
 [Prefix Cache]:https://docs.vllm.ai/en/stable/design/v1/prefix_caching.html
 [LoRA Adapters]:https://docs.vllm.ai/en/stable/features/lora.html
 [Endpoint Selection Extension]:https://gateway-api-inference-extension.sigs.k8s.io/#endpoint-selection-extension
+[External Processing]:https://www.envoyproxy.io/docs/envoy/latest/configuration/http/http_filters/ext_proc_filter
 
 ## Technical Overview
 
-This extension upgrades an [ext-proc](https://www.envoyproxy.io/docs/envoy/latest/configuration/http/http_filters/ext_proc_filter)-capable proxy or gateway - such as Envoy Gateway, kGateway, or the GKE Gateway - to become an **inference gateway** - supporting inference platform teams self-hosting large language models on Kubernetes. This integration makes it easy to expose and control access to your local [OpenAI-compatible chat completion endpoints](https://platform.openai.com/docs/api-reference/chat) to other workloads on or off cluster, or to integrate your self-hosted models alongside model-as-a-service providers in a higher level **AI Gateway** like LiteLLM, Solo AI Gateway, or Apigee.
+This extension upgrades an [ext-proc](https://www.envoyproxy.io/docs/envoy/latest/configuration/http/http_filters/ext_proc_filter)-capable proxy or gateway - such as Envoy Gateway, kGateway, or the GKE Gateway - to become an **[inference gateway]**, supporting inference platform teams self-hosting Generative Models (with a current focus on large language models) on Kubernetes. This integration makes it easy to expose and control access to your local [OpenAI-compatible chat completion endpoints](https://platform.openai.com/docs/api-reference/chat) to other workloads on or off cluster, or to integrate your self-hosted models alongside model-as-a-service providers in a higher level **AI Gateway** like LiteLLM, Solo AI Gateway, or Apigee.
 
-The inference gateway:
+The Inference Gateway:
 
 * Improves the tail latency and throughput of LLM completion requests against Kubernetes-hosted model servers using an extensible request scheduling algorithm that is kv-cache and request cost aware, avoiding evictions or queueing as load increases
 * Provides [Kubernetes-native declarative APIs](https://gateway-api-inference-extension.sigs.k8s.io/concepts/api-overview/) to route client model names to use-case specific LoRA adapters and control incremental rollout of new adapter versions, A/B traffic splitting, and safe blue-green base model and model server upgrades

conformance/conformance.go

Lines changed: 7 additions & 7 deletions
@@ -25,7 +25,6 @@ import (
 	"io/fs"
 	"os"
 	"testing"
-	"time"
 
 	"github.com/stretchr/testify/require"
 	apiextensionsv1 "k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1"
@@ -64,6 +63,7 @@ import (
 
 	// Import the Inference Extension API types
 	inferencev1alpha2 "sigs.k8s.io/gateway-api-inference-extension/api/v1alpha2"
+	inferenceconfig "sigs.k8s.io/gateway-api-inference-extension/conformance/utils/config"
 )
 
 // Constants for the shared Gateway
@@ -245,16 +245,16 @@ func ensureGatewayAvailableAndReady(t *testing.T, k8sClient client.Client, opts
 	t.Logf("Attempting to fetch Gateway %s/%s.", gatewayNN.Namespace, gatewayNN.Name)
 	gw := &gatewayv1.Gateway{} // This gw instance will be populated by the poll function
 
-	// Define polling interval
-	// TODO: Make this configurable using a local TimeoutConfig (from ConformanceOptions perhaps)
-	pollingInterval := 5 * time.Second
-	// Use the GatewayMustHaveAddress timeout from the suite's TimeoutConfig for the Gateway object to appear
-	waitForGatewayCreationTimeout := opts.TimeoutConfig.GatewayMustHaveAddress
+	// Use the extension-specific polling interval defined in timing.go.
+	extTimeoutConf := inferenceconfig.DefaultInferenceExtensionTimeoutConfig()
+
+	// Use the GatewayMustHaveAddress timeout from the suite's base TimeoutConfig for the Gateway object to appear.
+	waitForGatewayCreationTimeout := extTimeoutConf.TimeoutConfig.GatewayMustHaveAddress
 
 	logDebugf(t, opts.Debug, "Waiting up to %v for Gateway object %s/%s to appear after manifest application...", waitForGatewayCreationTimeout, gatewayNN.Namespace, gatewayNN.Name)
 
 	ctx := context.TODO()
-	pollErr := wait.PollUntilContextTimeout(ctx, pollingInterval, waitForGatewayCreationTimeout, true, func(pollCtx context.Context) (bool, error) {
+	pollErr := wait.PollUntilContextTimeout(ctx, extTimeoutConf.GatewayObjectPollInterval, waitForGatewayCreationTimeout, true, func(pollCtx context.Context) (bool, error) {
 		fetchErr := k8sClient.Get(pollCtx, gatewayNN, gw)
 		if fetchErr == nil {
 			t.Logf("Successfully fetched Gateway %s/%s. Spec.GatewayClassName: %s",

conformance/tests/basic/inferencepool_accepted.go

Lines changed: 1 addition & 1 deletion
@@ -54,7 +54,7 @@ var InferencePoolAccepted = suite.ConformanceTest{
 			Status: metav1.ConditionTrue,
 			Reason: "", // "" means we don't strictly check the Reason for this basic test.
 		}
-		infrakubernetes.InferencePoolMustHaveCondition(t, s.Client, s.TimeoutConfig, poolNN, acceptedCondition)
+		infrakubernetes.InferencePoolMustHaveCondition(t, s.Client, poolNN, acceptedCondition)
 		})
 	},
 }

conformance/utils/config/timing.go

Lines changed: 49 additions & 0 deletions
@@ -0,0 +1,49 @@
+/*
+Copyright 2025 The Kubernetes Authors.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+*/
+
+package config
+
+import (
+	"time"
+
+	// Import the upstream Gateway API timeout config
+	gatewayconfig "sigs.k8s.io/gateway-api/conformance/utils/config"
+)
+
+// InferenceExtensionTimeoutConfig embeds the upstream TimeoutConfig and adds
+// extension-specific timeout values.
+type InferenceExtensionTimeoutConfig struct {
+	// All fields from gatewayconfig.TimeoutConfig will be available directly.
+	gatewayconfig.TimeoutConfig
+
+	// InferencePoolMustHaveConditionTimeout represents the maximum time to wait for an InferencePool to have a specific condition.
+	InferencePoolMustHaveConditionTimeout time.Duration
+
+	// InferencePoolMustHaveConditionInterval represents the polling interval for checking an InferencePool's condition.
+	InferencePoolMustHaveConditionInterval time.Duration
+
+	// GatewayObjectPollInterval is the polling interval used when waiting for a Gateway object to appear.
+	GatewayObjectPollInterval time.Duration
+}
+
+func DefaultInferenceExtensionTimeoutConfig() InferenceExtensionTimeoutConfig {
+	return InferenceExtensionTimeoutConfig{
+		TimeoutConfig:                          gatewayconfig.DefaultTimeoutConfig(),
+		InferencePoolMustHaveConditionTimeout:  300 * time.Second,
+		InferencePoolMustHaveConditionInterval: 10 * time.Second,
+		GatewayObjectPollInterval:              5 * time.Second,
+	}
+}
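Because the struct embeds the upstream gatewayconfig.TimeoutConfig, its fields are promoted onto InferenceExtensionTimeoutConfig and read the same way as the extension-specific ones. A short sketch of how a caller might consume this config; the 60-second override is purely illustrative, not something this commit does:

```go
package main

import (
	"fmt"
	"time"

	config "sigs.k8s.io/gateway-api-inference-extension/conformance/utils/config"
)

func main() {
	// Start from the suite defaults...
	cfg := config.DefaultInferenceExtensionTimeoutConfig()

	// ...and, purely for illustration, tighten one timeout for a fast local cluster.
	cfg.InferencePoolMustHaveConditionTimeout = 60 * time.Second

	// Embedded-struct field promotion: upstream and extension-specific
	// values are accessed identically.
	fmt.Println(cfg.GatewayMustHaveAddress)    // inherited from the upstream config
	fmt.Println(cfg.GatewayObjectPollInterval) // defined in timing.go
}
```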

conformance/utils/kubernetes/helpers.go

Lines changed: 32 additions & 30 deletions
@@ -23,7 +23,6 @@ import (
 	"fmt"
 	"reflect"
 	"testing"
-	"time"
 
 	"github.com/stretchr/testify/require"
 	apierrors "k8s.io/apimachinery/pkg/api/errors"
@@ -36,7 +35,7 @@ import (
 	inferenceapi "sigs.k8s.io/gateway-api-inference-extension/api/v1alpha2" // Adjust if your API version is different
 
 	// Import necessary utilities from the core Gateway API conformance suite
-	"sigs.k8s.io/gateway-api/conformance/utils/config"
+	"sigs.k8s.io/gateway-api-inference-extension/conformance/utils/config"
 )
 
 // checkCondition is a helper function similar to findConditionInList or CheckCondition
@@ -67,45 +66,48 @@ func checkCondition(t *testing.T, conditions []metav1.Condition, expectedConditi
 // InferencePoolMustHaveCondition waits for the specified InferencePool resource
 // to exist and report the expected status condition within one of its parent statuses.
 // It polls the InferencePool's status until the condition is met or the timeout occurs.
-func InferencePoolMustHaveCondition(t *testing.T, c client.Client, timeoutConfig config.TimeoutConfig, poolNN types.NamespacedName, expectedCondition metav1.Condition) {
+func InferencePoolMustHaveCondition(t *testing.T, c client.Client, poolNN types.NamespacedName, expectedCondition metav1.Condition) {
 	t.Helper() // Marks this function as a test helper
 
+	var timeoutConfig config.InferenceExtensionTimeoutConfig = config.DefaultInferenceExtensionTimeoutConfig()
 	var lastObservedPool *inferenceapi.InferencePool
 	var lastError error
 	var conditionFound bool
-	var interval time.Duration = 5 * time.Second // pull interval for status checks.
-
-	// TODO: Make retry interval configurable.
-	waitErr := wait.PollUntilContextTimeout(context.Background(), interval, timeoutConfig.DefaultTestTimeout, true, func(ctx context.Context) (bool, error) {
-		pool := &inferenceapi.InferencePool{} // This is the type instance used for Get
-		err := c.Get(ctx, poolNN, pool)
-		if err != nil {
-			if apierrors.IsNotFound(err) {
-				t.Logf("InferencePool %s not found yet. Retrying.", poolNN.String())
+
+	waitErr := wait.PollUntilContextTimeout(
+		context.Background(),
+		timeoutConfig.InferencePoolMustHaveConditionInterval,
+		timeoutConfig.InferencePoolMustHaveConditionTimeout,
+		true, func(ctx context.Context) (bool, error) {
+			pool := &inferenceapi.InferencePool{} // This is the type instance used for Get
+			err := c.Get(ctx, poolNN, pool)
+			if err != nil {
+				if apierrors.IsNotFound(err) {
+					t.Logf("InferencePool %s not found yet. Retrying.", poolNN.String())
+					lastError = err
+					return false, nil
+				}
+				t.Logf("Error fetching InferencePool %s (type: %s): %v. Retrying.", poolNN.String(), reflect.TypeOf(pool).String(), err)
 				lastError = err
 				return false, nil
 			}
-			t.Logf("Error fetching InferencePool %s (type: %s): %v. Retrying.", poolNN.String(), reflect.TypeOf(pool).String(), err)
-			lastError = err
-			return false, nil
-		}
-		lastObservedPool = pool
-		lastError = nil
-		conditionFound = false
+			lastObservedPool = pool
+			lastError = nil
+			conditionFound = false
 
-		if len(pool.Status.Parents) == 0 {
-			t.Logf("InferencePool %s has no parent statuses reported yet.", poolNN.String())
-			return false, nil
-		}
+			if len(pool.Status.Parents) == 0 {
+				t.Logf("InferencePool %s has no parent statuses reported yet.", poolNN.String())
+				return false, nil
+			}
 
-		for _, parentStatus := range pool.Status.Parents {
-			if checkCondition(t, parentStatus.Conditions, expectedCondition) {
-				conditionFound = true
-				return true, nil
+			for _, parentStatus := range pool.Status.Parents {
+				if checkCondition(t, parentStatus.Conditions, expectedCondition) {
+					conditionFound = true
+					return true, nil
+				}
 			}
-		}
-		return false, nil
-	})
+			return false, nil
+		})
 
 	if waitErr != nil || !conditionFound {
 		debugMsg := ""
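The checkCondition helper invoked inside the loop is outside this diff; as a rough sketch of its likely semantics (an assumption, built on apimachinery's meta.FindStatusCondition rather than the repository's actual code):

```go
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// matchesCondition is a hypothetical stand-in for the checkCondition helper:
// it reports whether the list contains a condition with the expected Type and
// Status, and, when a Reason is given, the expected Reason as well.
func matchesCondition(conditions []metav1.Condition, expected metav1.Condition) bool {
	got := meta.FindStatusCondition(conditions, expected.Type)
	if got == nil || got.Status != expected.Status {
		return false
	}
	return expected.Reason == "" || got.Reason == expected.Reason
}

func main() {
	conds := []metav1.Condition{
		{Type: "Accepted", Status: metav1.ConditionTrue, Reason: "Accepted"},
	}
	// Reason left empty, so only Type and Status are checked - mirroring the
	// "we don't strictly check the Reason" comment in inferencepool_accepted.go.
	fmt.Println(matchesCondition(conds, metav1.Condition{Type: "Accepted", Status: metav1.ConditionTrue}))
}
```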

site-src/index.md

Lines changed: 2 additions & 4 deletions
@@ -44,11 +44,9 @@ implementations](https://gateway-api.sigs.k8s.io/implementations/). As this
 pattern stabilizes, we expect a wide set of these implementations to support
 this project.
 
-### Endpoint Selection Extension
+### Endpoint Picker
 
-As part of this project, we're building an initial reference extension. Over
-time, we hope to see a wide variety of extensions emerge that follow this
-pattern and provide a wide range of choices.
+As part of this project, we've built the Endpoint Picker: a pluggable & extensible ext-proc deployment that implements [this architecture](https://github.com/kubernetes-sigs/gateway-api-inference-extension/tree/main/docs/proposals/0683-epp-architecture-proposal).
 
 ### Model Server Frameworks
 