
Commit 2272848

SinaChavoshi authored and kfswain committed

chore(conformance): Add timeout configuration (#795)

* Add inferencepool_lifecycle test.
* Resolve setup issues and enable InferencePool test
* Removed todo comment in helper.go
* Add InferencePoolLifecycle test
* Update comments in helper.go
* Remove Conformance.go from log message
* Remove lifecycle test.
* Removed unused helper methods (inference pool must have selector & must be deleted)
* Set timeout values as constants
* Change timeout.go to timing.go
1 parent 8baf74c commit 2272848

File tree

6 files changed: +116 −64 lines changed


README.md

Lines changed: 25 additions & 22 deletions
@@ -2,15 +2,33 @@
 [![Go Reference](https://pkg.go.dev/badge/sigs.k8s.io/gateway-api-inference-extension.svg)](https://pkg.go.dev/sigs.k8s.io/gateway-api-inference-extension)
 [![License](https://img.shields.io/github/license/kubernetes-sigs/gateway-api-inference-extension)](/LICENSE)
 
-# Gateway API Inference Extension (GIE)
+# Gateway API Inference Extension
 
-This project offers tools for AI Inference, enabling developers to build [Inference Gateways].
+Gateway API Inference Extension optimizes self-hosting Generative Models on Kubernetes.
+This is achieved by leveraging Envoy's [External Processing] (ext-proc) to extend any gateway that supports both ext-proc and [Gateway API] into an **[inference gateway]**.
 
-[Inference Gateways]:#concepts-and-definitions
+
+[Inference Gateway]:#concepts-and-definitions
 
 ## Concepts and Definitions
 
-The following are some key industry terms that are important to understand for
+The following are terms specific to this project:
+
+- **Inference Gateway (IGW)**: A proxy/load-balancer which has been coupled with an
+  `Endpoint Picker`. It provides optimized routing and load balancing for
+  serving Kubernetes self-hosted generative Artificial Intelligence (AI)
+  workloads. It simplifies the deployment, management, and observability of AI
+  inference workloads.
+- **Inference Scheduler**: An extendable component that makes decisions about which endpoint is optimal (best cost /
+  best performance) for an inference request based on `Metrics and Capabilities`
+  from [Model Serving](/docs/proposals/003-model-server-protocol/README.md).
+- **Metrics and Capabilities**: Data provided by model serving platforms about
+  performance, availability and capabilities to optimize routing. Includes
+  things like [Prefix Cache] status or [LoRA Adapters] availability.
+- **Endpoint Picker (EPP)**: An implementation of an `Inference Scheduler` with additional Routing, Flow, and Request Control layers to allow for sophisticated routing strategies. Additional info on the architecture of the EPP [here](https://github.com/kubernetes-sigs/gateway-api-inference-extension/tree/main/docs/proposals/0683-epp-architecture-proposal).
+
+The following are key industry terms that are important to understand for
 this project:
 
 - **Model**: A generative AI model that has learned patterns from data and is
@@ -26,22 +44,6 @@ this project:
   (GPUs) that can be attached to Kubernetes nodes to speed up computations,
   particularly for training and inference tasks.
 
-And the following are more specific terms to this project:
-
-- **Scheduler**: Makes decisions about which endpoint is optimal (best cost /
-  best performance) for an inference request based on `Metrics and Capabilities`
-  from [Model Serving](/docs/proposals/003-model-server-protocol/README.md).
-- **Metrics and Capabilities**: Data provided by model serving platforms about
-  performance, availability and capabilities to optimize routing. Includes
-  things like [Prefix Cache] status or [LoRA Adapters] availability.
-- **Endpoint Selector**: A `Scheduler` combined with `Metrics and Capabilities`
-  systems is often referred to together as an [Endpoint Selection Extension]
-  (this is also sometimes referred to as an "endpoint picker", or "EPP").
-- **Inference Gateway**: A proxy/load-balancer which has been coupled with a
-  `Endpoint Selector`. It provides optimized routing and load balancing for
-  serving Kubernetes self-hosted generative Artificial Intelligence (AI)
-  workloads. It simplifies the deployment, management, and observability of AI
-  inference workloads.
 
 For deeper insights and more advanced concepts, refer to our [proposals](/docs/proposals).
 
@@ -50,12 +52,13 @@ For deeper insights and more advanced concepts, refer to our [proposals](/docs/p
 [Prefix Cache]:https://docs.vllm.ai/en/stable/design/v1/prefix_caching.html
 [LoRA Adapters]:https://docs.vllm.ai/en/stable/features/lora.html
 [Endpoint Selection Extension]:https://gateway-api-inference-extension.sigs.k8s.io/#endpoint-selection-extension
+[External Processing]:https://www.envoyproxy.io/docs/envoy/latest/configuration/http/http_filters/ext_proc_filter
 
 ## Technical Overview
 
-This extension upgrades an [ext-proc](https://www.envoyproxy.io/docs/envoy/latest/configuration/http/http_filters/ext_proc_filter)-capable proxy or gateway - such as Envoy Gateway, kGateway, or the GKE Gateway - to become an **inference gateway** - supporting inference platform teams self-hosting large language models on Kubernetes. This integration makes it easy to expose and control access to your local [OpenAI-compatible chat completion endpoints](https://platform.openai.com/docs/api-reference/chat) to other workloads on or off cluster, or to integrate your self-hosted models alongside model-as-a-service providers in a higher level **AI Gateway** like LiteLLM, Solo AI Gateway, or Apigee.
+This extension upgrades an [ext-proc](https://www.envoyproxy.io/docs/envoy/latest/configuration/http/http_filters/ext_proc_filter)-capable proxy or gateway - such as Envoy Gateway, kGateway, or the GKE Gateway - to become an **[inference gateway]**, supporting inference platform teams self-hosting Generative Models (with a current focus on large language models) on Kubernetes. This integration makes it easy to expose and control access to your local [OpenAI-compatible chat completion endpoints](https://platform.openai.com/docs/api-reference/chat) to other workloads on or off cluster, or to integrate your self-hosted models alongside model-as-a-service providers in a higher level **AI Gateway** like LiteLLM, Solo AI Gateway, or Apigee.
 
-The inference gateway:
+The Inference Gateway:
 
 * Improves the tail latency and throughput of LLM completion requests against Kubernetes-hosted model servers using an extensible request scheduling algorithm that is kv-cache and request cost aware, avoiding evictions or queueing as load increases
 * Provides [Kubernetes-native declarative APIs](https://gateway-api-inference-extension.sigs.k8s.io/concepts/api-overview/) to route client model names to use-case specific LoRA adapters and control incremental rollout of new adapter versions, A/B traffic splitting, and safe blue-green base model and model server upgrades

conformance/conformance.go

Lines changed: 7 additions & 7 deletions
@@ -25,7 +25,6 @@ import (
 	"io/fs"
 	"os"
 	"testing"
-	"time"
 
 	"github.com/stretchr/testify/require"
 	apiextensionsv1 "k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1"
@@ -64,6 +63,7 @@ import (
 
 	// Import the Inference Extension API types
 	inferencev1alpha2 "sigs.k8s.io/gateway-api-inference-extension/api/v1alpha2"
+	inferenceconfig "sigs.k8s.io/gateway-api-inference-extension/conformance/utils/config"
 )
 
 // Constants for the shared Gateway
@@ -245,16 +245,16 @@ func ensureGatewayAvailableAndReady(t *testing.T, k8sClient client.Client, opts
 	t.Logf("Attempting to fetch Gateway %s/%s.", gatewayNN.Namespace, gatewayNN.Name)
 	gw := &gatewayv1.Gateway{} // This gw instance will be populated by the poll function
 
-	// Define polling interval
-	// TODO: Make this configurable using a local TimeoutConfig (from ConformanceOptions perhaps)
-	pollingInterval := 5 * time.Second
-	// Use the GatewayMustHaveAddress timeout from the suite's TimeoutConfig for the Gateway object to appear
-	waitForGatewayCreationTimeout := opts.TimeoutConfig.GatewayMustHaveAddress
+	// Use the extension-specific polling interval defined in timing.go.
+	extTimeoutConf := inferenceconfig.DefaultInferenceExtensionTimeoutConfig()
+
+	// Use the GatewayMustHaveAddress timeout from the suite's base TimeoutConfig for the Gateway object to appear.
+	waitForGatewayCreationTimeout := extTimeoutConf.TimeoutConfig.GatewayMustHaveAddress
 
 	logDebugf(t, opts.Debug, "Waiting up to %v for Gateway object %s/%s to appear after manifest application...", waitForGatewayCreationTimeout, gatewayNN.Namespace, gatewayNN.Name)
 
 	ctx := context.TODO()
-	pollErr := wait.PollUntilContextTimeout(ctx, pollingInterval, waitForGatewayCreationTimeout, true, func(pollCtx context.Context) (bool, error) {
+	pollErr := wait.PollUntilContextTimeout(ctx, extTimeoutConf.GatewayObjectPollInterval, waitForGatewayCreationTimeout, true, func(pollCtx context.Context) (bool, error) {
 		fetchErr := k8sClient.Get(pollCtx, gatewayNN, gw)
 		if fetchErr == nil {
 			t.Logf("Successfully fetched Gateway %s/%s. Spec.GatewayClassName: %s",

conformance/tests/basic/inferencepool_accepted.go

Lines changed: 1 addition & 1 deletion
@@ -54,7 +54,7 @@ var InferencePoolAccepted = suite.ConformanceTest{
 			Status: metav1.ConditionTrue,
 			Reason: "", // "" means we don't strictly check the Reason for this basic test.
 		}
-		infrakubernetes.InferencePoolMustHaveCondition(t, s.Client, s.TimeoutConfig, poolNN, acceptedCondition)
+		infrakubernetes.InferencePoolMustHaveCondition(t, s.Client, poolNN, acceptedCondition)
 		})
 	},
 }

conformance/utils/config/timing.go

Lines changed: 49 additions & 0 deletions
@@ -0,0 +1,49 @@
+/*
+Copyright 2025 The Kubernetes Authors.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+*/
+
+package config
+
+import (
+	"time"
+
+	// Import the upstream Gateway API timeout config
+	gatewayconfig "sigs.k8s.io/gateway-api/conformance/utils/config"
+)
+
+// InferenceExtensionTimeoutConfig embeds the upstream TimeoutConfig and adds
+// extension-specific timeout values.
+type InferenceExtensionTimeoutConfig struct {
+	// All fields from gatewayconfig.TimeoutConfig will be available directly.
+	gatewayconfig.TimeoutConfig
+
+	// InferencePoolMustHaveConditionTimeout represents the maximum time to wait for an InferencePool to have a specific condition.
+	InferencePoolMustHaveConditionTimeout time.Duration
+
+	// InferencePoolMustHaveConditionInterval represents the polling interval for checking an InferencePool's condition.
+	InferencePoolMustHaveConditionInterval time.Duration
+
+	// GatewayObjectPollInterval is the polling interval used when waiting for a Gateway object to appear.
+	GatewayObjectPollInterval time.Duration
+}
+
+func DefaultInferenceExtensionTimeoutConfig() InferenceExtensionTimeoutConfig {
+	return InferenceExtensionTimeoutConfig{
+		TimeoutConfig:                          gatewayconfig.DefaultTimeoutConfig(),
+		InferencePoolMustHaveConditionTimeout:  300 * time.Second,
+		InferencePoolMustHaveConditionInterval: 10 * time.Second,
+		GatewayObjectPollInterval:              5 * time.Second,
+	}
+}
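Because the struct embeds the upstream gatewayconfig.TimeoutConfig, its fields are promoted onto InferenceExtensionTimeoutConfig and read the same way as the extension-specific ones. A short sketch of how a caller might consume this config; the 60-second override is purely illustrative, not something this commit does:

```go
package main

import (
	"fmt"
	"time"

	config "sigs.k8s.io/gateway-api-inference-extension/conformance/utils/config"
)

func main() {
	// Start from the suite defaults...
	cfg := config.DefaultInferenceExtensionTimeoutConfig()

	// ...and, purely for illustration, tighten one timeout for a fast local cluster.
	cfg.InferencePoolMustHaveConditionTimeout = 60 * time.Second

	// Embedded-struct field promotion: upstream and extension-specific
	// values are accessed identically.
	fmt.Println(cfg.GatewayMustHaveAddress)    // inherited from the upstream config
	fmt.Println(cfg.GatewayObjectPollInterval) // defined in timing.go
}
```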

conformance/utils/kubernetes/helpers.go

Lines changed: 32 additions & 30 deletions
@@ -23,7 +23,6 @@ import (
 	"fmt"
 	"reflect"
 	"testing"
-	"time"
 
 	"github.com/stretchr/testify/require"
 	apierrors "k8s.io/apimachinery/pkg/api/errors"
@@ -36,7 +35,7 @@ import (
 	inferenceapi "sigs.k8s.io/gateway-api-inference-extension/api/v1alpha2" // Adjust if your API version is different
 
 	// Import necessary utilities from the core Gateway API conformance suite
-	"sigs.k8s.io/gateway-api/conformance/utils/config"
+	"sigs.k8s.io/gateway-api-inference-extension/conformance/utils/config"
 )
 
 // checkCondition is a helper function similar to findConditionInList or CheckCondition
@@ -67,45 +66,48 @@ func checkCondition(t *testing.T, conditions []metav1.Condition, expectedConditi
 // InferencePoolMustHaveCondition waits for the specified InferencePool resource
 // to exist and report the expected status condition within one of its parent statuses.
 // It polls the InferencePool's status until the condition is met or the timeout occurs.
-func InferencePoolMustHaveCondition(t *testing.T, c client.Client, timeoutConfig config.TimeoutConfig, poolNN types.NamespacedName, expectedCondition metav1.Condition) {
+func InferencePoolMustHaveCondition(t *testing.T, c client.Client, poolNN types.NamespacedName, expectedCondition metav1.Condition) {
 	t.Helper() // Marks this function as a test helper
 
+	var timeoutConfig config.InferenceExtensionTimeoutConfig = config.DefaultInferenceExtensionTimeoutConfig()
 	var lastObservedPool *inferenceapi.InferencePool
 	var lastError error
 	var conditionFound bool
-	var interval time.Duration = 5 * time.Second // pull interval for status checks.
-
-	// TODO: Make retry interval configurable.
-	waitErr := wait.PollUntilContextTimeout(context.Background(), interval, timeoutConfig.DefaultTestTimeout, true, func(ctx context.Context) (bool, error) {
-		pool := &inferenceapi.InferencePool{} // This is the type instance used for Get
-		err := c.Get(ctx, poolNN, pool)
-		if err != nil {
-			if apierrors.IsNotFound(err) {
-				t.Logf("InferencePool %s not found yet. Retrying.", poolNN.String())
+
+	waitErr := wait.PollUntilContextTimeout(
+		context.Background(),
+		timeoutConfig.InferencePoolMustHaveConditionInterval,
+		timeoutConfig.InferencePoolMustHaveConditionTimeout,
+		true, func(ctx context.Context) (bool, error) {
+			pool := &inferenceapi.InferencePool{} // This is the type instance used for Get
+			err := c.Get(ctx, poolNN, pool)
+			if err != nil {
+				if apierrors.IsNotFound(err) {
+					t.Logf("InferencePool %s not found yet. Retrying.", poolNN.String())
+					lastError = err
+					return false, nil
+				}
+				t.Logf("Error fetching InferencePool %s (type: %s): %v. Retrying.", poolNN.String(), reflect.TypeOf(pool).String(), err)
 				lastError = err
 				return false, nil
 			}
-			t.Logf("Error fetching InferencePool %s (type: %s): %v. Retrying.", poolNN.String(), reflect.TypeOf(pool).String(), err)
-			lastError = err
-			return false, nil
-		}
-		lastObservedPool = pool
-		lastError = nil
-		conditionFound = false
+			lastObservedPool = pool
+			lastError = nil
+			conditionFound = false
 
-		if len(pool.Status.Parents) == 0 {
-			t.Logf("InferencePool %s has no parent statuses reported yet.", poolNN.String())
-			return false, nil
-		}
+			if len(pool.Status.Parents) == 0 {
+				t.Logf("InferencePool %s has no parent statuses reported yet.", poolNN.String())
+				return false, nil
+			}
 
-		for _, parentStatus := range pool.Status.Parents {
-			if checkCondition(t, parentStatus.Conditions, expectedCondition) {
-				conditionFound = true
-				return true, nil
+			for _, parentStatus := range pool.Status.Parents {
+				if checkCondition(t, parentStatus.Conditions, expectedCondition) {
+					conditionFound = true
+					return true, nil
+				}
 			}
-		}
-		return false, nil
-	})
+			return false, nil
+		})
 
 	if waitErr != nil || !conditionFound {
 		debugMsg := ""
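The checkCondition helper invoked inside the loop is outside this diff; as a rough sketch of its likely semantics (an assumption, built on apimachinery's meta.FindStatusCondition rather than the repository's actual code):

```go
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// matchesCondition is a hypothetical stand-in for the checkCondition helper:
// it reports whether the list contains a condition with the expected Type and
// Status, and, when a Reason is given, the expected Reason as well.
func matchesCondition(conditions []metav1.Condition, expected metav1.Condition) bool {
	got := meta.FindStatusCondition(conditions, expected.Type)
	if got == nil || got.Status != expected.Status {
		return false
	}
	return expected.Reason == "" || got.Reason == expected.Reason
}

func main() {
	conds := []metav1.Condition{
		{Type: "Accepted", Status: metav1.ConditionTrue, Reason: "Accepted"},
	}
	// Reason left empty, so only Type and Status are checked - mirroring the
	// "we don't strictly check the Reason" comment in inferencepool_accepted.go.
	fmt.Println(matchesCondition(conds, metav1.Condition{Type: "Accepted", Status: metav1.ConditionTrue}))
}
```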

site-src/index.md

Lines changed: 2 additions & 4 deletions
@@ -44,11 +44,9 @@ implementations](https://gateway-api.sigs.k8s.io/implementations/). As this
 pattern stabilizes, we expect a wide set of these implementations to support
 this project.
 
-### Endpoint Selection Extension
+### Endpoint Picker
 
-As part of this project, we're building an initial reference extension. Over
-time, we hope to see a wide variety of extensions emerge that follow this
-pattern and provide a wide range of choices.
+As part of this project, we've built the Endpoint Picker: a pluggable & extensible ext-proc deployment that implements [this architecture](https://github.com/kubernetes-sigs/gateway-api-inference-extension/tree/main/docs/proposals/0683-epp-architecture-proposal).
 
 ### Model Server Frameworks
 