
Commit d7865df

Add benchmarking folder with common config set ups
1 parent 831a919 commit d7865df


5 files changed

+873
-18
lines changed


benchmarking/README.md

Lines changed: 88 additions & 0 deletions
## Prerequisites

Before you begin, ensure you have the following:

* **Helm 3+**: [Installation Guide](https://helm.sh/docs/intro/install/)
* **Kubernetes Cluster**: Access to a running Kubernetes cluster.
* **Gateway Deployed**: Your inference server/gateway must be deployed and accessible within the cluster. See the [Getting Started Guide](https://gateway-api-inference-extension.sigs.k8s.io/guides/#getting-started-with-gateway-api-inference-extension).
* **Hugging Face Token**: A Hugging Face token, used to pull tokenizers.
## Deployment

To deploy the benchmarking chart:

```bash
export IP='<YOUR_IP>'
export PORT='<YOUR_PORT>'
export HF_TOKEN='<YOUR_HUGGING_FACE_TOKEN>'
export CHART_VERSION=v0.2.0

helm install benchmark -f benchmark-values.yaml \
  --set hfToken=${HF_TOKEN} \
  --set "config.server.base_url=http://${IP}:${PORT}" \
  oci://quay.io/inference-perf/charts/inference-perf:${CHART_VERSION}
```
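Before installing, it can help to confirm the gateway actually answers at `http://$IP:$PORT`. A minimal pre-flight sketch in Python, assuming a vLLM-style OpenAI-compatible server that responds at `/v1/models` (that endpoint path is an assumption about the target server, not something this chart defines):

```python
import os
import urllib.request


def server_reachable(base_url: str, timeout: float = 5.0) -> bool:
    """Return True if the inference server answers at its base URL.

    Illustrative only: the /v1/models path assumes a vLLM-style
    OpenAI-compatible API and is not defined by this chart.
    """
    try:
        with urllib.request.urlopen(f"{base_url}/v1/models", timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Reuses the same IP/PORT environment variables as the helm install step.
    base_url = f"http://{os.environ.get('IP', '127.0.0.1')}:{os.environ.get('PORT', '8000')}"
    print(base_url, "reachable:", server_reachable(base_url))
```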
**Parameters to customize:**

* `benchmark`: A unique release name for this deployment.
* `hfToken`: Your Hugging Face token.
* `config.server.base_url`: The base URL (IP and port) of your inference server.

For additional parameters, refer to the inference-perf [configuration guide](https://github.com/kubernetes-sigs/inference-perf/blob/main/docs/config.md).
### Storage Parameters

#### 1. Local Storage (Default)

By default, reports are saved locally but are **lost when the Pod terminates**.

```yaml
storage:
  local_storage:
    path: "reports-{timestamp}"   # Local directory path
    report_file_prefix: null      # Optional filename prefix
```
#### 2. Google Cloud Storage (GCS)

Use the `google_cloud_storage` block to save reports to a GCS bucket.

```yaml
storage:
  google_cloud_storage:             # Optional GCS configuration
    bucket_name: "your-bucket-name" # Required GCS bucket
    path: "reports-{timestamp}"     # Optional path prefix
    report_file_prefix: null        # Optional filename prefix
```
###### 🚨 GCS Permissions Checklist (Required for Write Access)

1. **IAM Role (Service Account):** Bound to the target bucket.
   * **Minimum:** **Storage Object Creator** (`roles/storage.objectCreator`)
   * **Full:** **Storage Object Admin** (`roles/storage.objectAdmin`)
2. **Node Access Scope (GKE Node Pool):** Set during node pool creation.
   * **Required Scope:** `devstorage.read_write` or `cloud-platform`
#### 3. Simple Storage Service (S3)

Use the `simple_storage_service` block for S3-compatible storage. This requires appropriate AWS credentials to be configured in the runtime environment.

```yaml
storage:
  simple_storage_service:
    bucket_name: "your-bucket-name" # Required S3 bucket
    path: "reports-{timestamp}"     # Optional path prefix
    report_file_prefix: null        # Optional filename prefix
```
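All three backends share the same `path` and `report_file_prefix` fields, with `{timestamp}` expanding to a per-run value so repeated runs do not overwrite each other. A rough Python illustration of how such a template expands (the actual substitution, including the exact timestamp format, is performed by inference-perf; the format below is an assumption for illustration):

```python
from datetime import datetime, timezone


def render_report_path(template: str) -> str:
    """Expand a {timestamp} placeholder into a per-run path.

    Illustrative only: inference-perf performs the real substitution,
    and its timestamp format may differ from the one shown here.
    """
    ts = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%S")
    return template.replace("{timestamp}", ts)


print(render_report_path("reports-{timestamp}"))
```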
## Uninstalling the Chart

To uninstall the deployed chart, pass the release name used at install time (`benchmark` in the example above):

```bash
helm uninstall benchmark
```

benchmarking/benchmark-values.yaml

Lines changed: 62 additions & 0 deletions
```yaml
job:
  image:
    repository: quay.io/inference-perf/inference-perf
    tag: ""  # Defaults to .Chart.AppVersion
  nodeSelector: {}
  # Example resources:
  # resources:
  #   requests:
  #     cpu: "1"
  #     memory: "4Gi"
  #   limits:
  #     cpu: "2"
  #     memory: "8Gi"
  resources: {}

logLevel: INFO

# A GCS bucket path that points to the dataset file.
# The file will be copied from this path to the local file system
# at /dataset/dataset.json for use during the run.
# NOTE: For this dataset to be used, config.data.path must also be
# explicitly set to /dataset/dataset.json.
gcsPath: ""

# hfToken optionally creates a secret with the specified token.
# Can be set using helm install --set hfToken=<token>
hfToken: ""

config:
  load:
    type: constant
    interval: 15
    stages:
      - rate: 10
        duration: 20
      - rate: 20
        duration: 20
      - rate: 30
        duration: 20
  api:
    type: completion
    streaming: true
  server:
    type: vllm
    model_name: meta-llama/Llama-3.1-8B-Instruct
    base_url: http://0.0.0.0:8000
    ignore_eos: true
  tokenizer:
    pretrained_model_name_or_path: meta-llama/Llama-3.1-8B-Instruct
  data:
    type: shareGPT
  metrics:
    type: prometheus
    prometheus:
      google_managed: true
  report:
    request_lifecycle:
      summary: true
      per_stage: true
      per_request: true
    prometheus:
      summary: true
      per_stage: true
```
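The `stages` list drives the load ramp: with a constant-rate generator, each stage contributes roughly `rate * duration` requests. A quick sanity check of what the default values ask for (this sketch ignores the `interval` setting between stages):

```python
# Default stages from benchmark-values.yaml
stages = [
    {"rate": 10, "duration": 20},
    {"rate": 20, "duration": 20},
    {"rate": 30, "duration": 20},
]

# Seconds spent generating load (inter-stage `interval` not counted here)
total_duration_s = sum(s["duration"] for s in stages)
# Approximate request count for a constant-rate load generator
total_requests = sum(s["rate"] * s["duration"] for s in stages)

print(total_duration_s, total_requests)  # 60 1200
```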
