
Commit eeda363

Add benchmarking folder with common config setups
1 parent 831a919

5 files changed: +809 −15 lines

benchmarking/README.md

Lines changed: 91 additions & 0 deletions
# Benchmarking Helm Chart

This Helm chart deploys the `inference-perf` benchmarking tool. This guide walks you through deploying a basic benchmarking job. By default, the benchmark uses the `shareGPT` dataset.

## Prerequisites

Before you begin, ensure you have the following:

* **Helm 3+**: [Installation Guide](https://helm.sh/docs/intro/install/)
* **Kubernetes Cluster**: Access to a Kubernetes cluster.
* **Gateway Deployed**: Your inference server/gateway must be deployed and accessible within the cluster (see the lookup sketch below).
* **Hugging Face Token Secret**: A Hugging Face token, used to pull tokenizers; the chart creates the Secret from the `hfToken` value.
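If you are unsure of the gateway address, one way to look it up, assuming a Kubernetes Gateway API `Gateway` resource (the name `inference-gateway` is a placeholder; adjust to your deployment):

```bash
# Placeholder Gateway name; list Gateways with `kubectl get gateways -A`.
export IP=$(kubectl get gateway inference-gateway \
  -o jsonpath='{.status.addresses[0].value}')
```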
## Deployment

To deploy the benchmarking chart:

```bash
export IP='<YOUR_IP>'
export PORT='<YOUR_PORT>'
export HF_TOKEN='<YOUR_HUGGING_FACE_TOKEN>'
helm install benchmark -f benchmark-values.yaml \
  --set hfToken=${HF_TOKEN} \
  --set "config.server.base_url=http://${IP}:${PORT}" \
  oci://quay.io/inference-perf/charts/inference-perf:latest
```
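Once installed, the chart runs the benchmark as a Kubernetes Job. A hedged way to follow progress (the exact Job name depends on the chart's naming; confirm it first):

```bash
# List Jobs to find the generated name, then stream its logs.
kubectl get jobs
kubectl logs -f job/<YOUR_JOB_NAME>
```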

**Parameters to customize:**

* `benchmark`: A unique release name for this deployment.
* `hfToken`: Your Hugging Face token.
* `config.server.base_url`: The base URL (IP and port) of your inference server.

For further parameter customizations, refer to the inference-perf [configuration guide](https://github.com/kubernetes-sigs/inference-perf/blob/main/docs/config.md).
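Any key in `benchmark-values.yaml` can be overridden the same way. For example, a sketch of raising the first load stage's request rate at install time (the values shown are illustrative):

```bash
# Illustrative override of the first load stage defined in benchmark-values.yaml.
helm upgrade --install benchmark -f benchmark-values.yaml \
  --set hfToken=${HF_TOKEN} \
  --set "config.server.base_url=http://${IP}:${PORT}" \
  --set "config.load.stages[0].rate=50" \
  --set "config.load.stages[0].duration=60" \
  oci://quay.io/inference-perf/charts/inference-perf:latest
```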
### Storage Parameters

#### 1. Local Storage (Default)

By default, reports are saved locally but are **lost when the Pod terminates**.

```yaml
storage:
  local_storage:
    path: "reports-{timestamp}"   # Local directory path
    report_file_prefix: null      # Optional filename prefix
```
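If you rely on local storage, one hedged way to rescue a report is to copy it out while the Job's Pod is still running (the Pod name and report directory below are placeholders):

```bash
# Placeholder Pod and directory; find them with `kubectl get pods`.
kubectl cp <YOUR_POD_NAME>:<REPORT_DIR> ./reports
```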
#### 2. Google Cloud Storage (GCS)

Use the `google_cloud_storage` block to save reports to a GCS bucket.

```yaml
storage:
  google_cloud_storage:               # Optional GCS configuration
    bucket_name: "your-bucket-name"   # Required GCS bucket
    path: "reports-{timestamp}"       # Optional path prefix
    report_file_prefix: null          # Optional filename prefix
```

##### 🚨 GCS Permissions Checklist (Required for Write Access)

1. **IAM Role (Service Account):** Bound to the target bucket.
   * **Minimum:** **Storage Object Creator** (`roles/storage.objectCreator`)
   * **Full:** **Storage Object Admin** (`roles/storage.objectAdmin`)
2. **Node Access Scope (GKE Node Pool):** Set during node pool creation.
   * **Required Scope:** `devstorage.read_write` or `cloud-platform`
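As a sketch, binding the minimum role to the node service account might look like this (the bucket and service-account names are placeholders):

```bash
# Placeholder bucket and service account; substitute your own values.
gcloud storage buckets add-iam-policy-binding gs://your-bucket-name \
  --member="serviceAccount:your-node-sa@your-project.iam.gserviceaccount.com" \
  --role="roles/storage.objectCreator"
```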
#### 3. Simple Storage Service (S3)

Use the `simple_storage_service` block for S3-compatible storage. Requires appropriate AWS credentials configured in the runtime environment.

```yaml
storage:
  simple_storage_service:
    bucket_name: "your-bucket-name"   # Required S3 bucket
    path: "reports-{timestamp}"       # Optional path prefix
    report_file_prefix: null          # Optional filename prefix
```
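One hedged way to sanity-check those credentials before a long run (the bucket name is a placeholder):

```bash
# Placeholder bucket; confirms the configured AWS credentials can write to it.
echo ok > smoke-test.txt
aws s3 cp smoke-test.txt s3://your-bucket-name/smoke-test.txt
```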
## Uninstalling the Chart

To uninstall the deployed chart (using the release name from the install step):

```bash
helm uninstall benchmark
```

benchmarking/benchmark-values.yaml

Lines changed: 62 additions & 0 deletions
job:
  image:
    repository: quay.io/inference-perf/inference-perf
    tag: ""  # Defaults to .Chart.AppVersion
  nodeSelector: {}
  # Example resources:
  # resources:
  #   requests:
  #     cpu: "1"
  #     memory: "4Gi"
  #   limits:
  #     cpu: "2"
  #     memory: "8Gi"
  resources: {}

logLevel: INFO

# A GCS bucket path that points to the dataset file.
# The file will be copied from this path to the local file system
# at /dataset/dataset.json for use during the run.
# NOTE: For this dataset to be used, config.data.path must also be
# explicitly set to /dataset/dataset.json.
gcsPath: ""

# hfToken optionally creates a Secret with the specified token.
# Can be set using helm install --set hfToken=<token>
hfToken: ""

config:
  load:
    type: constant
    interval: 15
    stages:
      - rate: 10
        duration: 20
      - rate: 20
        duration: 20
      - rate: 30
        duration: 20
  api:
    type: completion
    streaming: true
  server:
    type: vllm
    model_name: meta-llama/Llama-3.1-8B-Instruct
    base_url: http://0.0.0.0:8000
    ignore_eos: true
  tokenizer:
    pretrained_model_name_or_path: meta-llama/Llama-3.1-8B-Instruct
  data:
    type: shareGPT
  metrics:
    type: prometheus
    prometheus:
      google_managed: true
  report:
    request_lifecycle:
      summary: true
      per_stage: true
      per_request: true
    prometheus:
      summary: true
      per_stage: true
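Tying `gcsPath` and `config.data.path` together, a hedged install sketch for running against a dataset staged in GCS (the bucket and object paths are placeholders, and you may also need to adjust `config.data.type` for your dataset's format):

```bash
# Placeholder GCS object; the chart copies it to /dataset/dataset.json in the Pod.
helm install benchmark -f benchmark-values.yaml \
  --set hfToken=${HF_TOKEN} \
  --set "config.server.base_url=http://${IP}:${PORT}" \
  --set "gcsPath=gs://your-bucket/datasets/dataset.json" \
  --set "config.data.path=/dataset/dataset.json" \
  oci://quay.io/inference-perf/charts/inference-perf:latest
```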
