
Commit d7865df

Add benchmarking folder with common config set ups
1 parent 831a919 commit d7865df


5 files changed

+873
-18
lines changed


benchmarking/README.md

Lines changed: 88 additions & 0 deletions
## Prerequisites

Before you begin, ensure you have the following:

* **Helm 3+**: [Installation Guide](https://helm.sh/docs/intro/install/)
* **Kubernetes Cluster**: Access to a running Kubernetes cluster.
* **Gateway Deployed**: Your inference server/gateway must be deployed and accessible within the cluster. See the [Getting Started Guide](https://gateway-api-inference-extension.sigs.k8s.io/guides/#getting-started-with-gateway-api-inference-extension).
* **Hugging Face Token**: A Hugging Face token, used to pull tokenizers.
## Deployment

To deploy the benchmarking chart:

```bash
export IP='<YOUR_IP>'
export PORT='<YOUR_PORT>'
export HF_TOKEN='<YOUR_HUGGING_FACE_TOKEN>'
export CHART_VERSION=v0.2.0

helm install benchmark -f benchmark-values.yaml \
  --set hfToken=${HF_TOKEN} \
  --set "config.server.base_url=http://${IP}:${PORT}" \
  oci://quay.io/inference-perf/charts/inference-perf:${CHART_VERSION}
```
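Before installing, it can help to confirm the gateway actually answers at `http://$IP:$PORT`. A minimal pre-flight sketch in Python, assuming a vLLM-style OpenAI-compatible server that responds at `/v1/models` (that endpoint path is an assumption about the target server, not something this chart defines):

```python
import os
import urllib.request


def server_reachable(base_url: str, timeout: float = 5.0) -> bool:
    """Return True if the inference server answers at its base URL.

    Illustrative only: the /v1/models path assumes a vLLM-style
    OpenAI-compatible API and is not defined by this chart.
    """
    try:
        with urllib.request.urlopen(f"{base_url}/v1/models", timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Reuses the same IP/PORT environment variables as the helm install step.
    base_url = f"http://{os.environ.get('IP', '127.0.0.1')}:{os.environ.get('PORT', '8000')}"
    print(base_url, "reachable:", server_reachable(base_url))
```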
**Parameters to customize:**

* `benchmark`: A unique release name for this deployment.
* `hfToken`: Your Hugging Face token.
* `config.server.base_url`: The base URL (IP and port) of your inference server.

For additional parameters, refer to the inference-perf [configuration guide](https://github.com/kubernetes-sigs/inference-perf/blob/main/docs/config.md).
### Storage Parameters

#### 1. Local Storage (Default)

By default, reports are saved locally but are **lost when the Pod terminates**.

```yaml
storage:
  local_storage:
    path: "reports-{timestamp}"   # Local directory path
    report_file_prefix: null      # Optional filename prefix
```
#### 2. Google Cloud Storage (GCS)

Use the `google_cloud_storage` block to save reports to a GCS bucket.

```yaml
storage:
  google_cloud_storage:             # Optional GCS configuration
    bucket_name: "your-bucket-name" # Required GCS bucket
    path: "reports-{timestamp}"     # Optional path prefix
    report_file_prefix: null        # Optional filename prefix
```
###### 🚨 GCS Permissions Checklist (Required for Write Access)

1. **IAM Role (Service Account):** Bound to the target bucket.
   * **Minimum:** **Storage Object Creator** (`roles/storage.objectCreator`)
   * **Full:** **Storage Object Admin** (`roles/storage.objectAdmin`)
2. **Node Access Scope (GKE Node Pool):** Set during node pool creation.
   * **Required Scope:** `devstorage.read_write` or `cloud-platform`
#### 3. Simple Storage Service (S3)

Use the `simple_storage_service` block for S3-compatible storage. This requires appropriate AWS credentials to be configured in the runtime environment.

```yaml
storage:
  simple_storage_service:
    bucket_name: "your-bucket-name" # Required S3 bucket
    path: "reports-{timestamp}"     # Optional path prefix
    report_file_prefix: null        # Optional filename prefix
```
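All three backends share the same `path` and `report_file_prefix` fields, with `{timestamp}` expanding to a per-run value so repeated runs do not overwrite each other. A rough Python illustration of how such a template expands (the actual substitution, including the exact timestamp format, is performed by inference-perf; the format below is an assumption for illustration):

```python
from datetime import datetime, timezone


def render_report_path(template: str) -> str:
    """Expand a {timestamp} placeholder into a per-run path.

    Illustrative only: inference-perf performs the real substitution,
    and its timestamp format may differ from the one shown here.
    """
    ts = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%S")
    return template.replace("{timestamp}", ts)


print(render_report_path("reports-{timestamp}"))
```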
## Uninstalling the Chart

To uninstall the deployed chart, pass the release name used at install time (`benchmark` in the example above):

```bash
helm uninstall benchmark
```

benchmarking/benchmark-values.yaml

Lines changed: 62 additions & 0 deletions
```yaml
job:
  image:
    repository: quay.io/inference-perf/inference-perf
    tag: ""  # Defaults to .Chart.AppVersion
  nodeSelector: {}
  # Example resources:
  # resources:
  #   requests:
  #     cpu: "1"
  #     memory: "4Gi"
  #   limits:
  #     cpu: "2"
  #     memory: "8Gi"
  resources: {}

logLevel: INFO

# A GCS bucket path that points to the dataset file.
# The file will be copied from this path to the local file system
# at /dataset/dataset.json for use during the run.
# NOTE: For this dataset to be used, config.data.path must also be
# explicitly set to /dataset/dataset.json.
gcsPath: ""

# hfToken optionally creates a secret with the specified token.
# Can be set using helm install --set hfToken=<token>
hfToken: ""

config:
  load:
    type: constant
    interval: 15
    stages:
      - rate: 10
        duration: 20
      - rate: 20
        duration: 20
      - rate: 30
        duration: 20
  api:
    type: completion
    streaming: true
  server:
    type: vllm
    model_name: meta-llama/Llama-3.1-8B-Instruct
    base_url: http://0.0.0.0:8000
    ignore_eos: true
  tokenizer:
    pretrained_model_name_or_path: meta-llama/Llama-3.1-8B-Instruct
  data:
    type: shareGPT
  metrics:
    type: prometheus
    prometheus:
      google_managed: true
  report:
    request_lifecycle:
      summary: true
      per_stage: true
      per_request: true
    prometheus:
      summary: true
      per_stage: true
```
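The `stages` list drives the load ramp: with a constant-rate generator, each stage contributes roughly `rate * duration` requests. A quick sanity check of what the default values ask for (this sketch ignores the `interval` setting between stages):

```python
# Default stages from benchmark-values.yaml
stages = [
    {"rate": 10, "duration": 20},
    {"rate": 20, "duration": 20},
    {"rate": 30, "duration": 20},
]

# Seconds spent generating load (inter-stage `interval` not counted here)
total_duration_s = sum(s["duration"] for s in stages)
# Approximate request count for a constant-rate load generator
total_requests = sum(s["rate"] * s["duration"] for s in stages)

print(total_duration_s, total_requests)  # 60 1200
```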
