
Commit 9b9df13

Deploy to separate namespace

1 parent 75b7b4f

4 files changed: +30 −17 lines

AI/vllm-deployment/README.md

Lines changed: 24 additions & 17 deletions

@@ -36,31 +36,37 @@ This example demonstrates how to deploy a server for AI inference using [vLLM](h
 
 ## Detailed Steps & Explanation
 
-1. Ensure Hugging Face permissions to retrieve model:
+1. Create the namespace:
+
+```bash
+kubectl apply -f vllm-namespace.yaml
+```
+
+2. Ensure Hugging Face permissions to retrieve model:
 
 ```bash
 # Env var HF_TOKEN contains hugging face account token
-kubectl create secret generic hf-secret \
+kubectl create secret generic hf-secret -n vllm-example \
   --from-literal=hf_token=$HF_TOKEN
 ```
 
-2. Apply vLLM server:
+3. Apply vLLM server:
 
 ```bash
-kubectl apply -f vllm-deployment.yaml
+kubectl apply -f vllm-deployment.yaml -n vllm-example
 ```
 
 - Wait for deployment to reconcile, creating vLLM pod(s):
 
 ```bash
-kubectl wait --for=condition=Available --timeout=900s deployment/vllm-gemma-deployment
-kubectl get pods -l app=gemma-server -w
+kubectl wait --for=condition=Available --timeout=900s deployment/vllm-gemma-deployment -n vllm-example
+kubectl get pods -l app=gemma-server -w -n vllm-example
 ```
 
 - View vLLM pod logs:
 
 ```bash
-kubectl logs -f -l app=gemma-server
+kubectl logs -f -l app=gemma-server -n vllm-example
 ```
 
 Expected output:
@@ -77,11 +83,11 @@ Expected output:
 ...
 ```
 
-3. Create service:
+4. Create service:
 
 ```bash
 # ClusterIP service on port 8080 in front of vllm deployment
-kubectl apply -f vllm-service.yaml
+kubectl apply -f vllm-service.yaml -n vllm-example
 ```
 
 ## Verification / Seeing it Work
@@ -90,18 +96,18 @@ kubectl apply -f vllm-service.yaml
 
 ```bash
 # Forward a local port (e.g., 8080) to the service port (e.g., 8080)
-kubectl port-forward service/vllm-service 8080:8080
+kubectl port-forward service/vllm-service 8080:8080 -n vllm-example
 ```
 
 2. Send request to local forwarding port:
 
 ```bash
 curl -X POST http://localhost:8080/v1/chat/completions \
   -H "Content-Type: application/json" \
-  -d '{
-    "model": "google/gemma-3-1b-it",
-    "messages": [{"role": "user", "content": "Explain Quantum Computing in simple terms."}],
-    "max_tokens": 100
+  -d '{ \
+    "model": "google/gemma-3-1b-it", \
+    "messages": [{"role": "user", "content": "Explain Quantum Computing in simple terms." }], \
+    "max_tokens": 100 \
 }'
 ```
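
A note on the reworked request above: inside a single-quoted shell string, a trailing backslash is not a line continuation; it is passed through literally, so the new payload reaches the server with stray `\` characters and is no longer valid JSON. The quotes alone keep the body open across lines, so a working form needs no continuation characters; a minimal sketch:

```bash
# Single quotes already span multiple lines; no backslashes are needed
# inside the JSON body (they would be sent literally and break parsing).
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "google/gemma-3-1b-it",
    "messages": [{"role": "user", "content": "Explain Quantum Computing in simple terms."}],
    "max_tokens": 100
  }'
```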

@@ -151,9 +157,10 @@ Node selectors make sure vLLM pods land on Nodes with the correct GPU, and they
 ## Cleanup
 
 ```bash
-kubectl delete -f vllm-service.yaml
-kubectl delete -f vllm-deployment.yaml
-kubectl delete -f secret/hf_secret
+kubectl delete -f vllm-service.yaml -n vllm-example
+kubectl delete -f vllm-deployment.yaml -n vllm-example
+kubectl delete secret hf-secret -n vllm-example
+kubectl delete -f vllm-namespace.yaml
 ```
 
 ---
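
Because every kubectl invocation in the README now repeats `-n vllm-example`, an optional alternative worth noting is to set the namespace on the current context once. A minimal sketch (standard kubectl, nothing project-specific assumed):

```bash
# Make vllm-example the default namespace for the current context,
# so the -n flag can be dropped from the commands above.
kubectl config set-context --current --namespace=vllm-example

# Switch back to the default namespace when finished.
kubectl config set-context --current --namespace=default
```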

AI/vllm-deployment/vllm-deployment.yaml

Lines changed: 1 addition & 0 deletions

@@ -2,6 +2,7 @@ apiVersion: apps/v1
 kind: Deployment
 metadata:
   name: vllm-gemma-deployment
+  namespace: vllm-example
 spec:
   replicas: 1
   selector:
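
With the namespace now pinned in the manifest, a quick way to confirm the Deployment reconciles where expected; a sketch using only names from this diff:

```bash
# Blocks until the rollout in the new namespace completes (or fails).
kubectl -n vllm-example rollout status deployment/vllm-gemma-deployment
```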

AI/vllm-deployment/vllm-namespace.yaml

Lines changed: 4 additions & 0 deletions

@@ -0,0 +1,4 @@
+apiVersion: v1
+kind: Namespace
+metadata:
+  name: vllm-example
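
This manifest is roughly what `kubectl create namespace` would emit; a quick sanity check after applying it:

```bash
# Print the manifest kubectl would generate, for comparison (client-side only).
kubectl create namespace vllm-example --dry-run=client -o yaml

# Confirm the namespace exists and is Active.
kubectl get namespace vllm-example
```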

AI/vllm-deployment/vllm-service.yaml

Lines changed: 1 addition & 0 deletions

@@ -2,6 +2,7 @@ apiVersion: v1
 kind: Service
 metadata:
   name: vllm-service
+  namespace: vllm-example
 spec:
   selector:
     app: gemma-server
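
Finally, a quick check that the Service resolves to the vLLM pods, assuming the pods carry the `app: gemma-server` label selected above:

```bash
# The ENDPOINTS column should list pod IPs once the deployment is ready.
kubectl -n vllm-example get service vllm-service
kubectl -n vllm-example get endpoints vllm-service
```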
