
Commit d57a1ea

Updated PVCs, added GPU support, added macOS support
1 parent c01a078 commit d57a1ea

18 files changed: +918 -674 lines

.gitignore

Lines changed: 4 additions & 1 deletion
@@ -46,4 +46,7 @@ cscope.*
 
 # Helm chart dependecies cache
 **/Chart.lock
-**/charts/*.tgz
+**/charts/*.tgz
+
+# Helm chart output directory
+ai/ai-starter-kit/out

ai/ai-starter-kit/Makefile

Lines changed: 31 additions & 4 deletions
@@ -1,21 +1,48 @@
+.PHONY: check_hf_token check_OCI_target package_helm lint dep_update install install_gke start uninstall push_helm
+
+check_hf_token:
+ifndef HF_TOKEN
+	$(error HF_TOKEN is not set)
+endif
+
+check_OCI_target:
+ifndef OCI_HELM_TARGET
+	$(error OCI_HELM_TARGET is not set)
+endif
+
+package_helm:
+	helm package helm-chart/ai-starter-kit/ --destination out/
+
+push_helm: check_OCI_target
+	helm push out/ai-starter-kit* oci://$$OCI_HELM_TARGET
+
 lint:
 	helm lint helm-chart/ai-starter-kit
 
 dep_update:
 	helm dependency update helm-chart/ai-starter-kit
 
-install:
-	helm upgrade --install ai-starter-kit helm-chart/ai-starter-kit --set huggingface.token="your_hf_token" --timeout 10m -f helm-chart/ai-starter-kit/values.yaml
+install: check_hf_token
+	helm upgrade --install ai-starter-kit helm-chart/ai-starter-kit --set huggingface.token="$$HF_TOKEN" --timeout 10m -f helm-chart/ai-starter-kit/values.yaml
+
+install_gke: check_hf_token
+	helm upgrade --install ai-starter-kit helm-chart/ai-starter-kit --set huggingface.token="$$HF_TOKEN" --timeout 10m -f helm-chart/ai-starter-kit/values-gke.yaml
 
-install_gke:
-	helm upgrade --install ai-starter-kit helm-chart/ai-starter-kit --set huggingface.token="your_hf_token" --timeout 10m -f helm-chart/ai-starter-kit/values-gke.yaml
+install_gke_gpu: check_hf_token
+	helm upgrade --install ai-starter-kit helm-chart/ai-starter-kit --set huggingface.token="$$HF_TOKEN" --timeout 10m -f helm-chart/ai-starter-kit/values-gke-gpu.yaml
 
 start:
 	mkdir -p /tmp/models-cache
 	minikube start --cpus 4 --memory 15000 --mount --mount-string="/tmp/models-cache:/tmp/models-cache"
 
+start_gpu:
+	mkdir -p $$HOME/models-cache
+	minikube start --driver krunkit --cpus 4 --memory 15000 --mount --mount-string="$$HOME/models-cache:$$HOME/models-cache"
+
 uninstall:
 	helm uninstall ai-starter-kit
+	kubectl delete pod jupyter-user
+	kubectl delete pvc ai-starter-kit-jupyterhub-hub-db-dir
 
 destroy:
 	minikube delete
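
With these targets, a typical local workflow might look like the following; `make install` fails fast if `HF_TOKEN` is unset (the token value below is a placeholder):

```bash
export HF_TOKEN=hf_your_token_here   # placeholder Hugging Face token
make start                           # boot minikube with the models-cache mount
make dep_update install              # fetch chart dependencies, then install the release
```
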
ai/ai-starter-kit/README.md

Lines changed: 291 additions & 0 deletions
@@ -0,0 +1,291 @@
# AI Starter Kit

A comprehensive Helm chart for deploying a complete AI/ML development environment on Kubernetes. This starter kit provides a ready-to-use platform with JupyterHub notebooks, model serving capabilities, and experiment tracking, making it well suited for teams starting their AI journey or prototyping AI applications.

## Purpose

The AI Starter Kit simplifies the deployment of AI infrastructure by providing:

- **JupyterHub**: Multi-user notebook environment with pre-configured AI/ML libraries
- **Model Serving**: Support for both Ollama and Ramalama model servers
- **MLflow**: Experiment tracking and model management
- **GPU Support**: Configurations for GPU acceleration on GKE and macOS
- **Model Caching**: Persistent storage for efficient model management
- **Example Notebooks**: Pre-loaded notebooks to get you started immediately

## Prerequisites

### General Requirements

- Kubernetes cluster (minikube, GKE)
- Helm 3.x installed
- kubectl configured to access your cluster
- Hugging Face token for accessing models

### Platform-Specific Requirements

#### Minikube (Local Development)

- Docker Desktop or similar container runtime
- Minimum 4 CPU cores and 16GB RAM available
- 40GB+ free disk space

#### GKE (Google Kubernetes Engine)

- Google Cloud CLI (`gcloud`) installed and configured
- Appropriate GCP permissions to create clusters

#### macOS with GPU (Apple Silicon)

- macOS with Apple Silicon (M1/M2/M3/M4)
- minikube with the krunkit driver
- 16GB+ RAM recommended

## Installation

### Quick Start (Minikube)

1. **Start minikube with persistent storage:**
   ```bash
   minikube start --cpus 4 --memory 15000 \
     --mount --mount-string="/tmp/models-cache:/tmp/models-cache"
   ```

2. **Install the chart:**
   ```bash
   helm install ai-starter-kit . \
     --set huggingface.token="YOUR_HF_TOKEN" \
     -f values.yaml
   ```

3. **Access JupyterHub:**
   ```bash
   kubectl port-forward svc/ai-starter-kit-jupyterhub-proxy-public 8080:80
   ```
   Navigate to http://localhost:8080 and log in with any username and the password `sneakypass`.

### GKE Deployment

1. **Create a GKE Autopilot cluster:**
   ```bash
   export REGION=us-central1
   export CLUSTER_NAME="ai-starter-cluster"
   export PROJECT_ID=$(gcloud config get project)

   gcloud container clusters create-auto ${CLUSTER_NAME} \
     --project=${PROJECT_ID} \
     --region=${REGION} \
     --release-channel=rapid \
     --labels=created-by=ai-on-gke,guide=ai-starter-kit
   ```

2. **Get cluster credentials:**
   ```bash
   gcloud container clusters get-credentials ${CLUSTER_NAME} --location=${REGION}
   ```

3. **Install the chart with GKE-specific values:**
   ```bash
   helm install ai-starter-kit . \
     --set huggingface.token="YOUR_HF_TOKEN" \
     -f values.yaml \
     -f values-gke.yaml
   ```

### GKE with GPU (Ollama)

For GPU-accelerated model serving with Ollama:

```bash
helm install ai-starter-kit . \
  --set huggingface.token="YOUR_HF_TOKEN" \
  -f values-gke.yaml \
  -f values-ollama-gpu.yaml
```

### GKE with GPU (Ramalama)

For GPU-accelerated model serving with Ramalama:

```bash
helm install ai-starter-kit . \
  --set huggingface.token="YOUR_HF_TOKEN" \
  -f values-gke.yaml \
  -f values-ramalama-gpu.yaml
```

### macOS with Apple Silicon GPU

1. **Start minikube with the krunkit driver:**
   ```bash
   minikube start --driver krunkit \
     --cpus 8 --memory 16000 --disk-size 40000mb \
     --mount --mount-string="/tmp/models-cache:/tmp/models-cache"
   ```

2. **Install with macOS GPU support:**
   ```bash
   helm install ai-starter-kit . \
     --set huggingface.token="YOUR_HF_TOKEN" \
     -f values.yaml \
     -f values-macos.yaml
   ```

## Configuration

### Key Configuration Options

| Parameter | Description | Default |
|-----------|-------------|---------|
| `huggingface.token` | Hugging Face token for models | `"YOUR_HF_TOKEN"` |
| `ollama.enabled` | Enable Ollama model server | `true` |
| `ramalama.enabled` | Enable Ramalama model server | `true` |
| `modelsCachePvc.size` | Size of model cache storage | `10Gi` |
| `jupyterhub.singleuser.defaultUrl` | Default notebook path | `/lab/tree/welcome.ipynb` |
| `mlflow.enabled` | Enable MLflow tracking server | `true` |
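
For example, a larger model cache and MLflow disabled (parameter names from the table above; the values are illustrative) might be set at install time like so:

```bash
helm upgrade --install ai-starter-kit . \
  --set huggingface.token="$HF_TOKEN" \
  --set modelsCachePvc.size=50Gi \
  --set mlflow.enabled=false \
  -f values.yaml
```
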
### Storage Configuration

The chart supports different storage configurations:

- **Local Development**: Uses hostPath volumes with the minikube mount
- **GKE**: Uses standard GKE storage classes (`standard-rwo`, `standard-rwx`)
- **Custom**: Configure via `modelsCachePvc.storageClassName` (see the sketch below)
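
As a sketch, pointing the model cache at a custom class (the class name `premium-rwo` is just an example; any installed StorageClass works):

```bash
helm upgrade --install ai-starter-kit . \
  --set huggingface.token="$HF_TOKEN" \
  --set modelsCachePvc.storageClassName=premium-rwo \
  -f values.yaml
```
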
### Model Servers

#### Ollama

Ollama is enabled by default and provides:

- Easy model management
- A REST API for inference
- Support for popular models (Llama, Gemma, Qwen, etc.)
- GPU acceleration support

#### Ramalama

Ramalama provides:

- An alternative model serving solution
- Support for CUDA and Metal (macOS) acceleration
- A lightweight deployment option

You can run either Ollama or Ramalama, but not both simultaneously. Toggle between them in your values:

```yaml
ollama:
  enabled: true/false
ramalama:
  enabled: true/false
```
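
For instance, switching an existing release over to Ramalama only (the flags mirror the toggle above; `--reuse-values` keeps all other settings) might look like:

```bash
helm upgrade ai-starter-kit . \
  --set ollama.enabled=false \
  --set ramalama.enabled=true \
  --reuse-values
```
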
## Usage

### Accessing Services

#### JupyterHub

```bash
# Port forward to access JupyterHub
kubectl port-forward svc/ai-starter-kit-jupyterhub-proxy-public 8080:80

# Access at: http://localhost:8080
# Default password: sneakypass
```

#### MLflow

```bash
# Port forward to access the MLflow UI
kubectl port-forward svc/ai-starter-kit-mlflow 5000:5000

# Access at: http://localhost:5000
```
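
Once forwarded, a quick sanity check against the tracking server (MLflow exposes a `/health` endpoint) is:

```bash
curl http://localhost:5000/health   # expected output: OK
```
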
#### Ollama/Ramalama API

```bash
# For Ollama
kubectl port-forward svc/ai-starter-kit-ollama 11434:11434

# For Ramalama
kubectl port-forward svc/ai-starter-kit-ramalama 8080:8080
```
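
With the Ollama port-forward in place, a minimal generation request against its REST API might look like this (the model name is an assumption; use one that is already available on the server):

```bash
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Say hello in one sentence.",
  "stream": false
}'
```
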
### Pre-loaded Example Notebooks

The JupyterHub environment comes with pre-loaded example notebooks:

- `chat_bot.ipynb`: Simple chatbot interface using Ollama for conversational AI.
- `multi-agent-ollama.ipynb`: Multi-agent workflow demonstration using Ollama.
- `multi-agent-ramalama.ipynb`: The same multi-agent workflow on the Ramalama runtime, for comparison.
- `welcome.ipynb`: Introduction notebook with embedding-model examples using Qwen models.

These notebooks are automatically copied to your workspace on first login.

## Architecture

The AI Starter Kit consists of:

1. **JupyterHub**: Multi-user notebook server with persistent storage
2. **Model Serving**: Choice of Ollama or Ramalama for LLM inference
3. **MLflow**: Experiment tracking and model registry
4. **Persistent Storage**: Shared model cache to avoid redundant downloads
5. **Init Containers**: Automated setup of models and notebooks

## Cleanup

### Uninstall the chart

```bash
helm uninstall ai-starter-kit
```

### Delete persistent volumes (optional)

```bash
kubectl delete pvc ai-starter-kit-models-cache-pvc
kubectl delete pvc ai-starter-kit-jupyterhub-hub-db-dir
```

### Delete the GKE cluster

```bash
gcloud container clusters delete ${CLUSTER_NAME} --region=${REGION}
```

### Stop minikube

```bash
minikube stop
minikube delete  # To completely remove the cluster
```

## Troubleshooting

### Common Issues

#### Pods stuck in Pending state

- Check available resources: `kubectl describe pod <pod-name>`
- Increase cluster resources or reduce resource requests

#### Model download failures

- Verify the Hugging Face token is set correctly (see the quick check below)
- Check internet connectivity from the pods
- Increase the init container timeout in the values
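
A quick way to validate a token outside the cluster is Hugging Face's `whoami` endpoint (prints your account details on success):

```bash
curl -s -H "Authorization: Bearer $HF_TOKEN" https://huggingface.co/api/whoami-v2
```
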
#### GPU not detected

- Verify GPU nodes are available: `kubectl get nodes -o wide` (see also the check below)
- Check the GPU driver installation
- Ensure correct node selectors and tolerations
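
On GKE, one quick way to confirm that nodes actually advertise NVIDIA GPUs (the standard `nvidia.com/gpu` extended resource) is:

```bash
kubectl describe nodes | grep -i 'nvidia.com/gpu'
```
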
#### Storage issues

- Verify the PVC is bound: `kubectl get pvc`
- Check storage class availability: `kubectl get storageclass`
- Ensure sufficient disk space

### Debug Commands

```bash
# Check pod status
kubectl get pods -n default

# View pod logs
kubectl logs -f <pod-name>

# Describe pod for events
kubectl describe pod <pod-name>

# Check resource usage
kubectl top nodes
kubectl top pods
```

## Resources

- [JupyterHub Documentation](https://jupyterhub.readthedocs.io/)
- [MLflow Documentation](https://mlflow.org/docs/latest/index.html)
- [Ollama Documentation](https://ollama.ai/docs)
- [Kubernetes Documentation](https://kubernetes.io/docs/)
- [Helm Documentation](https://helm.sh/docs/)
Lines changed: 17 additions & 12 deletions
@@ -1,17 +1,22 @@
+import sys
 from huggingface_hub import snapshot_download
 
 # --- Model Download ---
-# List your desired Hugging Face model names here
-model_names = [
-    "Qwen/Qwen3-Embedding-0.6B",
-]
+if __name__ == "__main__":
+    # List your desired Hugging Face model names here
+    model_names = [
+        "Qwen/Qwen3-Embedding-0.6B",
+    ]
 
-for model_name in model_names:
-    print(f"--- Downloading {model_name} ---")
-    try:
-        snapshot_download(repo_id=model_name)
-        print(f"Successfully cached {model_name}")
-    except Exception as e:
-        print(f"Failed to download {model_name}. Error: {e}")
+    for model_name in model_names:
+        print(f"--- Downloading {model_name} ---")
+        try:
+            if len(sys.argv) > 1:
+                snapshot_download(repo_id=model_name, cache_dir=sys.argv[1])
+            else:
+                snapshot_download(repo_id=model_name)
+            print(f"Successfully cached {model_name}")
+        except Exception as e:
+            print(f"Failed to download {model_name}. Error: {e}")
 
-print("--- Model download process finished. ---")
+    print("--- Model download process finished. ---")
