2. Configure a canary rollout with traffic split using InferenceModel. In this example, 40% of traffic for the `tweet-summary` model will be sent to the new ***tweet-summary-2*** adapter.
   ```yaml
   model:
     name: tweet-summary
     targetModels:
     - targetModelName: tweet-summary-0
       weight: 10
     - targetModelName: tweet-summary-1
       weight: 40
     - targetModelName: tweet-summary-2
       weight: 40
   ```
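Conceptually, the `targetModels` weights drive a weighted random draw: each request for `tweet-summary` is routed to one of the `targetModelName`s with probability proportional to its weight, so the weights are relative shares rather than strict percentages. The sketch below is illustrative only; the real extension is implemented in Go and its exact selection logic is not reproduced here.

```python
import random

def pick_target_model(target_models):
    """Pick a targetModelName with probability proportional to its weight.

    `target_models` mirrors the `targetModels` list in the manifest above:
    a list of {"targetModelName": str, "weight": int} dicts. Weights are
    relative shares; they do not have to sum to 100.
    """
    names = [t["targetModelName"] for t in target_models]
    weights = [t["weight"] for t in target_models]
    # random.choices draws one item with the given relative weights.
    return random.choices(names, weights=weights, k=1)[0]

# The canary split from the manifest above.
canary = [
    {"targetModelName": "tweet-summary-0", "weight": 10},
    {"targetModelName": "tweet-summary-1", "weight": 40},
    {"targetModelName": "tweet-summary-2", "weight": 40},
]
```

Over many requests, `tweet-summary-1` and `tweet-summary-2` each receive about four times as much traffic as `tweet-summary-0`.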
3. Finish the rollout by shifting 100% of the traffic to the new version.
   ```yaml
   model:
     name: tweet-summary
     targetModels:
     - targetModelName: tweet-summary-2
       weight: 100
   ```
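Since the weights are relative shares, it can help to compute the effective split a given `targetModels` list produces. A small helper for that (illustrative only; the field names mirror the manifests above):

```python
def effective_split(target_models):
    """Return {targetModelName: fraction of traffic} for a targetModels list."""
    total = sum(t["weight"] for t in target_models)
    return {t["targetModelName"]: t["weight"] / total for t in target_models}

# The canary stage: weights 10/40/40 sum to 90, so the shares are
# roughly 11% / 44% / 44%, not exactly 10% / 40% / 40%.
canary = [
    {"targetModelName": "tweet-summary-0", "weight": 10},
    {"targetModelName": "tweet-summary-1", "weight": 40},
    {"targetModelName": "tweet-summary-2", "weight": 40},
]

# The final stage: a single target with weight 100 receives all traffic.
final = [{"targetModelName": "tweet-summary-2", "weight": 100}]
```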
4. Update the dynamic LoRA configuration so that retired adapters are unregistered from the model servers:

   ```yaml
   data:
     configmap.yaml: |
       vLLMLoRAConfig:
         ensureExist:
           models:
           - id: tweet-summary-2
             source: gs://[TEAM-A-MODELS-BUCKET]/tweet-summary-2
         ensureNotExist: # Explicitly unregisters the adapter from model servers
   ```
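The `ensureExist` / `ensureNotExist` lists are declarative: a controller compares them against the adapters currently registered on each model server and loads or unloads adapters until the state converges. A minimal sketch of that reconciliation, using hypothetical helper names rather than the actual sidecar code:

```python
def reconcile(registered, ensure_exist, ensure_not_exist):
    """Compute which adapter ids to load and which to unload.

    `registered` is the set of adapter ids currently on the model server;
    `ensure_exist` / `ensure_not_exist` are lists of {"id": ...} entries
    mirroring the vLLMLoRAConfig fragment above.
    """
    want = {m["id"] for m in ensure_exist}
    forbid = {m["id"] for m in ensure_not_exist}
    to_load = want - registered        # desired but not yet registered
    to_unload = forbid & registered    # registered but explicitly forbidden
    return to_load, to_unload
```

For example, with `tweet-summary-1` registered, `ensureExist` naming `tweet-summary-2`, and `ensureNotExist` naming `tweet-summary-1`, the controller would load the new adapter and unload the old one.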