docs/proposals/002-api-proposal/README.md
42 additions & 30 deletions
```diff
@@ -228,6 +228,7 @@ type InferenceModel struct {
 	metav1.TypeMeta

 	Spec   InferenceModelSpec
+	Status InferenceModelStatus
 }

 type InferenceModelSpec struct {
```
```diff
@@ -253,22 +254,39 @@ type InferenceModelSpec struct {
 	// If not specified, the target model name is defaulted to the ModelName parameter.
 	// ModelName is often in reference to a LoRA adapter.
 	TargetModels []TargetModel
-	// Reference to the InferencePool that the model registers to. It must exist in the same namespace.
-	PoolReference *LocalObjectReference
+	// PoolRef is a reference to the inference pool; the pool must exist in the same namespace.
+	PoolRef PoolObjectReference
+}
+
+// PoolObjectReference identifies an API object within the namespace of the
+// referrer.
+type PoolObjectReference struct {
+	// Group is the group of the referent.
+	Group Group
+
+	// Kind is kind of the referent. For example "InferencePool".
+	Kind Kind
+
+	// Name is the name of the referent.
+	Name ObjectName
 }

 // Defines how important it is to serve the model compared to other models.
 // Criticality is intentionally a bounded enum to contain the possibilities that need to be supported by the load balancing algorithm. Any reference to the Criticality field should ALWAYS be optional (use a pointer), and set no default.
 // This allows us to union this with a oneOf field in the future should we wish to adjust/extend this behavior.
 type Criticality string
 const (
-	// Most important. Requests to this band will be shed last.
-	Critical Criticality = "Critical"
-	// More important than Sheddable, less important than Critical.
-	// Requests in this band will be shed before critical traffic.
-	Default Criticality = "Default"
-	// Least important. Requests to this band will be shed before all other bands.
-	Sheddable Criticality = "Sheddable"
+	// Critical defines the highest level of criticality. Requests to this band will be shed last.
+	Critical Criticality = "Critical"
+
+	// Standard defines the base criticality level and is more important than Sheddable but less
+	// important than Critical. Requests in this band will be shed before critical traffic.
+	// Most models are expected to fall within this band.
+	Standard Criticality = "Standard"
+
+	// Sheddable defines the lowest level of criticality. Requests to this band will be shed before
+	// all other bands.
+	Sheddable Criticality = "Sheddable"
 )

 // TargetModel represents a deployed model or a LoRA adapter. The
```
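The proposal makes every reference to `Criticality` an optional pointer with no default, and the comments define a total shedding order (Critical shed last, Sheddable first). A minimal sketch of how a load balancer might rank requests under that contract; the `Criticality` type here mirrors the diff above, but `shedOrder` and the treat-unset-as-Standard policy are illustrative assumptions, not part of the proposal:

```go
package main

import "fmt"

// Criticality mirrors the proposal's bounded enum; redeclared here so the
// sketch is self-contained.
type Criticality string

const (
	Critical  Criticality = "Critical"
	Standard  Criticality = "Standard"
	Sheddable Criticality = "Sheddable"
)

// shedOrder returns a rank for load shedding: higher ranks are shed first.
// Criticality references are optional pointers with no default, so nil must
// be handled; mapping nil to Standard is one possible policy, not something
// the proposal mandates.
func shedOrder(c *Criticality) int {
	if c == nil {
		return 1 // unset: treat as Standard
	}
	switch *c {
	case Critical:
		return 0 // shed last
	case Sheddable:
		return 2 // shed first
	default:
		return 1
	}
}

func main() {
	crit := Critical
	fmt.Println(shedOrder(&crit)) // prints 0
	fmt.Println(shedOrder(nil))   // prints 1
}
```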
````diff
@@ -281,24 +299,16 @@ const (
 type TargetModel struct {
 	// The name of the adapter as expected by the ModelServer.
 	Name string
-	// Weight is used to determine the percentage of traffic that should be
+	// Weight is used to determine the percentage of traffic that should be
 	// sent to this target model when multiple versions of the model are specified.
-	Weight *int
+	Weight *int32
 }

-// LocalObjectReference identifies an API object within the namespace of the
-// referrer.
-type LocalObjectReference struct {
-	// Group is the group of the referent.
-	Group Group
-
-	// Kind is kind of the referent. For example "InferencePool".
-	Kind Kind
-
-	// Name is the name of the referent.
-	Name ObjectName
+// InferenceModelStatus defines the observed state of InferenceModel
+type InferenceModelStatus struct {
+	// Conditions track the state of the InferenceModel.
+	Conditions []metav1.Condition
 }
-
 ```
````
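`TargetModel.Weight` (now a `*int32`) determines the percentage of traffic sent to each target when multiple versions are specified. A sketch of one plausible weighted-selection scheme under those semantics; the `TargetModel` type mirrors the diff, but `pickTarget`, the cumulative-range algorithm, and the default weight of 1 for an unset pointer are assumptions for illustration:

```go
package main

import "fmt"

// TargetModel mirrors the proposal's type; redeclared so the sketch is
// self-contained.
type TargetModel struct {
	Name   string
	Weight *int32
}

// pickTarget returns the target whose cumulative weight range contains roll,
// where roll is drawn from [0, sum of weights). The proposal does not
// prescribe an algorithm; this is one common way to honor the percentages.
func pickTarget(targets []TargetModel, roll int32) string {
	var cum int32
	for _, t := range targets {
		w := int32(1) // assumed default when Weight is unset
		if t.Weight != nil {
			w = *t.Weight
		}
		cum += w
		if roll < cum {
			return t.Name
		}
	}
	return ""
}

func main() {
	w90, w10 := int32(90), int32(10)
	targets := []TargetModel{
		{Name: "npc-bot-v1", Weight: &w90}, // hypothetical adapter names
		{Name: "npc-bot-v2", Weight: &w10},
	}
	fmt.Println(pickTarget(targets, 50)) // in [0,90): prints npc-bot-v1
	fmt.Println(pickTarget(targets, 95)) // in [90,100): prints npc-bot-v2
}
```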
### Yaml Examples
```diff
@@ -322,27 +332,29 @@ spec:

 Here we consume the pool with two InferenceModels: `sql-code-assist`, which is both the name of the model and the name of the LoRA adapter on the model server, and `npc-bot`, which adds a layer of indirection for those names as well as a specified criticality. Both `sql-code-assist` and `npc-bot` have available LoRA adapters on the InferencePool, and routing to each InferencePool happens earlier (at the K8s Gateway).
```
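The two manifests the paragraph describes are truncated in this chunk. A hedged sketch of what they could look like under the types in the diff above; the `apiVersion`, the pool name `base-model-pool`, and the `npc-bot-v1`/`npc-bot-v2` adapter names and weights are all illustrative placeholders, not values from the proposal:

```yaml
apiVersion: inference.networking.x-k8s.io/v1alpha1  # illustrative; not fixed by this chunk
kind: InferenceModel
metadata:
  name: sql-code-assist
spec:
  modelName: sql-code-assist    # model name and LoRA adapter name coincide
  poolRef:
    name: base-model-pool       # hypothetical InferencePool in the same namespace
---
apiVersion: inference.networking.x-k8s.io/v1alpha1
kind: InferenceModel
metadata:
  name: npc-bot
spec:
  modelName: npc-bot
  criticality: Critical         # optional; no default is set by the API
  targetModels:                 # indirection from ModelName to served adapters
  - name: npc-bot-v1            # hypothetical adapter names
    weight: 60
  - name: npc-bot-v2
    weight: 40
  poolRef:
    name: base-model-pool
```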