Description
I have a question re: guidance for implementers.
Is the intent behind the current inference model and inference pool design the following?
- There is namespace isolation between base models: specifically, each base model gets deployed in its own k8s namespace.
- There is an InferencePool that targets a given base model. So, exactly one inference pool (and one base model) per k8s namespace.
- There can be multiple LoRA adapters for a given base model. All LoRA adapters must be loaded onto all pods for the given base model.
I'm not sure about upcoming enhancements to the CRDs, but I am trying to understand whether the above is how the current CRDs are intended to be used.
Thanks in advance for your clarifications!
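To make the question concrete, here is a sketch of the layout the bullets above describe: one namespace per base model, one InferencePool selecting that model's pods, and an InferenceModel per LoRA adapter pointing at the pool. All names here are hypothetical, and the field names (`selector`, `targetPortNumber`, `modelName`, `poolRef`) are assumed from the project's v1alpha2 API and may differ in other versions:

```yaml
# Hypothetical example; API group/version and fields assumed from v1alpha2.
apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferencePool
metadata:
  name: llama-pool          # one pool per base model
  namespace: llama-ns       # one namespace per base model, per the question
spec:
  selector:
    app: llama-server       # selects all pods serving this base model
  targetPortNumber: 8000
---
apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferenceModel
metadata:
  name: llama-lora-a
  namespace: llama-ns
spec:
  modelName: lora-adapter-a # adapter exposed as a routable model name
  poolRef:
    name: llama-pool        # ties the adapter to the base-model pool
```

Under this reading, every pod matched by the pool's `selector` is expected to be able to serve every adapter registered against that pool, which is what the "all LoRA adapters loaded onto all pods" bullet describes.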