EPP Multi-tenancy #724

@sriumcp

Description

I have a question regarding guidance for implementers.

Is the intent behind the current InferenceModel and InferencePool design the following?

  1. There is namespace isolation between base models: specifically, each base model gets deployed in its own k8s namespace.
  2. There is an InferencePool that targets a given base model, so there is exactly one InferencePool (and one base model) per k8s namespace.
  3. There can be multiple LoRA adapters for a given base model. All LoRA adapters must be loaded onto all pods for the given base model.

I'm not sure about upcoming enhancements to the CRDs, but I am trying to understand whether the above is the manner in which the current CRDs are intended to be used.

Thanks in advance for your clarifications!
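To make the question concrete, here is a minimal sketch of the deployment shape I have in mind for points (1)–(3). All names (`llama-ns`, `llama-pool`, `summarizer`, etc.) are illustrative, and the field layout follows my reading of the `v1alpha2` API, so please correct me if the schema has changed:

```yaml
# One namespace per base model (point 1); one InferencePool per namespace (point 2).
apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferencePool
metadata:
  name: llama-pool
  namespace: llama-ns            # namespace dedicated to this base model
spec:
  selector:
    app: llama-server            # selects every pod serving the base model
  targetPortNumber: 8000
  extensionRef:
    name: llama-epp              # the EPP (endpoint picker) for this pool
---
# An InferenceModel routing a client-facing model name onto the pool,
# with a LoRA adapter assumed to be loaded on all pods (point 3).
apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferenceModel
metadata:
  name: summarizer
  namespace: llama-ns
spec:
  modelName: summarizer          # model name clients send in requests
  criticality: Standard
  poolRef:
    name: llama-pool             # exactly one pool for this base model
  targetModels:
  - name: summarizer-lora-v1     # hypothetical LoRA adapter name
    weight: 100
```

Is this roughly the intended usage pattern?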
