EPP Multi-tenancy #724

@sriumcp

Description

I have a question regarding guidance for implementers.

Is the intent behind the current InferenceModel and InferencePool design the following?

  1. There is namespace isolation between base models: specifically, each base model gets deployed in its own k8s namespace.
  2. There is an InferencePool that targets a given base model, so there is exactly one InferencePool (and one base model) per k8s namespace.
  3. There can be multiple LoRA adapters for a given base model. All LoRA adapters must be loaded onto all pods for the given base model.

I'm not sure about upcoming enhancements to the CRDs, but I am trying to understand whether the above is the manner in which the current CRDs are intended to be used.

Thanks in advance for your clarifications!
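To make the question concrete, here is a minimal sketch of the deployment shape I have in mind for points (1)–(3). All names (`llama-ns`, `llama-pool`, `summarizer`, etc.) are illustrative, and the field layout follows my reading of the `v1alpha2` API, so please correct me if the schema has changed:

```yaml
# One namespace per base model (point 1); one InferencePool per namespace (point 2).
apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferencePool
metadata:
  name: llama-pool
  namespace: llama-ns            # namespace dedicated to this base model
spec:
  selector:
    app: llama-server            # selects every pod serving the base model
  targetPortNumber: 8000
  extensionRef:
    name: llama-epp              # the EPP (endpoint picker) for this pool
---
# An InferenceModel routing a client-facing model name onto the pool,
# with a LoRA adapter assumed to be loaded on all pods (point 3).
apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferenceModel
metadata:
  name: summarizer
  namespace: llama-ns
spec:
  modelName: summarizer          # model name clients send in requests
  criticality: Standard
  poolRef:
    name: llama-pool             # exactly one pool for this base model
  targetModels:
  - name: summarizer-lora-v1     # hypothetical LoRA adapter name
    weight: 100
```

Is this roughly the intended usage pattern?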
