[RFC]: Model architecture plugins #7124

@NadavShmayo

Motivation.

As a continuation of #5367: since that pull request was rejected and I now have to maintain my own fork to support this scenario, I suggest adding support for model architecture plugins in vLLM.
This would allow new model architectures to be added to vLLM without changing its core logic, and would make it possible to support scenarios such as uneven GPU tensor parallelism.

This could grow into an ecosystem of model architecture plugins, which could considerably speed up support for new models without risking existing functionality.

Proposed Change.

Supporting this in its basic form is simple, since we only have to add the loaded plugins to the ModelRegistry.
To support more complex model architectures (such as the one in #5367), we should decouple the Config class, which provides the number of attention heads, from vLLM's core logic, and allow each model architecture to override these values.
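A minimal sketch of the basic form, assuming plugins are discovered through Python package entry points. The group name `vllm_model_plugins`, the class `ModelRegistrySketch`, and the helper `load_model_plugins` are all hypothetical; the toy registry only mirrors the shape of `ModelRegistry.register_model(arch_name, model_cls)` so that the example runs without vLLM installed.

```python
from importlib.metadata import entry_points
from typing import Callable, Dict, Type

# Hypothetical entry-point group name, not an existing vLLM convention.
PLUGIN_GROUP = "vllm_model_plugins"


class ModelRegistrySketch:
    """Toy stand-in for a model registry: maps architecture names to model classes."""

    _models: Dict[str, Type] = {}

    @classmethod
    def register_model(cls, arch: str, model_cls: Type) -> None:
        # Refuse to silently replace an architecture that is already registered.
        if arch in cls._models:
            raise ValueError(f"architecture {arch!r} is already registered")
        cls._models[arch] = model_cls

    @classmethod
    def resolve(cls, arch: str) -> Type:
        # In practice `arch` would come from the model config's `architectures` field.
        try:
            return cls._models[arch]
        except KeyError:
            raise ValueError(f"unsupported architecture {arch!r}") from None


def _iter_plugin_entry_points(group: str):
    eps = entry_points()
    if hasattr(eps, "select"):  # Python 3.10+
        return eps.select(group=group)
    return eps.get(group, [])  # Python 3.8/3.9 return a plain dict


def load_model_plugins(group: str = PLUGIN_GROUP) -> int:
    """Load every installed plugin; each plugin's entry point is a register function."""
    count = 0
    for ep in _iter_plugin_entry_points(group):
        register: Callable[[], None] = ep.load()
        register()  # the plugin calls ModelRegistrySketch.register_model(...)
        count += 1
    return count


# What a plugin's register function would do (hypothetical model class).
class MyPluginForCausalLM:
    pass


ModelRegistrySketch.register_model("MyPluginForCausalLM", MyPluginForCausalLM)
```

Because the registry only stores name-to-class mappings, the core engine never needs to know which package a model came from, which is what keeps plugins from touching vLLM's core logic.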
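A sketch of what "letting each architecture override these values" could look like, assuming the core logic asks the model class for its per-rank head count instead of reading it from the config. `ModelArchitectureSketch`, `UnevenTPArchitecture`, `get_num_attention_heads`, and `heads_per_rank` are hypothetical names for illustration only; the override shows one possible uneven split, not the approach taken in #5367.

```python
from typing import List


class ModelArchitectureSketch:
    """Hypothetical base class: the model, not the core engine, decides head counts."""

    @classmethod
    def get_num_attention_heads(cls, total_heads: int, tp_size: int, tp_rank: int) -> int:
        # Default: keep the even-split requirement that uneven TP has to work around.
        if total_heads % tp_size != 0:
            raise ValueError(
                f"{total_heads} heads cannot be split evenly across {tp_size} GPUs"
            )
        return total_heads // tp_size


class UnevenTPArchitecture(ModelArchitectureSketch):
    """Hypothetical plugin override that allows an uneven split across ranks."""

    @classmethod
    def get_num_attention_heads(cls, total_heads: int, tp_size: int, tp_rank: int) -> int:
        # The first (total_heads % tp_size) ranks each take one extra head.
        base, extra = divmod(total_heads, tp_size)
        return base + (1 if tp_rank < extra else 0)


def heads_per_rank(arch: type, total_heads: int, tp_size: int) -> List[int]:
    """What core logic would call instead of reading the head count from the config."""
    return [arch.get_num_attention_heads(total_heads, tp_size, r) for r in range(tp_size)]
```

For example, 14 heads on 4 GPUs yields `[4, 4, 3, 3]` with the override, while the base class raises `ValueError`; a model that does not override anything keeps the current even-split behavior.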

Feedback Period.

No response

CC List.

@youkaichao

Any Other Things.

To be clear, I'll be happy to implement this, but I'd like to hear some feedback before I go ahead.
