Motivation.
As a continuation of #5367 - since that pull request was rejected and I now have to maintain my own fork to support this scenario, I suggest adding support for model architecture plugins in vLLM.
This would make it possible to add new model architectures without changing vLLM's core logic, and to support scenarios such as uneven GPU tensor parallelism.
We could build an ecosystem of model architecture plugins, which could significantly accelerate support for new models without risking existing functionality.
Proposed Change.
Supporting this in its basic form is simple, as we just have to register loaded plugins with the ModelRegistry.
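For illustration, a plugin could register its architecture roughly like this. This is a minimal sketch assuming a `ModelRegistry.register_model(architecture, model_cls)` hook of roughly this shape; the plugin package, the entry-point group name, and `MyModelForCausalLM` are hypothetical:

```python
# Hypothetical plugin package: my_vllm_plugin/__init__.py
from vllm import ModelRegistry


def register() -> None:
    # Import lazily so loading the plugin stays cheap.
    from my_vllm_plugin.modeling import MyModelForCausalLM

    # Map the HF architecture name to the plugin's implementation.
    ModelRegistry.register_model("MyModelForCausalLM", MyModelForCausalLM)


# On the vLLM side, core could discover such plugins via entry points
# (the group name "vllm.model_plugins" is only an assumption here).
def load_model_plugins() -> None:
    import importlib.metadata

    # entry_points(group=...) requires Python 3.10+.
    for ep in importlib.metadata.entry_points(group="vllm.model_plugins"):
        ep.load()()  # import the plugin and invoke its register() hook
```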
To support more complex model architectures (such as the #5367 case), we should decouple the Config class that provides the number of attention heads from vLLM's core logic, and allow each model architecture to override these values, as sketched below.
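One possible shape for this (a rough sketch; none of these hooks exist in vLLM today, and the method name and signature are illustrative only) is an optional classmethod on the registered model class that vLLM's core would consult instead of computing the per-rank head count itself:

```python
# Hypothetical override hook on a plugin's model class.
from transformers import PretrainedConfig


class MyModelForCausalLM:
    @classmethod
    def get_num_attention_heads(cls, hf_config: PretrainedConfig,
                                tp_rank: int, tp_size: int) -> int:
        """Number of attention heads this tensor-parallel rank should own."""
        total = hf_config.num_attention_heads
        # The default would be an even split; overriding this allows uneven
        # tensor parallelism when total is not divisible by tp_size,
        # e.g. 30 heads on 4 GPUs -> [8, 8, 7, 7].
        base, rem = divmod(total, tp_size)
        return base + (1 if tp_rank < rem else 0)
```

vLLM's core could fall back to its current even-split logic whenever the model class does not define such a hook.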
Feedback Period.
No response
CC List.
Any Other Things.
Just to make it clear, I'll be happy to implement this, but I want to hear some feedback before I go ahead.