Motivation.
As a continuation of #5367 - since that pull request was rejected and I now have to maintain my own fork to support this scenario, I suggest adding support for model architecture plugins in vLLM.
This would make it possible to add new model architectures without changing vLLM's core logic, and to support scenarios such as uneven GPU tensor parallelism.
We could build an ecosystem of model architecture plugins, which could significantly accelerate support for new models without risking existing functionality.
Proposed Change.
Supporting this in its basic form is simple, as we just have to register loaded plugins with the ModelRegistry.
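For illustration, a plugin could register its architecture roughly like this. This is a minimal sketch assuming a `ModelRegistry.register_model(architecture, model_cls)` hook of roughly this shape; the plugin package, the entry-point group name, and `MyModelForCausalLM` are hypothetical:

```python
# Hypothetical plugin package: my_vllm_plugin/__init__.py
from vllm import ModelRegistry


def register() -> None:
    # Import lazily so loading the plugin stays cheap.
    from my_vllm_plugin.modeling import MyModelForCausalLM

    # Map the HF architecture name to the plugin's implementation.
    ModelRegistry.register_model("MyModelForCausalLM", MyModelForCausalLM)


# On the vLLM side, core could discover such plugins via entry points
# (the group name "vllm.model_plugins" is only an assumption here).
def load_model_plugins() -> None:
    import importlib.metadata

    # entry_points(group=...) requires Python 3.10+.
    for ep in importlib.metadata.entry_points(group="vllm.model_plugins"):
        ep.load()()  # import the plugin and invoke its register() hook
```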
To support more complex model architectures (such as the #5367 case), we should decouple the Config class that provides the number of attention heads from vLLM's core logic, and allow each model architecture to override these values, as sketched below.
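One possible shape for this (a rough sketch; none of these hooks exist in vLLM today, and the method name and signature are illustrative only) is an optional classmethod on the registered model class that vLLM's core would consult instead of computing the per-rank head count itself:

```python
# Hypothetical override hook on a plugin's model class.
from transformers import PretrainedConfig


class MyModelForCausalLM:
    @classmethod
    def get_num_attention_heads(cls, hf_config: PretrainedConfig,
                                tp_rank: int, tp_size: int) -> int:
        """Number of attention heads this tensor-parallel rank should own."""
        total = hf_config.num_attention_heads
        # The default would be an even split; overriding this allows uneven
        # tensor parallelism when total is not divisible by tp_size,
        # e.g. 30 heads on 4 GPUs -> [8, 8, 7, 7].
        base, rem = divmod(total, tp_size)
        return base + (1 if tp_rank < rem else 0)
```

vLLM's core could fall back to its current even-split logic whenever the model class does not define such a hook.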
Feedback Period.
No response
CC List.
Any Other Things.
Just to make it clear, I'll be happy to implement this, but I want to hear some feedback before I go ahead.