Labels: feature request
Description
🚀 The feature, motivation and pitch
Hi folks,
While vLLM does support running inference with LoRA adapters for Mixtral, it seems like it only applies them to the attention projection layers (q, k, v, and o). It would be great to also have LoRA inference support for the MoE linear layers (w1, w2, w3, and the gate). That would bring inference to parity with training support (see the sketch below) - https://github.com/OpenAccess-AI-Collective/axolotl/blob/main/examples/mistral/mixtral.yml
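For illustration, here is a minimal sketch of the kind of adapter config this request is about, using Hugging Face PEFT and assuming the standard Mixtral module names (the rank and hyperparameters below are hypothetical). An adapter trained with a config like this covers both the attention projections and the expert/gate layers, but only the attention portion is currently applied at inference time:

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,                # hypothetical rank, for illustration only
    lora_alpha=32,       # hypothetical scaling factor
    lora_dropout=0.05,
    target_modules=[
        # attention projections -- already covered by vLLM's LoRA path
        "q_proj", "k_proj", "v_proj", "o_proj",
        # MoE expert and router layers -- the subject of this request
        "w1", "w2", "w3", "gate",
    ],
    task_type="CAUSAL_LM",
)
```

When such an adapter is served with vLLM (e.g. `LLM(..., enable_lora=True)` plus a `LoRARequest`), the weights targeting w1/w2/w3/gate are effectively dropped, so outputs diverge from what the trained adapter produces.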
Alternatives
No response
Additional context
No response