Labels: feature request
Description
🚀 The feature, motivation and pitch
Hi folks,
While vLLM does support running inference with LoRA adapters for Mixtral, it seems like it only applies them to the attention projection layers (q, k, v, and o). It would be great to also have LoRA inference support for the MoE linear layers (w1, w2, w3, and the gate). That would bring inference to parity with training support (see the sketch below) - https://github.com/OpenAccess-AI-Collective/axolotl/blob/main/examples/mistral/mixtral.yml
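For illustration, here is a minimal sketch of the kind of adapter config this request is about, using Hugging Face PEFT and assuming the standard Mixtral module names (the rank and hyperparameters below are hypothetical). An adapter trained with a config like this covers both the attention projections and the expert/gate layers, but only the attention portion is currently applied at inference time:

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,                # hypothetical rank, for illustration only
    lora_alpha=32,       # hypothetical scaling factor
    lora_dropout=0.05,
    target_modules=[
        # attention projections -- already covered by vLLM's LoRA path
        "q_proj", "k_proj", "v_proj", "o_proj",
        # MoE expert and router layers -- the subject of this request
        "w1", "w2", "w3", "gate",
    ],
    task_type="CAUSAL_LM",
)
```

When such an adapter is served with vLLM (e.g. `LLM(..., enable_lora=True)` plus a `LoRARequest`), the weights targeting w1/w2/w3/gate are effectively dropped, so outputs diverge from what the trained adapter produces.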
Alternatives
No response
Additional context
No response