
[Feature]: Supporting MultiModal inputs using Llama3.1 #8146


Closed
daichi-m opened this issue Sep 4, 2024 · 4 comments
Labels
feature request New feature or request

Comments


daichi-m commented Sep 4, 2024

🚀 The feature, motivation and pitch

We have deployments of the Llama3.1-8B-Instruct and Llama3.1-70B-Instruct models served through vLLM on our on-premise GPU infrastructure.

While testing different use cases, we realized that the current version of vLLM does not support multimodal input for Llama3.1, as per this document: https://docs.vllm.ai/en/latest/models/supported_models.html#supported-vlms

Is it possible to enable Llama3.1 as a VLM? Or, if it can be enabled through a different route, is there any documentation or guide for it?
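For context, the request format we would like to use is roughly how image inputs are already sent to vLLM's OpenAI-compatible server for models listed as supported VLMs. The sketch below uses LLaVA purely as an example of a supported model, and the server URL and image URL are placeholders; this does not currently work for Llama 3.1:

```python
# Illustrative only: multimodal chat request against vLLM's OpenAI-compatible
# server, using a model that is already listed as a supported VLM.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="llava-hf/llava-1.5-7b-hf",  # example of a currently supported VLM, not Llama 3.1
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is shown in this image?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}},
        ],
    }],
)
print(response.choices[0].message.content)
```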

Alternatives

No response

Additional context

No response

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
daichi-m added the feature request label on Sep 4, 2024
@DarkLight1337
Member

See #7503 (comment)

@DarkLight1337
Member

DarkLight1337 commented Sep 4, 2024

We have plans to work on this, but since Meta hasn't released the multimodal variant of Llama 3.1 on HuggingFace yet, there is no rush to complete it.

The main roadblock in the implementation is that we need to support encoder-decoder architectures for multi-modal models. So far, all of the multi-modal models in vLLM insert vision/audio features as tokens into the text token sequence before passing it to a decoder-only language model, so we can reuse much of the existing logic for language-only models. This isn't the case for Llama 3.1, which cross-attends directly to the intermediate vision representations.
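For intuition, here is a minimal PyTorch sketch (not vLLM internals; all shapes, module names, and dimensions are made up for illustration) contrasting the two integration styles: merging projected vision features into the token sequence of a decoder-only model versus cross-attending to vision representations inside the decoder layers.

```python
import torch
import torch.nn as nn

hidden, n_text, n_img = 64, 8, 4

# Style 1: "embedding merge" (what vLLM's existing VLMs do). Vision features are
# projected into the text embedding space and spliced into the token sequence,
# so the language model stays decoder-only and just sees a longer sequence.
text_emb = torch.randn(1, n_text, hidden)
vision_feats = torch.randn(1, n_img, 32)
project = nn.Linear(32, hidden)                               # vision -> text embedding space
merged = torch.cat([project(vision_feats), text_emb], dim=1)  # (1, n_img + n_text, hidden)

# Style 2: cross-attention (what the multimodal Llama variant needs). Text hidden
# states attend to vision representations inside the decoder, so the sequence
# length stays n_text but the model requires encoder-decoder style plumbing.
cross_attn = nn.MultiheadAttention(hidden, num_heads=4, batch_first=True)
text_hidden = torch.randn(1, n_text, hidden)
vision_kv = torch.randn(1, n_img, hidden)
attended, _ = cross_attn(query=text_hidden, key=vision_kv, value=vision_kv)  # (1, n_text, hidden)
```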

@DarkLight1337
Member

cc @ywang96

@DarkLight1337
Member

Closing as completed by #8811
