
Llama 3.2 Vision Model: Guides and Issues #8826

@simon-mo

Running the server (using the vLLM CLI or our Docker image; an example client request follows the commands):

  • vllm serve meta-llama/Llama-3.2-11B-Vision-Instruct --enforce-eager --max-num-seqs 16
  • vllm serve meta-llama/Llama-3.2-90B-Vision-Instruct --enforce-eager --max-num-seqs 32 --tensor-parallel-size 8
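
Once the server is up, it exposes vLLM's OpenAI-compatible chat API. Below is a minimal client sketch, assuming the default host and port (localhost:8000), the OpenAI Python client, and a placeholder image URL:

```python
# Minimal sketch: send a single-image chat request to the server started
# above. Assumes the default localhost:8000 endpoint and no real API key;
# the image URL is a placeholder.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="meta-llama/Llama-3.2-11B-Vision-Instruct",
    messages=[{
        "role": "user",
        "content": [
            # One leading image per request (the current limitation noted below).
            {"type": "image_url",
             "image_url": {"url": "https://example.com/photo.jpg"}},
            {"type": "text", "text": "What is shown in this image?"},
        ],
    }],
    max_tokens=128,
)
print(response.choices[0].message.content)
```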

Currently:

  • Only one leading image is supported. Support for multiple images and interleaved images is a work in progress. (The offline sketch after this list shows the expected single-image prompt layout.)
  • Text-only inference is supported.
  • Only NVIDIA GPUs are supported.
  • Performance is acceptable but still to be optimized! We aim for the first release to be functionally correct; we will work on making it fast 🏎️
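
For reference, here is a minimal offline-inference sketch of the single-leading-image layout, reusing the flags from the 11B serve command above. The `<|image|>` prompt format follows vLLM's multimodal examples, and the local image path is a placeholder:

```python
# Minimal sketch: offline (non-server) inference with the same checkpoint
# and flags as the 11B command above. The <|image|> token marks the single
# leading image; "photo.jpg" is a stand-in path.
from PIL import Image

from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.2-11B-Vision-Instruct",
    enforce_eager=True,
    max_num_seqs=16,
)

outputs = llm.generate(
    {
        "prompt": "<|image|><|begin_of_text|>What is shown in this image?",
        "multi_modal_data": {"image": Image.open("photo.jpg")},
    },
    SamplingParams(max_tokens=128),
)
print(outputs[0].outputs[0].text)
```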

Please see the next steps for better support of this model in vLLM.

cc @heheda12345 @ywang96
