
Llama 3.2 Vision Model: Guides and Issues #8826

@simon-mo

Running the server (using the vLLM CLI or our Docker image; an example client request follows the commands):

  • vllm serve meta-llama/Llama-3.2-11B-Vision-Instruct --enforce-eager --max-num-seqs 16
  • vllm serve meta-llama/Llama-3.2-90B-Vision-Instruct --enforce-eager --max-num-seqs 32 --tensor-parallel-size 8
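
Once the server is up, it exposes vLLM's OpenAI-compatible chat API. Below is a minimal client sketch, assuming the default host and port (localhost:8000), the OpenAI Python client, and a placeholder image URL:

```python
# Minimal sketch: send a single-image chat request to the server started
# above. Assumes the default localhost:8000 endpoint and no real API key;
# the image URL is a placeholder.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="meta-llama/Llama-3.2-11B-Vision-Instruct",
    messages=[{
        "role": "user",
        "content": [
            # One leading image per request (the current limitation noted below).
            {"type": "image_url",
             "image_url": {"url": "https://example.com/photo.jpg"}},
            {"type": "text", "text": "What is shown in this image?"},
        ],
    }],
    max_tokens=128,
)
print(response.choices[0].message.content)
```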

Currently:

  • Only one leading image is supported. Support for multiple images and interleaved images is a work in progress. (The offline sketch after this list shows the expected single-image prompt layout.)
  • Text-only inference is supported.
  • Only NVIDIA GPUs are supported.
  • Performance is acceptable but still to be optimized! We aim for the first release to be functionally correct; we will work on making it fast 🏎️
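
For reference, here is a minimal offline-inference sketch of the single-leading-image layout, reusing the flags from the 11B serve command above. The `<|image|>` prompt format follows vLLM's multimodal examples, and the local image path is a placeholder:

```python
# Minimal sketch: offline (non-server) inference with the same checkpoint
# and flags as the 11B command above. The <|image|> token marks the single
# leading image; "photo.jpg" is a stand-in path.
from PIL import Image

from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.2-11B-Vision-Instruct",
    enforce_eager=True,
    max_num_seqs=16,
)

outputs = llm.generate(
    {
        "prompt": "<|image|><|begin_of_text|>What is shown in this image?",
        "multi_modal_data": {"image": Image.open("photo.jpg")},
    },
    SamplingParams(max_tokens=128),
)
print(outputs[0].outputs[0].text)
```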

Please see the next steps for better support of this model in vLLM.

cc @heheda12345 @ywang96
