Running the server (using the vLLM CLI or our docker image):

```bash
vllm serve meta-llama/Llama-3.2-11B-Vision-Instruct --enforce-eager --max-num-seqs 16
vllm serve meta-llama/Llama-3.2-90B-Vision-Instruct --enforce-eager --max-num-seqs 32 --tensor-parallel-size 8
```
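Once one of these servers is up, it exposes an OpenAI-compatible API. Below is a minimal sketch of an image request, assuming the server's default address (`http://localhost:8000`) and using a placeholder image URL; swap in your own image:

```python
# Sketch: query the OpenAI-compatible server started above.
# Assumes the default vLLM port (8000); the image URL is a placeholder.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="meta-llama/Llama-3.2-11B-Vision-Instruct",
    messages=[
        {
            "role": "user",
            "content": [
                # A single leading image, per the current limitation noted below.
                {"type": "image_url", "image_url": {"url": "https://example.com/duck.jpg"}},
                {"type": "text", "text": "What is in this image?"},
            ],
        }
    ],
    max_tokens=128,
)
print(response.choices[0].message.content)
```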
Currently:
- Only one leading image is supported (see the offline-inference sketch after this list). Support for multiple images and interleaved images is a work in progress.
- Text-only inference is supported.
- Only NVIDIA GPUs are supported.
- Performance is acceptable but yet to be optimized! Our aim for the first release is functional correctness. We will work on making it fast 🏎️
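For offline inference, here is a minimal sketch using vLLM's Python `LLM` API. It assumes the `<|image|>` placeholder token used in Llama 3.2 Vision prompts; the image path is a stand-in for your own file:

```python
# Sketch: offline inference with the same model via the vLLM Python API.
# Assumes the <|image|> placeholder for Llama 3.2 Vision; "duck.jpg" is a stand-in.
from PIL import Image
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.2-11B-Vision-Instruct",
    enforce_eager=True,
    max_num_seqs=16,
)

image = Image.open("duck.jpg")  # placeholder path
# One leading image, matching the current single-image limitation.
prompt = "<|image|><|begin_of_text|>What is in this image?"

outputs = llm.generate(
    {"prompt": prompt, "multi_modal_data": {"image": image}},
    SamplingParams(max_tokens=128),
)
print(outputs[0].outputs[0].text)
```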
Please see the next steps for improving support for this model in vLLM.