Description
🚀 The feature, motivation and pitch
If I am not mistaken, vLLM currently supports pipeline parallelism only for language models, not for vision-language models. The error below lists only language-model architectures:
NotImplementedError: Pipeline parallelism is only supported for the following architectures: ['AquilaModel', 'AquilaForCausalLM', 'DeepseekV2ForCausalLM', 'InternLMForCausalLM', 'JAISLMHeadModel', 'LlamaForCausalLM', 'LLaMAForCausalLM', 'MistralForCausalLM', 'Phi3ForCausalLM', 'GPT2LMHeadModel', 'MixtralForCausalLM', 'NemotronForCausalLM', 'Qwen2ForCausalLM', 'Qwen2MoeForCausalLM', 'QWenLMHeadModel'].
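For reference, here is a minimal sketch of checking a model against the list in that error before launching. The helper name `supports_pipeline_parallelism` and the constant are hypothetical and not part of vLLM; the set is copied from the error message above and may differ in other versions. It assumes the architecture names come from the `architectures` field of the model's Hugging Face `config.json`.

```python
# Architectures copied verbatim from the NotImplementedError message above.
# Assumption: this list reflects only the vLLM version that raised the error.
PP_SUPPORTED_ARCHITECTURES = {
    "AquilaModel", "AquilaForCausalLM", "DeepseekV2ForCausalLM",
    "InternLMForCausalLM", "JAISLMHeadModel", "LlamaForCausalLM",
    "LLaMAForCausalLM", "MistralForCausalLM", "Phi3ForCausalLM",
    "GPT2LMHeadModel", "MixtralForCausalLM", "NemotronForCausalLM",
    "Qwen2ForCausalLM", "Qwen2MoeForCausalLM", "QWenLMHeadModel",
}


def supports_pipeline_parallelism(architectures):
    """Hypothetical helper: True if any architecture name is in the list."""
    return any(name in PP_SUPPORTED_ARCHITECTURES for name in architectures)


# Example: a language model passes, a vision-language model does not.
print(supports_pipeline_parallelism(["LlamaForCausalLM"]))
print(supports_pipeline_parallelism(["LlavaForConditionalGeneration"]))
```

A check like this only mirrors the error message; it does not change what vLLM itself accepts.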
This feature would greatly benefit teams and projects working with vision-language models, allowing them to scale out their workloads efficiently and maintain performance as model sizes continue to grow.
It would also be very helpful if someone could point me to other options for pipeline parallelism with these models. Thanks in advance.
Alternatives
No response
Additional context
No response