Adds vLLM as Option for Local App #693
Conversation
Adds vLLM as option for "Local apps" on Hugging Face.
packages/tasks/src/local-apps.ts
Outdated
    api_key="token-abc123",
)
completion = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.1",
same here
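The comment presumably refers to the hardcoded model id (the later commit "fix dynamic model ids" addresses this). A minimal sketch of the idea, assuming a snippet-builder shape that is not necessarily the actual local-apps.ts API:

```ts
// Hypothetical builder: interpolate the current model id instead of
// hardcoding "mistralai/Mistral-7B-Instruct-v0.1" in the generated Python snippet.
const vllmPythonSnippet = (model: { id: string }): string => `
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="token-abc123",
)
completion = client.chat.completions.create(
    model="${model.id}",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(completion.choices[0].message)
`;
```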
packages/tasks/src/local-apps.ts
Outdated
		prettyLabel: "vLLM",
		docsUrl: "https://docs.vllm.ai",
		mainTask: "text-generation",
		displayOnModelPage: isGptqModel && isAwqModel,
how would you define those methods?
how would you define those methods?
In fact, the suggested vLLM method deploys the non-quantized version from the Hugging Face repository, and all the "text-generation" examples in the code are GGUF. Any suggestions?
Thank you for the PR! Concretely, we support a set of architectures, which are readable from the model data:

	architectures?: string[];

For the quantization method, we can read config.quantization_config.quant_method, of which we support awq, gptq, aqlm, and marlin:
https://huggingface.co/TheBloke/zephyr-7B-alpha-AWQ/blob/main/config.json#L28
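In TypeScript terms, the quantization part of that check could look like the sketch below; the helper name and the simplified model shape are assumptions, only the field path and the four method names come from the comment above.

```ts
// Quantization methods vLLM supports, per the comment above.
const VLLM_SUPPORTED_QUANT_METHODS = ["awq", "gptq", "aqlm", "marlin"];

// Hypothetical helper: unquantized models (no quant_method) pass,
// quantized models pass only if their method is in the supported list.
function isQuantizationSupported(model: {
	config?: { quantization_config?: { quant_method?: string } };
}): boolean {
	const method = model.config?.quantization_config?.quant_method;
	return method === undefined || VLLM_SUPPORTED_QUANT_METHODS.includes(method);
}
```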
awesome @simon-mo, super clear!
I've pushed 2123430 on this PR to type config.quantization_config.quant_method, which we now parse & pass from the Hub.
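For reference, the typed field might look roughly like this; the field paths come from the snippets in this thread, and the exact union added in commit 2123430 is an assumption:

```ts
// Sketch of the parsed config shape passed from the Hub.
type QuantMethod = "awq" | "gptq" | "aqlm" | "marlin";

interface ModelConfig {
	architectures?: string[];
	quantization_config?: {
		quant_method?: QuantMethod;
	};
}
```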
I made some changes; I need your help to review.
Co-authored-by: Julien Chaumond <[email protected]>
Co-authored-by: Julien Chaumond <[email protected]>
fix dynamic model ids
Adds functions to check quantization types
@EliMCosta do you have the vLLM icon as a square SVG?
Co-authored-by: Bertrand CHEVRIER <[email protected]>
This is mostly ready to merge, no? wdyt?
packages/tasks/src/local-apps.ts
Outdated
function isFullModel(model: ModelData): boolean {
	// Assuming a full model is identified by not having a quant_method
	return !model.config?.quantization_config?.quant_method;
}
isFullModel creates a lot of false positives. Instead, we can maybe check against the supported architectures, as suggested by @simon-mo.
Something like this:

const VLLM_SUPPORTED_ARCHS = [
	"AquilaForCausalLM", "ArcticForCausalLM", "BaiChuanForCausalLM", "BloomForCausalLM", ...
];

model.config?.architectures?.some((arch) => VLLM_SUPPORTED_ARCHS.includes(arch))
cc @julien-c
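Put together, a vLLM-specific displayOnModelPage predicate might look like the sketch below; the helper name is illustrative and the architecture list is truncated (the full list can be obtained from vLLM itself, see the next comment).

```ts
// Truncated list of architectures vLLM can serve; the complete list comes
// from ModelRegistry.get_supported_archs() (see the next comment).
const VLLM_SUPPORTED_ARCHS = [
	"AquilaForCausalLM",
	"ArcticForCausalLM",
	"BloomForCausalLM",
	"LlamaForCausalLM",
	"MistralForCausalLM",
];

const VLLM_SUPPORTED_QUANT_METHODS = ["awq", "gptq", "aqlm", "marlin"];

// Hypothetical predicate: the architecture must be supported, and if the model
// is quantized, the quantization method must be supported as well.
function isVllmModel(model: {
	config?: { architectures?: string[]; quantization_config?: { quant_method?: string } };
}): boolean {
	const archOk = model.config?.architectures?.some((arch) => VLLM_SUPPORTED_ARCHS.includes(arch)) ?? false;
	const quantMethod = model.config?.quantization_config?.quant_method;
	const quantOk = quantMethod === undefined || VLLM_SUPPORTED_QUANT_METHODS.includes(quantMethod);
	return archOk && quantOk;
}
```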
You can query the vLLM package for this list once you have it installed:
>>> from vllm import ModelRegistry
>>> ModelRegistry.get_supported_archs()
['AquilaModel', 'AquilaForCausalLM', 'BaiChuanForCausalLM', 'BaichuanForCausalLM', 'BloomForCausalLM', 'ChatGLMModel', 'ChatGLMForConditionalGeneration', 'CohereForCausalLM', 'DbrxForCausalLM', 'DeciLMForCausalLM', 'DeepseekForCausalLM', 'FalconForCausalLM', 'GemmaForCausalLM', 'GPT2LMHeadModel', 'GPTBigCodeForCausalLM', 'GPTJForCausalLM', 'GPTNeoXForCausalLM', 'InternLMForCausalLM', 'InternLM2ForCausalLM', 'JAISLMHeadModel', 'LlamaForCausalLM', 'LlavaForConditionalGeneration', 'LlavaNextForConditionalGeneration', 'LLaMAForCausalLM', 'MistralForCausalLM', 'MixtralForCausalLM', 'QuantMixtralForCausalLM', 'MptForCausalLM', 'MPTForCausalLM', 'MiniCPMForCausalLM', 'OlmoForCausalLM', 'OPTForCausalLM', 'OrionForCausalLM', 'PhiForCausalLM', 'Phi3ForCausalLM', 'QWenLMHeadModel', 'Qwen2ForCausalLM', 'Qwen2MoeForCausalLM', 'RWForCausalLM', 'StableLMEpochForCausalLM', 'StableLmForCausalLM', 'Starcoder2ForCausalLM', 'ArcticForCausalLM', 'XverseForCausalLM', 'Phi3SmallForCausalLM', 'MistralModel']
Any progress here? I think we can add a vLLM tag, so that when a model is supported by vLLM it can have a vLLM tag, like https://huggingface.co/meta-llama/Llama-2-7b-hf.
@simon-mo @EliMCosta @mgoin we're back on the integration. I've merged it with #744. Let me know if the snippets are ok for you.
@krampstudio I would recommend using the new CLI.
@youkaichao I've updated the snippets (and the screenshots in the comment above).
packages/tasks/src/local-apps.ts
Outdated
-function isGgufModel(model: ModelData) {
-	return model.tags.includes("gguf");
+function isGgufModel(model: ModelData): boolean {
+	return model.config?.quantization_config?.quant_method === "gguf" || model.tags.includes("gguf");
are we sure about the first part?
no, let's rely only on the tag. see d63b7cb
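So the check reduces to the tag-only version, something like this sketch (mirroring the shape of the function in the diff above; the exact code in d63b7cb is assumed):

```ts
function isGgufModel(model: { tags: string[] }): boolean {
	// Rely only on the Hub tag; quant_method === "gguf" turned out not to be
	// a reliable signal, per the discussion above.
	return model.tags.includes("gguf");
}
```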
looks in good shape!
Co-authored-by: Michael Goin <[email protected]>
Looks good!
Follow up to #693:
1. Add missing comma
2. Add line splitter between two commands

![Screenshot 2024-10-04 at 14 52 09](https://github.com/user-attachments/assets/99a30553-4081-47fe-b89b-ca0c2a7d9fca)
Follow up to #693. There was a trailing space in the vLLM snippet which made the snippet unusable. When you try the snippet, the vLLM server responds with an error:

```
mishig@machine:~$ # Call the server using curl:
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "meta-llama/Llama-3.2-3B-Instruct",
		"messages": [
			{"role": "user", "content": "Hello!"}
		]
	}'
{"object":"error","message":"[{'type': 'missing', 'loc': ('body',), 'msg': 'Field required', 'input': None}]","type":"BadRequestError","param":null,"code":400}curl: (3) URL using bad/illegal format or missing URL
-H: command not found
--data: command not found
mishig@machine:~$
```

Explanation: the trailing space was breaking the escaping.
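Since the curl snippet is generated from a template string on the TypeScript side, the fix boils down to making sure no continued line ends with a space before the backslash. A minimal sketch, assuming a hypothetical builder function rather than the actual local-apps.ts code:

```ts
// Hypothetical snippet builder: every continued line must end with "\" and no
// trailing whitespace, otherwise the shell escapes the space instead of the
// newline and the following lines run as separate commands.
function buildVllmCurlSnippet(modelId: string): string {
	return [
		`curl -X POST "http://localhost:8000/v1/chat/completions" \\`,
		`\t-H "Content-Type: application/json" \\`,
		`\t--data '{`,
		`\t\t"model": "${modelId}",`,
		`\t\t"messages": [`,
		`\t\t\t{"role": "user", "content": "Hello!"}`,
		`\t\t]`,
		`\t}'`,
	].join("\n");
}
```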
Adds vLLM as option for "Local apps" on Hugging Face.