
Adds vLLM as Option for Local App #693


Merged

merged 13 commits into huggingface:main from EliMCosta:patch-4 on Sep 4, 2024

Conversation

EliMCosta
Contributor

Adds vLLM as option for "Local apps" in Hugging Face
@EliMCosta EliMCosta changed the title Update local-apps.ts Adds vLLM as Option for Local Apps in Hugging Face May 21, 2024
@EliMCosta EliMCosta changed the title Adds vLLM as Option for Local Apps in Hugging Face Adds vLLM as Option for Local App May 21, 2024
```python
# Completed for context; assumes a local vLLM OpenAI-compatible server on the default port.
from openai import OpenAI
client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="token-abc123",
)
completion = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.1", messages=[{"role": "user", "content": "Hello!"}]
)
```
Member

same here

```ts
prettyLabel: "vLLM",
docsUrl: "https://docs.vllm.ai",
mainTask: "text-generation",
displayOnModelPage: isGptqModel && isAwqModel,
```
Member

how would you define those methods?

Contributor Author

> how would you define those methods?

In fact, the suggested vLLM method deploys the non-quantized version from the Hugging Face repository. All examples of type "text-generation" in the code are GGUF. Any suggestions?


Thank you for the PR! Concretely, we support a set of architectures, which is readable from the model data:

architectures?: string[];

https://github.com/vllm-project/vllm/blob/757b62c49560baa6f294310a53032348a0d95939/vllm/model_executor/models/__init__.py#L13-L63

And for the quantization method, we can read config.quantization_config.quant_method; the supported methods are awq, gptq, aqlm, and marlin:

https://huggingface.co/TheBloke/zephyr-7B-alpha-AWQ/blob/main/config.json#L28
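
For illustration, here is a minimal TypeScript sketch of how a check following those two rules could look. The `ModelData` shape is simplified, the architecture list is an incomplete illustrative subset, and `isVllmCompatible` is a hypothetical helper, not the actual local-apps.ts implementation:

```ts
// Simplified sketch only, not the real Hub types.
interface ModelData {
	config?: {
		architectures?: string[];
		quantization_config?: { quant_method?: string };
	};
}

// Illustrative, incomplete subset; the full list lives in vLLM itself.
const VLLM_SUPPORTED_ARCHS = ["LlamaForCausalLM", "MistralForCausalLM", "MixtralForCausalLM"];

// Quantization methods vLLM can load, per the comment above.
const VLLM_SUPPORTED_QUANT_METHODS = ["awq", "gptq", "aqlm", "marlin"];

function isVllmCompatible(model: ModelData): boolean {
	const archOk = model.config?.architectures?.some((arch) => VLLM_SUPPORTED_ARCHS.includes(arch)) ?? false;
	const quantMethod = model.config?.quantization_config?.quant_method;
	// Unquantized models pass; quantized ones must use a supported method.
	const quantOk = quantMethod === undefined || VLLM_SUPPORTED_QUANT_METHODS.includes(quantMethod);
	return archOk && quantOk;
}
```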

Member

awesome @simon-mo, super clear!

Member

@julien-c julien-c May 22, 2024

I've pushed 2123430 on this PR to type config.quantization_config.quant_method, which we now parse & pass from the Hub

Contributor Author

> I've pushed 2123430 on this PR to type config.quantization_config.quant_method, which we now parse & pass from the Hub

I made some changes; I need your help reviewing them.

EliMCosta and others added 5 commits May 22, 2024 11:14
fix dynamic model ids
Adds functions in order to check types of quantization
@krampstudio
Collaborator

@EliMCosta do you have the vLLM icon as a square SVG?

@EliMCosta
Contributor Author

> @EliMCosta do you have the vLLM icon as a square SVG?

[image: vllm-logo]

@julien-c
Member

julien-c commented Jun 3, 2024

this is mostly ready to merge no? wdyt?

Comment on lines 65 to 67
```ts
function isFullModel(model: ModelData): boolean {
	// Assuming a full model is identified by not having a quant_method
	return !model.config?.quantization_config?.quant_method;
}
```
Collaborator

isFullModel creates a lot of false positives.

Instead, we can maybe check against the supported architectures, as suggested by @simon-mo.

Something like this:

```ts
const VLLM_SUPPORTED_ARCHS = [
	"AquilaForCausalLM", "ArcticForCausalLM", "BaiChuanForCausalLM", "BloomForCausalLM", ...
];
model.config?.architectures?.some((arch) => VLLM_SUPPORTED_ARCHS.includes(arch));
```

cc @julien-c
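
To make that concrete, here is a rough sketch of how the check could plug into the registry entry from the diff above. The `LocalApp` shape and the architecture list shown here are assumptions for illustration, not the actual implementation:

```ts
// Assumed, simplified shapes for illustration only.
interface ModelData {
	config?: { architectures?: string[] };
}
interface LocalApp {
	prettyLabel: string;
	docsUrl: string;
	mainTask: string;
	displayOnModelPage: (model: ModelData) => boolean;
}

// Illustrative subset; the full list would come from vLLM's ModelRegistry.
const VLLM_SUPPORTED_ARCHS = ["LlamaForCausalLM", "MistralForCausalLM", "MixtralForCausalLM"];

const vllm: LocalApp = {
	prettyLabel: "vLLM",
	docsUrl: "https://docs.vllm.ai",
	mainTask: "text-generation",
	displayOnModelPage: (model) =>
		model.config?.architectures?.some((arch) => VLLM_SUPPORTED_ARCHS.includes(arch)) ?? false,
};
```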

Contributor

You can query the vLLM package for this list once you have it installed:

```python
>>> from vllm import ModelRegistry
>>> ModelRegistry.get_supported_archs()
['AquilaModel', 'AquilaForCausalLM', 'BaiChuanForCausalLM', 'BaichuanForCausalLM', 'BloomForCausalLM', 'ChatGLMModel', 'ChatGLMForConditionalGeneration', 'CohereForCausalLM', 'DbrxForCausalLM', 'DeciLMForCausalLM', 'DeepseekForCausalLM', 'FalconForCausalLM', 'GemmaForCausalLM', 'GPT2LMHeadModel', 'GPTBigCodeForCausalLM', 'GPTJForCausalLM', 'GPTNeoXForCausalLM', 'InternLMForCausalLM', 'InternLM2ForCausalLM', 'JAISLMHeadModel', 'LlamaForCausalLM', 'LlavaForConditionalGeneration', 'LlavaNextForConditionalGeneration', 'LLaMAForCausalLM', 'MistralForCausalLM', 'MixtralForCausalLM', 'QuantMixtralForCausalLM', 'MptForCausalLM', 'MPTForCausalLM', 'MiniCPMForCausalLM', 'OlmoForCausalLM', 'OPTForCausalLM', 'OrionForCausalLM', 'PhiForCausalLM', 'Phi3ForCausalLM', 'QWenLMHeadModel', 'Qwen2ForCausalLM', 'Qwen2MoeForCausalLM', 'RWForCausalLM', 'StableLMEpochForCausalLM', 'StableLmForCausalLM', 'Starcoder2ForCausalLM', 'ArcticForCausalLM', 'XverseForCausalLM', 'Phi3SmallForCausalLM', 'MistralModel']
```

@youkaichao

any progress here? I think we can add a vLLM tag, so that when a model is supported by vLLM it can have that tag, like https://huggingface.co/meta-llama/Llama-2-7b-hf.

@krampstudio krampstudio requested a review from pcuenca as a code owner September 2, 2024 14:58
@krampstudio
Collaborator

krampstudio commented Sep 2, 2024

@simon-mo @EliMCosta @mgoin we're back on the integration. I've merged it with #744

Let me know if the snippets are ok for you.
It renders like this:

Screenshot 2024-09-03 at 14 24 56

Screenshot 2024-09-03 at 14 26 08

@youkaichao

@krampstudio I would recommend using the new CLI vllm serve instead of the long python -m vllm.entrypoints.xxx

@krampstudio
Collaborator

@youkaichao I've updated the snippets (and the screenshots in the comment above)
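
For reference, a hypothetical sketch of what a vllm serve based snippet builder could look like; the function name, `ModelData` shape, and exact command layout are assumptions for illustration, not the merged code:

```ts
interface ModelData {
	id: string;
}

// Returns the shell commands shown on the model page: start the server, then call it.
const snippetVllm = (model: ModelData): string[] => [
	// OpenAI-compatible server via the new CLI entry point (default port 8000).
	`vllm serve "${model.id}"`,
	[
		`# Call the server using curl:`,
		`curl -X POST "http://localhost:8000/v1/chat/completions" \\`,
		`\t-H "Content-Type: application/json" \\`,
		`\t--data '{`,
		`\t\t"model": "${model.id}",`,
		`\t\t"messages": [{"role": "user", "content": "Hello!"}]`,
		`\t}'`,
	].join("\n"),
];

// Example: snippetVllm({ id: "meta-llama/Llama-3.2-3B-Instruct" });
```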

```diff
-function isGgufModel(model: ModelData) {
-	return model.tags.includes("gguf");
+function isGgufModel(model: ModelData): boolean {
+	return model.config?.quantization_config?.quant_method === "gguf" || model.tags.includes("gguf");
 }
```
Member

are we sure about the first part?

Collaborator

@krampstudio krampstudio Sep 4, 2024

no, let's rely only on the tag. see d63b7cb

Member

@julien-c julien-c left a comment

looks in good shape!

Member

@pcuenca pcuenca left a comment

Looks good!

@krampstudio krampstudio merged commit b15ff77 into huggingface:main Sep 4, 2024
4 checks passed
@mishig25 mishig25 mentioned this pull request Oct 4, 2024
mishig25 added a commit that referenced this pull request Oct 4, 2024
Follow up to #693

1. Add missing comma
2. Add line splitter between two commands

Screenshot 2024-10-04 at 14 52 09: https://github.com/user-attachments/assets/99a30553-4081-47fe-b89b-ca0c2a7d9fca
mishig25 added a commit that referenced this pull request Oct 7, 2024
Follow up to #693

There was a trailing space in the vLLM snippet which made the snippet unusable. When you tried the snippet, the vLLM server would respond with this error:
```
mishig@machine:~$ # Call the server using curl:
curl -X POST "http://localhost:8000/v1/chat/completions" \ 
        -H "Content-Type: application/json" \ 
        --data '{
                "model": "meta-llama/Llama-3.2-3B-Instruct",
                "messages": [
                        {"role": "user", "content": "Hello!"}
                ]
        }'

{"object":"error","message":"[{'type': 'missing', 'loc': ('body',), 'msg': 'Field required', 'input': None}]","type":"BadRequestError","param":null,"code":400}curl: (3) URL using bad/illegal format or missing URL
-H: command not found
--data: command not found
mishig@machine:~$ 
```

Explanation: the trailing space after the backslash was breaking the shell line-continuation escaping.
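
As an illustration of how this class of bug could be guarded against in a snippet generator (a sketch only; the actual follow-up commit simply removed the trailing spaces):

```ts
// Strip trailing whitespace from each line so "\" stays the last character
// and shell line continuation keeps working.
function stripTrailingWhitespace(snippet: string): string {
	return snippet
		.split("\n")
		.map((line) => line.replace(/[ \t]+$/, ""))
		.join("\n");
}

// A trailing space after the backslash would break the continuation without the cleanup:
const broken = 'curl -X POST "http://localhost:8000/v1/chat/completions" \\ \n\t-H "Content-Type: application/json"';
console.log(stripTrailingWhitespace(broken));
```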
@EliMCosta EliMCosta deleted the patch-4 branch October 22, 2024 15:48
@julien-c julien-c mentioned this pull request Feb 5, 2025