
Conversation

LinkerCodeMonkey

We added code to support LLaVA. #307

Test code:

from vllm import MLLM, SamplingParams

# One image is paired with each prompt; here the same URL is reused four times.
prompts = [
    "What is the man doing?",
    "What is your name?",
    "What can I do for you?",
    "What is the man doing?",
]
images = [{
    "src_type": "url",
    "image_src": "IMAGE_URL"}] * 4

sampling_params = SamplingParams(temperature=0.8, top_p=0.5, max_tokens=1024)
model, tokenizer = "/PATH/LLaVA-13b-delta-v1-1", "/PATH/LLaVA-13b-delta-v1-1"
gpu_memory_utilization = 0.9

# MLLM is the multimodal wrapper added in this PR.
mllm = MLLM(model=model, tokenizer=tokenizer, gpu_memory_utilization=gpu_memory_utilization)
outputs = mllm.generate(prompts, images, sampling_params)

# Print the outputs.
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")

LinkerCodeMonkey changed the title from "add llava support" to "Add LLaVA support" on Aug 17, 2023
zhuohan123 added the new-model (Requests to new models) label on Sep 12, 2023
@teraktor2006

Thanks. Does it work with LLaVA 1.5?

@hmellor
Member

hmellor commented Mar 28, 2024

@LinkerCodeMonkey do you still plan to work on this PR?

@WoosukKwon
Collaborator

Closed as we added support for LLaVA in #3042
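For readers landing here later, a minimal sketch of running LLaVA through the multimodal API that superseded this PR. The model name, prompt template, and image path below are illustrative assumptions, not values from this thread, and the exact API may differ between vLLM versions:

from PIL import Image
from vllm import LLM, SamplingParams

# Assumed HF checkpoint; other supported LLaVA models should work similarly.
llm = LLM(model="llava-hf/llava-1.5-7b-hf")
image = Image.open("example.jpg")  # any local RGB image

outputs = llm.generate(
    {
        # LLaVA-1.5-style chat prompt with an image placeholder token.
        "prompt": "USER: <image>\nWhat is the man doing? ASSISTANT:",
        "multi_modal_data": {"image": image},
    },
    SamplingParams(temperature=0.8, top_p=0.5, max_tokens=128),
)
for output in outputs:
    print(output.outputs[0].text)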

WoosukKwon closed this on Apr 12, 2024
jikunshang pushed a commit to jikunshang/vllm that referenced this pull request Feb 7, 2025
jikunshang pushed a commit to jikunshang/vllm that referenced this pull request Feb 20, 2025
Support mla (vllm-project#775)

[DeepSeek R1] improve the latent cache by saving the last 64 dims in the key cache and 512 dims in the value cache (vllm-project#804)

Before, we could only allocate 1854 blocks with 29.2 GB; now we are able to
allocate 3156 blocks.
Performance-wise, there is no visible regression, and we can push to a higher
batch_size or a longer context length.

---------

Signed-off-by: Chendi Xue <[email protected]>
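The block-count improvement quoted above can be sanity-checked with a rough back-of-the-envelope calculation. This is not code from the PR: the layer count, block size, and dtype below are assumptions, and a real allocator pays extra overhead for alignment and metadata, so the printed count is only a ballpark upper bound.

bytes_per_elem = 2      # bf16 (assumption)
num_layers = 61         # assumed DeepSeek-R1 layer count
block_size = 128        # assumed tokens per KV-cache block

latent_dim = 512        # compressed KV latent kept in the value cache
rope_key_dim = 64       # RoPE key slice kept in the key cache

# Per-token footprint of the compressed MLA cache across all layers.
per_token_bytes = (latent_dim + rope_key_dim) * bytes_per_elem * num_layers
per_block_bytes = per_token_bytes * block_size

budget_bytes = 29.2 * 1024**3
print(f"per block: {per_block_bytes / 2**20:.2f} MiB, "
      f"max blocks: {int(budget_bytes // per_block_bytes)}")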
jikunshang pushed a commit to jikunshang/vllm that referenced this pull request Mar 14, 2025
amy-why-3459 pushed a commit to amy-why-3459/vllm that referenced this pull request Sep 15, 2025
### What this PR does / why we need it?

This is continuing work from vllm-project#716.
This PR adds a workflow to build and release the wheel, and also releases the
source distribution to PyPI.
We have 3 conditions to trigger the workflow:

1. PR to `main` and `*-dev`
2. push to `main` and `*-dev`
3. push tag with name of `v*`

The release to PyPI is only done under condition 3. Under conditions 1
and 2, the workflow generates the .tar.gz, builds the .whl, and uploads them as
GitHub artifacts, but does not publish a release.

Update: the .whl will also be built and uploaded to GitHub artifacts by a scheduled task.


### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
All trigger conditions were tested with my fork of the repo.

---------

Signed-off-by: Shuqiao Li <[email protected]>
Signed-off-by: Yikun Jiang <[email protected]>
Co-authored-by: Yikun Jiang <[email protected]>