[Model] Add Support for Ovis1.6-Gemma2-9B Model #11240
Conversation
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
This model implementation couples the image processing and the model forwarding. You can refer to the model implementations in llava.py and phi3v.py when adding a new model implementation.
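For context, a minimal sketch of the kind of separation being suggested; the class and attribute names below are purely illustrative (not from this PR, llava.py, or phi3v.py): preprocessing produces tensors up front, and the model's forward only consumes those tensors.

```python
from typing import Optional

import torch
import torch.nn as nn


class FakeImageProcessor:
    """Hypothetical processor: turns an image into pixel_values outside the model."""

    def __call__(self, image) -> torch.Tensor:
        # A real processor would resize/normalize a PIL image; this sketch just
        # returns a dummy (num_patches, C, H, W) tensor of the expected shape.
        return torch.zeros(1, 3, 384, 384)


class FakeOvisModel(nn.Module):
    """The forward pass consumes pre-processed tensors only, never raw images."""

    def __init__(self, hidden_size: int = 16):
        super().__init__()
        self.hidden_size = hidden_size
        self.visual_proj = nn.Linear(3 * 384 * 384, hidden_size)

    def forward(self, input_ids: torch.Tensor,
                pixel_values: Optional[torch.Tensor] = None) -> torch.Tensor:
        text_embeds = torch.zeros(*input_ids.shape, self.hidden_size)
        if pixel_values is not None:
            # Merge already-computed visual embeddings with the text embeddings.
            image_embeds = self.visual_proj(pixel_values.flatten(1))
            text_embeds = torch.cat([image_embeds.unsqueeze(1), text_embeds], dim=1)
        return text_embeds


processor = FakeImageProcessor()
model = FakeOvisModel()
pixel_values = processor(image=None)              # preprocessing happens here...
out = model(torch.zeros(1, 4, dtype=torch.long),  # ...forwarding only sees tensors
            pixel_values=pixel_values)
print(out.shape)  # torch.Size([1, 5, 16])
```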
Any news?
Hey @Isotr0py, could you give this PR a review?
Although the model implementation has improved, there are still several things that need to be done:
- Update the documentation to mention this supported model in docs/source/models/supported_models.md.
- Add an example in examples/offline_inference/vision_language.py; if this model supports multi-image inputs, please also update examples/offline_inference/vision_language_multi_image.py (a hedged snippet follows this list).
- Add model correctness tests in tests/models/decoder_only/vision_language/test_models.py and a processor correctness test in tests/models/multimodal/processing/test_common.py.
- Update tests/models/registry.py with the model information.
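For the offline-inference example item above, a hedged sketch of what such an entry might look like; the model ID, the trust_remote_code flag, and especially the `<image>` prompt placeholder are assumptions and would need to match whatever this PR finally registers and the model card's chat template.

```python
from PIL import Image
from vllm import LLM, SamplingParams

# Assumed model ID and prompt format; adjust to the model card / chat template.
llm = LLM(model="AIDC-AI/Ovis1.6-Gemma2-9B", trust_remote_code=True)

image = Image.open("example.jpg").convert("RGB")
prompt = "<image>\nWhat is shown in this image?"

outputs = llm.generate(
    {"prompt": prompt, "multi_modal_data": {"image": image}},
    SamplingParams(max_tokens=64),
)
print(outputs[0].outputs[0].text)
```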
vllm/model_executor/models/ovis.py (outdated)
# def merge_multimodal(
#     self,
#     text_input_ids: torch.Tensor,
#     text_attention_masks: torch.Tensor,
#     text_labels: Optional[torch.Tensor],
#     pixel_values: List[Optional[torch.Tensor]],
#     left_padding: bool = False
# ):
Please remove this unused code.
Please address the pre-commit linting errors as well.
Thanks @Isotr0py for the review; I'll get back to it.
Will this PR also cover the new Ovis 2 models? https://huggingface.co/collections/AIDC-AI/ovis2-67ab36c7e497429034874464
I'll add the tests for it.
@Player256 I tried this PR, but it doesn't work. I managed to get the model loaded, but it seems that the multimodal processor implementation still doesn't work.
vllm/model_executor/models/ovis.py (outdated)
def get_replacement_ovis(image: PIL.Image.Image):
    _, image_placeholders = self.preprocess_image(image)

    return image_placeholders
Why do we re-process images here?
vllm/model_executor/models/ovis.py (outdated)
def get_image_size_with_most_features(self) -> ImageSize:
    return ImageSize(height=384, width=384)
It seems that Ovis uses dynamic resizing (https://huggingface.co/AIDC-AI/Ovis1.6-Llama3.2-3B/blob/b8d93d7468f47fd803eb26ec2c1bc2d7e5fba60e/modeling_ovis.py#L135-L159); does a 384x384 image size really return the most image features from the visual tokenizer?
Hey, I referred to this paper, where the authors fine-tuned ViT models with an input resolution of 384x384 for S/16 and B/16 models, while using 512x512 for L/16 models. This suggests that 384x384 would be an appropriate choice for SigLIP feature extraction if you are using a similar model size (ViT-S or ViT-B).
2106.11297v4.pdf
I mean that this model uses dynamic preprocessing based on aspect ratio, so pixel_values with shape (num_patches, C, H, W) can have a dynamic size along the patch dimension, which leads to different placeholder sequence lengths. For example, given a 2048x2048 image, pixel_values has shape (10, 3, 384, 384). The image size returned here should correspond to the longest placeholder.
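To make the point concrete, here is a simplified sketch of aspect-ratio-aware tiling. It is not the actual Ovis preprocessing linked above; SIDE, MAX_PARTITION, the grid-selection heuristic, and the appended thumbnail are all illustrative assumptions. It only shows that the number of 384x384 patches, and hence the placeholder length, depends on the input size.

```python
from PIL import Image

SIDE = 384          # assumed tile size consumed by the visual tokenizer
MAX_PARTITION = 9   # assumed upper bound on grid tiles


def partition(image: Image.Image) -> list[Image.Image]:
    w, h = image.size
    # Pick the grid (cols x rows) under MAX_PARTITION whose aspect ratio best
    # matches the image, preferring larger grids on ties (simplified heuristic).
    cols, rows = min(
        ((c, r) for c in range(1, MAX_PARTITION + 1)
         for r in range(1, MAX_PARTITION // c + 1)),
        key=lambda g: abs((g[0] / g[1]) - (w / h)) - 1e-3 * (g[0] * g[1]),
    )
    resized = image.resize((cols * SIDE, rows * SIDE))
    tiles = [
        resized.crop((c * SIDE, r * SIDE, (c + 1) * SIDE, (r + 1) * SIDE))
        for r in range(rows) for c in range(cols)
    ]
    # Append a global thumbnail, mirroring the "overall view" patch.
    return tiles + [image.resize((SIDE, SIDE))]


patches = partition(Image.new("RGB", (2048, 2048)))
print(len(patches))  # 10 -> pixel_values would be stacked to (10, 3, 384, 384)
```

With this kind of partitioning, get_image_size_with_most_features should return an image size that maps to the maximum patch count (and therefore the longest placeholder), rather than a fixed 384x384.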
@Isotr0py I am facing this issue in the OvisProcessor.
Somehow the
This pull request has merge conflicts that must be resolved before it can be merged.
Closing as superseded by #17861.
This pull request addresses issue #9638 by adding support for the Ovis1.6-Gemma2-9B model.
FIX #8972
FIX #9638