
Conversation

ywang96
Member

@ywang96 ywang96 commented Sep 11, 2025

Purpose

This PR adds support for rednote-hilab/dots.ocr. This model is currently supported via OOT registration, but we might as well bring it into vLLM so that users don't need to set it up with additional steps.

Most of the code is taken from the model repo, but this PR also cleans up some logic that is no longer needed since this model does not support the video modality.

FIXES #24581

Co-authored-by @yinz-aizip

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: Roger Wang <[email protected]>
@mergify mergify bot added the documentation (Improvements or additions to documentation) and new-model (Requests to new models) labels Sep 11, 2025
Signed-off-by: Roger Wang <[email protected]>
@casper-hansen
Contributor

Thanks for taking a stab at upstreaming this @ywang96. We need more and better OCR models, and this would be a great step forward.

@ywang96
Member Author

ywang96 commented Sep 13, 2025

Thanks for taking a stab at upstreaming this @ywang96. We need more and better OCR models, and this would be a great step forward.

@casper-hansen No problem. BTW there are still some performance issues I need to debug, but correctness-wise this branch should actually be ready to go with vllm serve rednote-hilab/dots.ocr --trust-remote-code.
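
For anyone who wants to try the branch, here is a minimal sketch of querying the server once it is up, assuming the standard OpenAI-compatible chat endpoint that vllm serve exposes; the port, image URL, and prompt text are placeholders, not taken from this PR:

from openai import OpenAI

# Assumes `vllm serve rednote-hilab/dots.ocr --trust-remote-code` is running
# on the default port; the API key is unused by a local server.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="rednote-hilab/dots.ocr",
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url",
             "image_url": {"url": "https://example.com/sample_page.png"}},  # placeholder image
            {"type": "text", "text": "Extract the text from this document image."},
        ],
    }],
)
print(response.choices[0].message.content)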

@yinz-aizip yinz-aizip mentioned this pull request Sep 14, 2025
@ywang96
Member Author

ywang96 commented Sep 17, 2025

@yinz-aizip I made some changes to use vLLM internal layers - could you help verify the correctness of this implementation (similar to what you did in your PR)? Thanks!

@ywang96 ywang96 marked this pull request as ready for review September 17, 2025 10:21
@ywang96 ywang96 requested a review from hmellor as a code owner September 17, 2025 10:21
@yinz-aizip
Contributor

Summary

This report compares performance and evaluation metrics between two commits of this PR:
• Commit 96cc9bc
• Commit c830194

The model was served with the following configuration:

hf_model_path='/path/to/dots.ocr'
export CUDA_VISIBLE_DEVICES=4,5,6,7
vllm serve $hf_model_path \
    --host 127.0.0.1 \
    --port 8126 \
    --data-parallel-size 4 \
    --tensor-parallel-size 1 \
    --gpu-memory-utilization 0.8 \
    --chat-template-content-format string \
    --served-model-name model \
    --trust-remote-code

Efficiency

Throughput was estimated by running 1,000 concurrent requests on a single image (higher is better):
• Commit 96cc9bc... → 3980.84
• Commit c830194... → 3743.34

The newer commit is slightly slower, though the difference is relatively minor.
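
For reference, the sketch below shows what such a concurrency benchmark could look like, using the AsyncOpenAI client against the server configuration above; this is an illustrative reconstruction, not the exact script behind the numbers, and the image URL and prompt are placeholders:

import asyncio
import time

from openai import AsyncOpenAI

client = AsyncOpenAI(base_url="http://127.0.0.1:8126/v1", api_key="EMPTY")

async def one_request(image_url: str) -> None:
    # One chat completion carrying a single image.
    await client.chat.completions.create(
        model="model",  # matches --served-model-name above
        messages=[{
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": image_url}},
                {"type": "text", "text": "OCR this page."},
            ],
        }],
    )

async def main() -> None:
    image_url = "https://example.com/page.png"  # placeholder single image
    start = time.perf_counter()
    # Fire 1,000 requests concurrently, as described above.
    await asyncio.gather(*(one_request(image_url) for _ in range(1000)))
    elapsed = time.perf_counter() - start
    print(f"{1000 / elapsed:.2f} req/s over {elapsed:.1f}s")

asyncio.run(main())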

Effectiveness

Evaluated on OmniDocBench (lower is better):
• Commit 96cc9bc...
  • overall_EN: 0.12578
  • overall_CH: 0.16556
• Commit c830194...
  • overall_EN: 0.12412
  • overall_CH: 0.16285

The newer commit shows a small improvement in accuracy across both English and Chinese benchmarks.

Conclusion

• Efficiency: slightly reduced in the newer commit.
• Effectiveness: marginally improved results on OmniDocBench.

Overall, the trade-off seems acceptable, with minor throughput loss balanced by better benchmark performance.

@casper-hansen
Contributor

@ywang96 I tested this PR:

  • Correctness issues fixed (no more endless generations)
  • Performance issues also fixed, now scales with number of concurrent images.

1x H100 concurrency benchmark:

  • 1.81 images/second for 16 images.
  • 2.36 images/second for 30 images.

@casper-hansen
Contributor

One potential performance issue: when I pass in 30 images, I see the message below 30 times before the images are actually processed. This seems to add significant latency for this model.

Fetching 1 files: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 7869.24it/s]

@ywang96
Member Author

ywang96 commented Sep 18, 2025

One potential performance issue: when I pass in 30 images, I see the message below 30 times before the images are actually processed. This seems to add significant latency for this model.

Fetching 1 files: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 7869.24it/s]

Hmm - FYI @DarkLight1337, I'm wondering if this has something to do with an out-of-tree processor (i.e., one with trust_remote_code=True) inheriting from an HF one?

"ChameleonForConditionalGeneration": ("chameleon", "ChameleonForConditionalGeneration"), # noqa: E501
"Cohere2VisionForConditionalGeneration": ("cohere2_vision", "Cohere2VisionForConditionalGeneration"), # noqa: E501
"DeepseekVLV2ForCausalLM": ("deepseek_vl2", "DeepseekVLV2ForCausalLM"),
"DotsOCRForCausalLM": ("dots_ocr", "DotsOCRForCausalLM"),

def forward(self, x: torch.Tensor) -> torch.Tensor:
x1, _ = self.fc1(x)
x3, _ = self.fc3(x)
x = F.silu(x1) * x3
Collaborator

Can we use MergedColumnParallelLinear here?
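
A sketch of what that could look like, assuming the fc1/fc3 pair above implements a SwiGLU gate/up projection; the class and attribute names are illustrative, and the down projection is omitted:

import torch
import torch.nn as nn

from vllm.model_executor.layers.activation import SiluAndMul
from vllm.model_executor.layers.linear import MergedColumnParallelLinear

class DotsSwiGLUFFN(nn.Module):
    # Illustrative fused variant of the fc1/fc3 pair; fc2 (down proj) omitted.

    def __init__(self, dim: int, hidden_dim: int, bias: bool = True):
        super().__init__()
        # Fuses fc1 (gate) and fc3 (up) into a single column-parallel GEMM,
        # so one matmul replaces two and both weights shard together under TP.
        self.gate_up_proj = MergedColumnParallelLinear(
            input_size=dim,
            output_sizes=[hidden_dim] * 2,
            bias=bias,
        )
        self.act_fn = SiluAndMul()  # computes silu(gate) * up on the fused output

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        gate_up, _ = self.gate_up_proj(x)
        return self.act_fn(gate_up)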

num_heads, self.tp_size)

# qkv/proj follow Qwen2-VL style; bias controlled by arg
self.qkv = ColumnParallelLinear(input_size=dim,
Collaborator

Maybe we can use QKVParallelLinear.
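
For illustration, a sketch of what the QKVParallelLinear replacement could look like for this vision attention; the wrapper class is hypothetical and only shows the fused projection:

import torch
import torch.nn as nn

from vllm.model_executor.layers.linear import QKVParallelLinear

class VisionAttentionQKV(nn.Module):
    # Hypothetical wrapper showing only the fused QKV projection.

    def __init__(self, dim: int, num_heads: int, qkv_bias: bool = True):
        super().__init__()
        self.qkv = QKVParallelLinear(
            hidden_size=dim,
            head_size=dim // num_heads,
            total_num_heads=num_heads,
            bias=qkv_bias,
        )

    def forward(self, x: torch.Tensor):
        qkv, _ = self.qkv(x)
        # Plain MHA here (num_kv_heads == num_heads), so an even 3-way chunk
        # recovers q, k, v with heads already sharded per TP rank.
        return qkv.chunk(3, dim=-1)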

@jeejeelee
Collaborator

One potential performance issue: when I pass in 30 images, I see the message below 30 times before the images are actually processed. This seems to add significant latency for this model.

Fetching 1 files: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 7869.24it/s]

Hmm - FYI @DarkLight1337, I'm wondering if this has something to do with an out-of-tree processor (i.e., one with trust_remote_code=True) inheriting from an HF one?

I tested it with vision_language.py and didn't see similar output, which is a bit strange.

@casper-hansen
Contributor

@ywang96 @yinz-aizip is it possible to avoid --trust-remote-code? I believe this is the root cause of the high latency.

@ywang96
Member Author

ywang96 commented Sep 18, 2025

I believe this is the root cause of the high latency.

@casper-hansen Yeah, I think so too. But I don't think it should be removed; instead, we should cache the object that fetches the remote file (instead of fetching it over and over). This should not happen, and I need to debug why it does 😅
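
To sketch the caching idea (not the actual fix, which landed separately), something like memoizing the processor load would make the hub fetch happen once per process; the helper name here is hypothetical:

from functools import lru_cache

from transformers import AutoProcessor

@lru_cache(maxsize=None)
def get_cached_processor(model_path: str, trust_remote_code: bool = True):
    # The first call downloads and loads the remote processor code; repeated
    # calls with the same arguments return the already-constructed object.
    return AutoProcessor.from_pretrained(
        model_path, trust_remote_code=trust_remote_code)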

@DarkLight1337
Member

The repeated loading issue should be fixed by #25341

@ywang96
Member Author

ywang96 commented Sep 21, 2025

The correctness of this PR has been verified by our contact on the rednote engineering team, so I'm just going to add the ready label to it.

@ywang96 ywang96 added the ready (ONLY add when PR is ready to merge/full CI is needed) label Sep 21, 2025

@jeejeelee jeejeelee left a comment


Overall LGTM; some improvements can be completed in subsequent PRs.

@ywang96 ywang96 enabled auto-merge (squash) September 21, 2025 19:40
Signed-off-by: Roger Wang <[email protected]>
@ywang96 ywang96 merged commit 7b57a43 into vllm-project:main Sep 22, 2025
49 checks passed
kingsmad pushed a commit to kingsmad/vllm that referenced this pull request Sep 22, 2025
Signed-off-by: Roger Wang <[email protected]>
Co-authored-by: yinz-aizip <[email protected]>
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025
Signed-off-by: Roger Wang <[email protected]>
Co-authored-by: yinz-aizip <[email protected]>
charlifu pushed a commit to ROCm/vllm that referenced this pull request Sep 25, 2025
Signed-off-by: Roger Wang <[email protected]>
Co-authored-by: yinz-aizip <[email protected]>
Signed-off-by: charlifu <[email protected]>
yewentao256 pushed a commit that referenced this pull request Oct 3, 2025
Signed-off-by: Roger Wang <[email protected]>
Co-authored-by: yinz-aizip <[email protected]>
Signed-off-by: yewentao256 <[email protected]>
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 10, 2025
Signed-off-by: Roger Wang <[email protected]>
Co-authored-by: yinz-aizip <[email protected]>
Signed-off-by: xuebwang-amd <[email protected]>
choprahetarth pushed a commit to Tandemn-Labs/vllm that referenced this pull request Oct 11, 2025
Signed-off-by: Roger Wang <[email protected]>
Co-authored-by: yinz-aizip <[email protected]>