[Bugfix][Qwen][DCA] fixes bug in dual-chunk-flash-attn backend for qwen 1m models. #21364
Conversation
…en 1m models. Signed-off-by: Tao He <[email protected]>
Code Review
This pull request addresses a bug in the dual-chunk-flash-attention backend for Qwen 1M models. The changes remove an erroneous `block_table` parameter from several function calls within `vllm/attention/backends/dual_chunk_flash_attn.py`. My review confirms that this parameter was being passed incorrectly to functions that expect contiguous tensors, and its removal is the correct fix. The changes are consistent and well-contained, resolving the bug introduced during a previous rebase. The code quality is good.
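For readers outside the codebase, a minimal, hypothetical sketch (not vLLM's actual kernel) of why the extra argument matters: attention entry points that accept a `block_table` typically reinterpret the key/value tensors as a paged cache, so supplying one alongside already-contiguous tensors makes the kernel gather from the wrong memory.

```python
import torch

def attend(q, k, v, block_table=None):
    # Toy stand-in for a varlen attention kernel. When a block_table is
    # given, k/v are reinterpreted as paged cache blocks and gathered.
    if block_table is not None:
        k = k[block_table].flatten(0, 1)
        v = v[block_table].flatten(0, 1)
    scores = (q @ k.transpose(-1, -2)) / k.shape[-1] ** 0.5
    return scores.softmax(dim=-1) @ v

q = torch.randn(1, 4, 8)
k = v = torch.randn(1, 4, 8)
out = attend(q, k, v)  # contiguous path: correct
# Passing a block_table here (as the pre-fix code did) would make the
# toy kernel index k/v as if they were cache blocks, i.e. wrong data:
# bad = attend(q, k, v, block_table=torch.zeros(2, dtype=torch.long))
```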
Please help verify whether using VLLM_ALLOW_LONG_MAX_MODEL_LEN in examples/offline_inference/qwen_1m.py may potentially cause issues leading to NaN; refer to #20904. I didn't even know what kind of machine could run it. NaN can cause the detokenizer to output !!!!!!!!!!!!!!
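As loose context for the "!!!!" symptom (an assumption of this note, not something stated in the PR): when logits contain NaN, greedy sampling degenerates, and in several BPE vocabularies the resulting token id decodes to "!". A hypothetical guard, not a vLLM API, might look like:

```python
import torch

def assert_finite_logits(logits: torch.Tensor) -> None:
    # Hypothetical diagnostic, not part of vLLM: fail fast instead of
    # letting NaN logits reach sampling and decode to junk like "!!!!".
    if not torch.isfinite(logits).all():
        raise RuntimeError(
            "non-finite logits detected; suspect positions beyond the "
            "model's trained context (see VLLM_ALLOW_LONG_MAX_MODEL_LEN)"
        )
```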
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI, which covers only a small and essential subset of tests. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge. 🚀
This issue should be orthogonal to this PR. I will take a look, but could we merge this PR first to make qwen-1m work?
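For reference, a sketch of the override under discussion (the environment variable is vLLM's; the model name and length here are illustrative assumptions, not taken from the PR):

```python
import os

# Set before constructing the engine; "1" bypasses the check that
# max_model_len fits the model's derived maximum -- the behavior
# being questioned in #20904.
os.environ["VLLM_ALLOW_LONG_MAX_MODEL_LEN"] = "1"

from vllm import LLM

llm = LLM(
    model="Qwen/Qwen2.5-7B-Instruct-1M",  # illustrative 1M-context model
    max_model_len=1_048_576,
)
```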
LGTM, approving since @sighingnow maintains the dual-chunk attention thing and this change looks reasonable.
Thanks! The failed test does not look caused by this PR.
Essential Elements of an Effective PR Description Checklist
- (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.

Purpose
Fixes a bug in the DCA (dual-chunk attention) backend. The error was introduced while rebasing a previous PR: #11844
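For context on what the backend implements, a loose toy sketch of the dual-chunk idea (not vLLM's algorithm): long sequences are split into chunks, and positions are remapped so that rotary embeddings never see a distance beyond the pretrained window.

```python
import torch

def dca_positions(seq_len: int, chunk_size: int):
    # Toy remapping in the spirit of dual-chunk attention: intra-chunk
    # attention uses local offsets, while cross-chunk attention sees a
    # clamped position so RoPE stays in-range. The real backend also
    # handles a successive-chunk path and the paged KV cache.
    pos = torch.arange(seq_len)
    intra = pos % chunk_size
    inter = pos.clamp(max=chunk_size - 1)
    return intra, inter

intra, inter = dca_positions(seq_len=10, chunk_size=4)
print(intra.tolist())  # [0, 1, 2, 3, 0, 1, 2, 3, 0, 1]
print(inter.tolist())  # [0, 1, 2, 3, 3, 3, 3, 3, 3, 3]
```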