[BugFix] Make FlashInferMetadataBuilder non-blocking #25040

nvjullin · 2025-09-17T06:01:28Z

Purpose

The blocking H2D memcpys break the overlap scheduler #23569; setting them to non-blocking fixes it.
Correctness is ensured by the synchronization at vllm/v1/worker/gpu_model_runner.py:2112:

            if self.prepare_inputs_event is not None:
                # Ensure prior step has finished with reused CPU tensors.
                self.prepare_inputs_event.synchronize()
            try:
                # Prepare the decoder inputs.
                (attn_metadata, logits_indices, spec_decode_metadata,
                 num_scheduled_tokens_np, spec_decode_common_attn_metadata,
                 max_query_len, ubatch_slices, num_tokens_after_padding
                 ) = self._prepare_inputs(scheduler_output)
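
The guard above matters because a non-blocking copy reads its CPU source buffer only when the stream gets to it, so the next step must not overwrite that buffer early. Below is a minimal toy sketch of that pattern using only the Python standard library. It is not vLLM code: `ToyStream`, `copy_async`, `record_event`, and `run_steps` are hypothetical names, a worker thread stands in for a CUDA stream, and `threading.Event` stands in for `torch.cuda.Event`. Only the name `prepare_inputs_event` is taken from the snippet above.

```python
import queue
import threading


class ToyStream:
    """Hypothetical stand-in for a CUDA stream: runs queued work in order."""

    def __init__(self):
        self._work = queue.Queue()
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        while True:
            fn = self._work.get()
            fn()

    def copy_async(self, dst, src):
        # Analogous to dst.copy_(src, non_blocking=True): returns at once and
        # reads `src` only when the worker reaches this work item.
        self._work.put(lambda: dst.__setitem__(slice(None), src))

    def record_event(self):
        # Analogous to recording an event: it is set once all earlier work
        # queued on this stream has completed.
        event = threading.Event()
        self._work.put(event.set)
        return event


def run_steps(batches):
    """Copy each batch through one reused host buffer without corrupting it."""
    stream = ToyStream()
    host_buf = [0] * len(batches[0])  # reused CPU staging buffer
    device_bufs = []
    prepare_inputs_event = None
    for batch in batches:
        if prepare_inputs_event is not None:
            # Ensure the prior step has finished with the reused CPU buffer.
            prepare_inputs_event.wait()
        host_buf[:] = batch  # safe to overwrite only after the wait above
        dev = [0] * len(batch)
        stream.copy_async(dev, host_buf)
        prepare_inputs_event = stream.record_event()
        device_bufs.append(dev)
    if prepare_inputs_event is not None:
        prepare_inputs_event.wait()  # drain the last copy before returning
    return device_bufs
```

Without the `wait()` at the top of the loop, a later iteration could overwrite `host_buf` before the earlier copy has read it. That is the hazard the event synchronization in the runner is there to prevent once the copies no longer block.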

Test Plan

Test Result

Signed-off-by: Julien Lin <[email protected]>

nvpohanh · 2025-09-17T09:19:59Z

@benchislett Please check if this fix is correct. Thanks!

mgoin

Looks reasonable to me, but @LucasWilkinson or @benchislett should validate before merge

LucasWilkinson

LGTM! Thanks for the contribution!

set two H2D to non-blocking

e5a8be2

Signed-off-by: Julien Lin <[email protected]>

nvjullin changed the title ~~Make FlashInferMetadataBuilder non-blocking~~ [BugFix] Make FlashInferMetadataBuilder non-blocking Sep 17, 2025

mergify bot added the v1 label Sep 17, 2025

nvjullin marked this pull request as ready for review September 17, 2025 06:03

nvjullin requested a review from mgoin as a code owner September 17, 2025 06:03

nvjullin marked this pull request as draft September 17, 2025 06:48

nvjullin marked this pull request as ready for review September 17, 2025 08:32

benchislett requested a review from LucasWilkinson September 17, 2025 13:18

mgoin approved these changes Sep 18, 2025

mgoin added the ready label Sep 18, 2025

LucasWilkinson approved these changes Sep 18, 2025

Merge branch 'main' into fix-bad-sync

21d17e5

mgoin enabled auto-merge (squash) September 19, 2025 18:21

mgoin merged commit b1a63d1 into vllm-project:main Sep 19, 2025
44 checks passed

FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025

[BugFix] Make FlashInferMetadataBuilder non-blocking (vllm-project#25040)

a3a7734

Signed-off-by: Julien Lin <[email protected]> Co-authored-by: Michael Goin <[email protected]>

yewentao256 pushed a commit that referenced this pull request Oct 3, 2025

[BugFix] Make FlashInferMetadataBuilder non-blocking (#25040)

d0a1364

Signed-off-by: Julien Lin <[email protected]> Co-authored-by: Michael Goin <[email protected]> Signed-off-by: yewentao256 <[email protected]>

choprahetarth pushed a commit to Tandemn-Labs/vllm that referenced this pull request Oct 11, 2025

[BugFix] Make FlashInferMetadataBuilder non-blocking (vllm-project#25040)

f7e1ba4

Signed-off-by: Julien Lin <[email protected]> Co-authored-by: Michael Goin <[email protected]>

lywa1998 pushed a commit to lywa1998/vllm that referenced this pull request Oct 20, 2025

[BugFix] Make FlashInferMetadataBuilder non-blocking (vllm-project#25040)

9d6d33d

Signed-off-by: Julien Lin <[email protected]> Co-authored-by: Michael Goin <[email protected]>
