Commit d0a1364

nvjullin and mgoin authored and committed
[BugFix] Make FlashInferMetadataBuilder non-blocking (#25040)
Signed-off-by: Julien Lin <[email protected]>
Co-authored-by: Michael Goin <[email protected]>
Signed-off-by: yewentao256 <[email protected]>
1 parent 2c3ba73 commit d0a1364

1 file changed: +3 -2 lines changed

vllm/v1/attention/backends/flashinfer.py

Lines changed: 3 additions & 2 deletions
@@ -585,9 +585,10 @@ def build(self,
                     kv_data_type=self.kv_cache_dtype,
                 )
             else:
-                attn_metadata.qo_indptr_gpu = qo_indptr_cpu.to(self.device)
+                attn_metadata.qo_indptr_gpu = qo_indptr_cpu.to(
+                    self.device, non_blocking=True)
                 attn_metadata.paged_kv_indptr_gpu = paged_kv_indptr_cpu.to(
-                    self.device)
+                    self.device, non_blocking=True)
 
             if num_decodes > 0:
                 pure_decode = num_prefills == 0
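
For context, a minimal standalone sketch of the pattern this change applies, written against plain PyTorch rather than vLLM. The helper name `copy_to_device` and the sample values are hypothetical and do not appear in the commit. The sketch assumes the source tensor lives in pinned host memory, which is generally what allows a `non_blocking=True` host-to-device copy to return without waiting for the transfer; whether the CPU tensors in `build()` are pinned is not visible in this diff.

```python
import torch


def copy_to_device(indptr_cpu: torch.Tensor, device: torch.device) -> torch.Tensor:
    """Hypothetical helper: copy a CPU index tensor to `device` without forcing a host sync."""
    if device.type == "cuda":
        # Pinned (page-locked) host memory lets the copy overlap with host work.
        indptr_cpu = indptr_cpu.pin_memory()
    # Queues the copy on the current stream; for a CPU target this is a no-op.
    return indptr_cpu.to(device, non_blocking=True)


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
qo_indptr_cpu = torch.tensor([0, 3, 7, 12], dtype=torch.int32)
qo_indptr_gpu = copy_to_device(qo_indptr_cpu, device)

# Kernels launched later on the same stream see the completed copy.
# Reading the values back on the host needs a synchronization first.
if device.type == "cuda":
    torch.cuda.synchronize()
print(qo_indptr_gpu.cpu().tolist())
```

Because the copy is only queued, the host should not modify the source buffer until the transfer has completed, and any host-side read of the result should follow a synchronization point.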
