@qjia7 qjia7 commented Sep 30, 2025

This pull request introduces support for indirect dispatch in the WebGPU FlashAttention implementation, enabling more dynamic and efficient kernel launches based on runtime sequence lengths. The changes add new logic and parameters to propagate sequence length information and indirect dispatch buffers through the attention pipeline, with conditional code paths to maintain compatibility with the existing direct dispatch approach.

This is part of the work to enable graph capture for phi4 (#25868).
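For readers unfamiliar with indirect dispatch: in WebGPU, an indirect dispatch reads its three workgroup counts (x, y, z, each a 32-bit unsigned integer) from a GPU buffer instead of taking them as CPU-side arguments. A minimal sketch of the arithmetic such a buffer might hold is below. The helper name, the `tile_size` parameter, and the mapping of tiles and heads to x and y are assumptions for illustration and are not taken from this PR.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical helper, not from this PR: computes the three u32 values
// (workgroup counts in x, y, z) that an indirect dispatch reads from a buffer.
// The tile_size parameter and the x/y/z layout are illustrative assumptions.
std::array<uint32_t, 3> ComputeIndirectDispatchArgs(uint32_t total_sequence_length,
                                                    uint32_t tile_size,
                                                    uint32_t num_heads) {
  assert(tile_size > 0);
  // Ceiling division: one workgroup per tile of the sequence.
  const uint32_t num_tiles = (total_sequence_length + tile_size - 1) / tile_size;
  return {num_tiles, num_heads, 1u};
}
```

If values like these are written into the buffer on the GPU, the recorded dispatch command itself does not need to change when the sequence length changes, which is presumably what makes the approach useful for graph capture.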

@qjia7 qjia7 marked this pull request as ready for review September 30, 2025 12:27
@guschmue guschmue added the ep:WebGPU ort-web webgpu provider label Oct 1, 2025
@qjia7 qjia7 left a comment

Thanks for your comments, @sushraja-msft. My replies are inline. As we discussed offline, I am going to merge this PR to unblock my follow-up work. @sushraja-msft @fs-eire Please continue your review if you have more comments; I will follow up on them separately.

@qjia7 qjia7 merged commit cd4ac49 into main Oct 14, 2025
92 checks passed
@qjia7 qjia7 deleted the fa_indirect_dispatch branch October 14, 2025 07:02
fs-eire pushed a commit that referenced this pull request Oct 24, 2025