copybara-service[bot]

Added flash attention with two implementations: a single-q function and a register-tiled function.
The register-tiled version is about 9.7x faster than the previous attention function on an AVX3-enabled machine.
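
For readers unfamiliar with the single-q idea, here is a minimal scalar sketch of flash attention for one query vector, assuming plain `std::vector` storage and an externally supplied `scale`. The function name `SingleQueryFlashAttention` and its signature are hypothetical and are not the code added in this PR; the sketch only illustrates the online-softmax pass (running max and running sum) that lets the output be accumulated in one sweep over the keys without materializing the full score vector. It does not show register tiling or SIMD.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical illustration only; not the function introduced by this PR.
// Computes softmax(scale * q.K^T) . V for a single query in one pass over the
// keys, rescaling the partial output whenever the running max changes.
std::vector<float> SingleQueryFlashAttention(
    const std::vector<float>& q,
    const std::vector<std::vector<float>>& keys,
    const std::vector<std::vector<float>>& values, float scale) {
  assert(!keys.empty() && keys.size() == values.size());
  const std::size_t v_dim = values[0].size();
  std::vector<float> out(v_dim, 0.0f);
  float running_max = -std::numeric_limits<float>::infinity();
  float running_sum = 0.0f;

  for (std::size_t t = 0; t < keys.size(); ++t) {
    assert(keys[t].size() == q.size() && values[t].size() == v_dim);
    // Scaled dot product between the query and key t.
    float score = 0.0f;
    for (std::size_t i = 0; i < q.size(); ++i) score += q[i] * keys[t][i];
    score *= scale;

    // Online softmax update: shift previous accumulators to the new max.
    const float new_max = std::max(running_max, score);
    const float correction = std::exp(running_max - new_max);
    const float weight = std::exp(score - new_max);
    running_sum = running_sum * correction + weight;
    for (std::size_t i = 0; i < v_dim; ++i) {
      out[i] = out[i] * correction + weight * values[t][i];
    }
    running_max = new_max;
  }

  // Final normalization by the softmax denominator.
  for (float& o : out) o /= running_sum;
  return out;
}
```

With two keys that score equally against the query, the result is the average of the two value vectors. A register-tiled variant would apply the same update to several queries at once while holding the accumulators in vector registers.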

copybara-service bot force-pushed the test_800440132 branch 8 times, most recently from f8d74db to 65aa045 on September 9, 2025 14:54

PiperOrigin-RevId: 804913784
copybara-service bot merged commit f10ac41 into dev on Sep 9, 2025
copybara-service bot deleted the test_800440132 branch on September 9, 2025 15:05