@ltoniazzi commented Aug 18, 2025

Add code for the wonderful lecture by @t-vi about flash attention.

  • Reproduced the algorithm following the lecture exposition, so it now runs on par with scaled_dot_product_attention on small tensors (on larger tensors it likely needs better resource allocation); a sketch of the core loop follows this list
  • Moved the CUDA code out of the notebook
  • Added a bit of profiling to illustrate the register-spilling slowdown: the code is in lecture_012/flash_attention_spilling_from_registers.cu, with a main.cu for running ncu profiling (discussed in the notebook); a toy spilling example also follows this list
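
For reference, here is a minimal sketch of the streaming (online-softmax) loop the algorithm is built around, written from the lecture's exposition rather than taken from this PR's actual kernel; the one-thread-per-query-row layout, the names, and the `d <= 128` bound are illustrative assumptions:

```cuda
// Minimal online-softmax attention sketch (illustrative, not the PR's kernel):
// each thread owns one query row and scans K/V once, keeping a running max m
// and denominator l so the full N x N score matrix is never materialized.
#include <cuda_runtime.h>
#include <math.h>

__global__ void flash_attention_row(const float* Q, const float* K,
                                    const float* V, float* O, int N, int d) {
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= N) return;

    const float scale = rsqrtf((float)d);
    float m = -INFINITY;  // running max of scores seen so far
    float l = 0.0f;       // running softmax denominator
    float acc[128];       // output-row accumulator; assumes d <= 128
    for (int i = 0; i < d; ++i) acc[i] = 0.0f;

    for (int j = 0; j < N; ++j) {
        float s = 0.0f;   // s = (q . k_j) * scale
        for (int i = 0; i < d; ++i) s += Q[row * d + i] * K[j * d + i];
        s *= scale;

        // Rescale the running state by exp(m - m_new) before folding in
        // the new key/value pair, so the softmax stays numerically stable.
        float m_new = fmaxf(m, s);
        float correction = __expf(m - m_new);
        float p = __expf(s - m_new);
        l = l * correction + p;
        for (int i = 0; i < d; ++i)
            acc[i] = acc[i] * correction + p * V[j * d + i];
        m = m_new;
    }
    for (int i = 0; i < d; ++i) O[row * d + i] = acc[i] / l;
}
```

A real kernel tiles K/V through shared memory and parallelizes within the row; note that the per-thread `acc` array is exactly the kind of register pressure the spilling example below is about.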
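
And a self-contained toy in the spirit of flash_attention_spilling_from_registers.cu (hypothetical, not the file added here): a per-thread array that is too large, or indexed with a runtime value, is demoted from registers to off-chip local memory, which is the slowdown the profiling makes visible.

```cuda
// Hypothetical spilling demo (not the PR's actual file).
#include <cuda_runtime.h>
#include <cstdio>

__global__ void spill_kernel(const int* idx, float* out, int n) {
    // 256 floats per thread exceeds the per-thread register budget, and the
    // runtime index below prevents ptxas from keeping the array in registers
    // at all, so it lands in (off-chip) local memory.
    float buf[256];
    for (int i = 0; i < 256; ++i) buf[i] = (float)(threadIdx.x + i);

    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < n) out[tid] = buf[idx[tid] & 255];  // dynamic index -> local memory
}

int main() {
    const int n = 1 << 20;
    int* idx;
    float* out;
    cudaMalloc(&idx, n * sizeof(int));
    cudaMalloc(&out, n * sizeof(float));
    cudaMemset(idx, 0, n * sizeof(int));

    spill_kernel<<<(n + 255) / 256, 256>>>(idx, out, n);
    cudaDeviceSynchronize();
    printf("done: %s\n", cudaGetErrorString(cudaGetLastError()));

    cudaFree(idx);
    cudaFree(out);
    return 0;
}
```

Compiling with `nvcc -Xptxas=-v` prints the kernel's stack-frame size and spill store/load counts, and `ncu --set full ./a.out` shows the resulting local-memory traffic.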
