I implemented sub-quadratic attention, as described in "Self-attention Does Not Need O(n²) Memory" (https://arxiv.org/abs/2112.05682v2):

Tweet: https://twitter.com/Birchlabs/status/1607503573906063362
Implementation (PR against my diffusers fork): https://github.com/Birch-san/diffusers/pull/1
Usage in my diffusers-play repo: https://github.com/Birch-san/diffusers-play/commit/a573e3d9ea4fdacfdee7ddd5eecdac29b236fc00

Is this worth upstreaming? It enables creation of images larger than can be achieved with attention slicing.
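
For anyone curious about the core idea: the paper processes keys/values in chunks and combines the partial softmaxes with a running max, so attention memory scales with the chunk size rather than quadratically with sequence length. Below is a minimal PyTorch sketch of that streaming-softmax trick; the function name and `kv_chunk_size` default are illustrative, not the actual API in my PR (which has more knobs, e.g. query chunking):

```python
import torch

def chunked_attention(q, k, v, kv_chunk_size=1024):
    """Illustrative sketch of memory-efficient attention (arXiv:2112.05682).

    q: [batch, q_len, d]; k, v: [batch, kv_len, d].
    Not the PR's actual implementation -- a minimal demonstration only.
    """
    scale = q.shape[-1] ** -0.5
    batch, q_len, _ = q.shape
    # Running accumulators for the numerically stable streaming softmax.
    acc = torch.zeros_like(q)  # running weighted sum of values
    denom = torch.zeros(batch, q_len, 1, device=q.device, dtype=q.dtype)
    running_max = torch.full(
        (batch, q_len, 1), float("-inf"), device=q.device, dtype=q.dtype
    )

    for i in range(0, k.shape[1], kv_chunk_size):
        k_chunk = k[:, i:i + kv_chunk_size]
        v_chunk = v[:, i:i + kv_chunk_size]
        scores = q @ k_chunk.transpose(-2, -1) * scale  # [batch, q_len, chunk]
        chunk_max = scores.amax(dim=-1, keepdim=True)
        new_max = torch.maximum(running_max, chunk_max)
        # Rescale previous accumulators to the new max, then fold in this chunk.
        correction = torch.exp(running_max - new_max)
        exp_scores = torch.exp(scores - new_max)
        acc = acc * correction + exp_scores @ v_chunk
        denom = denom * correction + exp_scores.sum(dim=-1, keepdim=True)
        running_max = new_max

    return acc / denom
```

Sanity check against naive attention:

```python
q, k, v = torch.randn(2, 64, 40), torch.randn(2, 256, 40), torch.randn(2, 256, 40)
ref = torch.softmax(q @ k.transpose(-2, -1) * 40 ** -0.5, dim=-1) @ v
assert torch.allclose(chunked_attention(q, k, v, kv_chunk_size=64), ref, atol=1e-5)
```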