I implemented sub-quadratic attention, as described in "Self-attention Does Not Need O(n²) Memory" (https://arxiv.org/abs/2112.05682v2):

Tweet: https://twitter.com/Birchlabs/status/1607503573906063362
Implementation (PR against my diffusers fork): https://github.com/Birch-san/diffusers/pull/1
Usage in my diffusers-play repo: https://github.com/Birch-san/diffusers-play/commit/a573e3d9ea4fdacfdee7ddd5eecdac29b236fc00

Is this worth upstreaming? It enables creation of images larger than can be achieved with attention slicing.
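
For anyone curious about the core idea: the paper processes keys/values in chunks and combines the partial softmaxes with a running max, so attention memory scales with the chunk size rather than quadratically with sequence length. Below is a minimal PyTorch sketch of that streaming-softmax trick; the function name and `kv_chunk_size` default are illustrative, not the actual API in my PR (which has more knobs, e.g. query chunking):

```python
import torch

def chunked_attention(q, k, v, kv_chunk_size=1024):
    """Illustrative sketch of memory-efficient attention (arXiv:2112.05682).

    q: [batch, q_len, d]; k, v: [batch, kv_len, d].
    Not the PR's actual implementation -- a minimal demonstration only.
    """
    scale = q.shape[-1] ** -0.5
    batch, q_len, _ = q.shape
    # Running accumulators for the numerically stable streaming softmax.
    acc = torch.zeros_like(q)  # running weighted sum of values
    denom = torch.zeros(batch, q_len, 1, device=q.device, dtype=q.dtype)
    running_max = torch.full(
        (batch, q_len, 1), float("-inf"), device=q.device, dtype=q.dtype
    )

    for i in range(0, k.shape[1], kv_chunk_size):
        k_chunk = k[:, i:i + kv_chunk_size]
        v_chunk = v[:, i:i + kv_chunk_size]
        scores = q @ k_chunk.transpose(-2, -1) * scale  # [batch, q_len, chunk]
        chunk_max = scores.amax(dim=-1, keepdim=True)
        new_max = torch.maximum(running_max, chunk_max)
        # Rescale previous accumulators to the new max, then fold in this chunk.
        correction = torch.exp(running_max - new_max)
        exp_scores = torch.exp(scores - new_max)
        acc = acc * correction + exp_scores @ v_chunk
        denom = denom * correction + exp_scores.sum(dim=-1, keepdim=True)
        running_max = new_max

    return acc / denom
```

Sanity check against naive attention:

```python
q, k, v = torch.randn(2, 64, 40), torch.randn(2, 256, 40), torch.randn(2, 256, 40)
ref = torch.softmax(q @ k.transpose(-2, -1) * 40 ** -0.5, dim=-1) @ v
assert torch.allclose(chunked_attention(q, k, v, kv_chunk_size=64), ref, atol=1e-5)
```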