
About the attention implementation with torch < 2.0 #3207

@tyshiwo1


Describe the bug

I tried to run train_unconditional.py with torch 1.12.1, but it failed.
The bug seems to be in https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/attention.py#L94

I added the line batch_size = batch_size // head_size right after L90, and the program seems to work for now. But I'm not sure whether there are other bugs related to older torch versions.
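For context, here is a minimal standalone sketch of what the workaround looks like. The function name mirrors reshape_batch_dim_to_heads from attention.py, but the explicit head_size parameter and the example shapes are assumptions for illustration, not the library's exact signature:

```python
import torch

def reshape_batch_dim_to_heads(tensor: torch.Tensor, head_size: int) -> torch.Tensor:
    # Input arrives with the heads folded into the batch dimension:
    # (batch * head_size, seq_len, dim_per_head)
    batch_size, seq_len, dim = tensor.shape
    # Workaround: recover the true batch size before reshaping back;
    # otherwise the final reshape asks for head_size times too many elements.
    batch_size = batch_size // head_size
    tensor = tensor.reshape(batch_size, head_size, seq_len, dim)
    tensor = tensor.permute(0, 2, 1, 3).reshape(batch_size, seq_len, dim * head_size)
    return tensor

# Example: 8 heads, true batch of 16, 64 tokens, 40 dims per head
x = torch.randn(16 * 8, 64, 40)
print(reshape_batch_dim_to_heads(x, head_size=8).shape)  # torch.Size([16, 64, 320])
```

Without the division, the reshape target has head_size times more elements than the tensor actually holds, which matches the RuntimeError in the logs below.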

Reproduction

https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/attention.py#L94

Logs

File "/data/diffusers/src/diffusers/models/attention.py", line 97, in reshape_batch_dim_to_heads
    tensor = tensor.permute(0, 2, 1, 3).reshape(batch_size, seq_len, dim * head_size)
RuntimeError: shape '[1024, 16, 512]' is invalid for input of size 131072

System Info

torch 1.12.1+cu113
