
About the attention implementation with torch < 2.0 #3207

@tyshiwo1


Describe the bug

I tried to run train_unconditional.py with torch 1.12.1, but it failed.
The bug seems to be in https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/attention.py#L94

I added the line batch_size = batch_size // head_size right after L90, and the program seems to work for now. But I'm not sure whether there are other bugs related to older torch versions.
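For context, here is a minimal standalone sketch of what the workaround looks like. The function name mirrors reshape_batch_dim_to_heads from attention.py, but the explicit head_size parameter and the example shapes are assumptions for illustration, not the library's exact signature:

```python
import torch

def reshape_batch_dim_to_heads(tensor: torch.Tensor, head_size: int) -> torch.Tensor:
    # Input arrives with the heads folded into the batch dimension:
    # (batch * head_size, seq_len, dim_per_head)
    batch_size, seq_len, dim = tensor.shape
    # Workaround: recover the true batch size before reshaping back;
    # otherwise the final reshape asks for head_size times too many elements.
    batch_size = batch_size // head_size
    tensor = tensor.reshape(batch_size, head_size, seq_len, dim)
    tensor = tensor.permute(0, 2, 1, 3).reshape(batch_size, seq_len, dim * head_size)
    return tensor

# Example: 8 heads, true batch of 16, 64 tokens, 40 dims per head
x = torch.randn(16 * 8, 64, 40)
print(reshape_batch_dim_to_heads(x, head_size=8).shape)  # torch.Size([16, 64, 320])
```

Without the division, the reshape target has head_size times more elements than the tensor actually holds, which matches the RuntimeError in the logs below.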

Reproduction

https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/attention.py#L94

Logs

File "/data/diffusers/src/diffusers/models/attention.py", line 97, in reshape_batch_dim_to_heads
    tensor = tensor.permute(0, 2, 1, 3).reshape(batch_size, seq_len, dim * head_size)
RuntimeError: shape '[1024, 16, 512]' is invalid for input of size 131072

System Info

torch 1.12.1+cu113
