Description
Model/Pipeline/Scheduler description
Hello!
In attention.py, the lines

```python
_ = xformers.ops.memory_efficient_attention(
    torch.randn((1, 2, 40), device="cuda"),
    torch.randn((1, 2, 40), device="cuda"),
    torch.randn((1, 2, 40), device="cuda"),
)
```
always use the first GPU ("cuda:0"). That is fine if there is only one GPU, but if you have more than one GPU, all of them allocate this probe in the first GPU's VRAM. In the end, the performance of the first GPU decreases with every additional GPU parking its load in that GPU's VRAM.
I found a workaround for my three GPUs in the form of three separate conda environments, where I replaced device="cuda" with device="cuda:0", device="cuda:1", and device="cuda:2" in the respective attention.py files, as in the snippet below.
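For example, the probe in the cuda:1 environment then reads:

```python
_ = xformers.ops.memory_efficient_attention(
    torch.randn((1, 2, 40), device="cuda:1"),
    torch.randn((1, 2, 40), device="cuda:1"),
    torch.randn((1, 2, 40), device="cuda:1"),
)
```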
Surely it would only have advantages if one could pass the desired CUDA device directly, e.g.
pipe1.enable_xformers_memory_efficient_attention("cuda:1")
when using xFormers.
Or is there already a simple way to override the CUDA device used by this check? For example, would selecting the current device before enabling xFormers redirect the allocation, as in the sketch below?
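A minimal, untested sketch of that idea, assuming the probe tensors are created with a bare "cuda" device string and therefore follow PyTorch's current device (the pipeline class and model id here are only placeholders for illustration):

```python
import torch
from diffusers import StableDiffusionPipeline

# Assumption: the xFormers availability probe in attention.py allocates its test
# tensors with device="cuda" (no index), so they land on whatever GPU PyTorch
# currently treats as the default device.
torch.cuda.set_device("cuda:1")  # make a bare "cuda" resolve to GPU 1 for this process

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipe = pipe.to("cuda:1")

# If the assumption holds, the internal torch.randn(..., device="cuda") calls
# should no longer touch GPU 0.
pipe.enable_xformers_memory_efficient_attention()
```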
Best regards
Marc
Open source status
- The model implementation is available
- The model weights are available (Only relevant if addition is not a scheduler).
Provide useful links for the implementation
No response