
attention.py only supports one GPU #1344

@Marcophono2

Description


Model/Pipeline/Scheduler description

Hello!

In attention.py the lines

    _ = xformers.ops.memory_efficient_attention(
        torch.randn((1, 2, 40), device="cuda"),
        torch.randn((1, 2, 40), device="cuda"),
        torch.randn((1, 2, 40), device="cuda"),
    )

always use the first GPU ("cuda:0"). That is fine if there is only one GPU, but with more than one GPU, every pipeline allocates these warm-up tensors in the VRAM of the first GPU. As a result, the performance of the first GPU degrades with every additional GPU parking its load in that GPU's VRAM.
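To illustrate (just a minimal sketch, assuming a machine with more than one GPU): an unindexed "cuda" string resolves to the current CUDA device, which defaults to index 0, so the warm-up tensors always land on the first GPU.

    import torch

    # Minimal sketch (assumes a multi-GPU machine): "cuda" without an index
    # resolves to the current CUDA device, which defaults to index 0.
    x = torch.randn((1, 2, 40), device="cuda")
    print(x.device)  # prints cuda:0 unless the current device was changed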

I found a workaround for my three GPUs in the form of three separate conda environments, where I replaced device="cuda" with device="cuda:0", device="cuda:1" and device="cuda:2" in the respective attention.py files. It would surely be an improvement if one could pass the desired CUDA device directly, along the lines of

pipe1.enable_xformers_memory_efficient_attention("cuda:1")

when using xFormers, for example.
Or is there already a simple way to override the CUDA device setting?
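For what it's worth, here is the kind of per-process workaround I have in mind; just a sketch relying on the fact that the bare "cuda" string resolves to the current CUDA device, not an existing diffusers option:

    import torch

    # Sketch (assumes GPU index 1 exists): switch the current CUDA device
    # before the xFormers check runs, so the bare device="cuda" used inside
    # attention.py allocates on GPU 1 instead of GPU 0.
    torch.cuda.set_device(1)

    x = torch.randn((1, 2, 40), device="cuda")
    print(x.device)  # cuda:1

    # Alternative: launch each process with CUDA_VISIBLE_DEVICES=<index>
    # so that "cuda" maps to a different physical GPU per process.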

Best regards
Marc

Open source status

  • The model implementation is available
  • The model weights are available (Only relevant if addition is not a scheduler).

Provide useful links for the implementation

No response
