Description
Model/Pipeline/Scheduler description
Hello!
In attention.py, the lines

```python
_ = xformers.ops.memory_efficient_attention(
    torch.randn((1, 2, 40), device="cuda"),
    torch.randn((1, 2, 40), device="cuda"),
    torch.randn((1, 2, 40), device="cuda"),
)
```
always use the first GPU ("cuda:0"). That is fine if there is only one GPU, but if you have more than one GPU, all of them allocate this probe in the first GPU's VRAM. In the end, the performance of the first GPU decreases with every additional GPU parking its load in that GPU's VRAM.
I found a workaround for my three GPUs in the form of three separate conda environments, where I replaced device="cuda" with device="cuda:0", device="cuda:1", and device="cuda:2" in the respective attention.py files, as in the snippet below.
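For example, the probe in the cuda:1 environment then reads:

```python
_ = xformers.ops.memory_efficient_attention(
    torch.randn((1, 2, 40), device="cuda:1"),
    torch.randn((1, 2, 40), device="cuda:1"),
    torch.randn((1, 2, 40), device="cuda:1"),
)
```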
Surely it would only have advantages if one could pass the desired CUDA device directly, e.g.
pipe1.enable_xformers_memory_efficient_attention("cuda:1")
when using xFormers.
Or is there already a simple way to override the CUDA device used by this check? For example, would selecting the current device before enabling xFormers redirect the allocation, as in the sketch below?
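A minimal, untested sketch of that idea, assuming the probe tensors are created with a bare "cuda" device string and therefore follow PyTorch's current device (the pipeline class and model id here are only placeholders for illustration):

```python
import torch
from diffusers import StableDiffusionPipeline

# Assumption: the xFormers availability probe in attention.py allocates its test
# tensors with device="cuda" (no index), so they land on whatever GPU PyTorch
# currently treats as the default device.
torch.cuda.set_device("cuda:1")  # make a bare "cuda" resolve to GPU 1 for this process

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipe = pipe.to("cuda:1")

# If the assumption holds, the internal torch.randn(..., device="cuda") calls
# should no longer touch GPU 0.
pipe.enable_xformers_memory_efficient_attention()
```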
Best regards
Marc
Open source status
- The model implementation is available
- The model weights are available (Only relevant if addition is not a scheduler).
Provide useful links for the implementation
No response