Shared Memory while multi-gpu? #429
Closed
Bigfield77 started this conversation in General

Hello,
I am able to load Llama 3 70B Instruct at 5.0 bpw using exllamav2_hf in ooba if I expose only one GPU (extremely slow).
If I expose both of my GPUs (3090s), there is not enough VRAM to load the model, and it does not fall back to shared memory the way it does with a single GPU.
Is it possible to enable something like falling back to shared memory on the last available GPU?
I am running under Windows.
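For context, a minimal sketch of the multi-GPU code path being discussed: exllamav2 can place the weights across devices with an explicit per-GPU allocation rather than relying on the driver's shared-memory fallback. The model directory and the gigabyte split values below are placeholders, not figures from this thread.

```python
# Sketch (illustrative values): loading an EXL2 quant across two GPUs with an
# explicit per-GPU split via exllamav2's Python API.
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer

config = ExLlamaV2Config()
config.model_dir = "/models/Llama-3-70B-Instruct-5.0bpw-exl2"  # placeholder path
config.prepare()

model = ExLlamaV2(config)
# Allocate roughly 22 GB of weights on each of the two visible GPUs.
# The split values are guesses; the point is that the allocation is explicit,
# so nothing is left to the driver's sysmem fallback behavior.
model.load(gpu_split=[22, 22])

tokenizer = ExLlamaV2Tokenizer(config)
cache = ExLlamaV2Cache(model)
```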
Replies: 3 comments

- Do you know if that is a limitation of exllamav2, PyTorch, or the NVIDIA driver suite?
- I'm not really sure. This would be down to the NVIDIA driver, and to my knowledge there isn't any way to control the sysmem fallback behavior from software.
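Since the sysmem fallback policy apparently can't be toggled from code, the practical check is whether the weights plus cache actually fit in dedicated VRAM across both cards. A small sketch using standard PyTorch calls (nothing here is specific to this thread):

```python
# Report free vs. total dedicated VRAM on every visible CUDA device, to see
# whether a given model + cache could fit without any sysmem fallback.
import torch

for i in range(torch.cuda.device_count()):
    free, total = torch.cuda.mem_get_info(i)  # bytes of free / total device memory
    print(f"cuda:{i} {torch.cuda.get_device_name(i)}: "
          f"{free / 2**30:.1f} GiB free of {total / 2**30:.1f} GiB")
```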