Feature Request: Split model over multiple Vulkan GPUs #11004

@wittypastoral

Description

Prerequisites

  • I am running the latest code. Mention the version if possible as well.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new and useful enhancement to share.

Feature Description

Related to #5259 (closed); if you'd prefer, I can move this discussion there.

How hard would it be to implement splitting a model over multiple Vulkan GPUs, the way it is already supported for CUDA/HIP?
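
For reference, with the CUDA/HIP backends the split is controlled today by the `-sm`/`--split-mode` and `-ts`/`--tensor-split` CLI flags, which map to fields in `llama_model_params` in llama.h. Below is a minimal sketch of that API (the model path and split ratios are hypothetical); the request is essentially for these same parameters to take effect when llama.cpp is built with the Vulkan backend:

```c
#include "llama.h"
#include <stdio.h>

int main(void) {
    llama_backend_init();

    struct llama_model_params mparams = llama_model_default_params();
    mparams.n_gpu_layers = 99;                     // offload all layers to GPU
    mparams.split_mode   = LLAMA_SPLIT_MODE_LAYER; // split layers across devices
    // per-device proportions, e.g. 60/40 across two GPUs (hypothetical values)
    static const float split[] = { 0.6f, 0.4f };
    mparams.tensor_split = split;

    struct llama_model * model =
        llama_load_model_from_file("model.gguf", mparams);
    if (model == NULL) {
        fprintf(stderr, "failed to load model\n");
        return 1;
    }

    // ... create a context and run inference as usual ...

    llama_free_model(model);
    llama_backend_free();
    return 0;
}
```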

I guess OpenCL could be another path if Vulkan is too hard, since there is now a maturing rusticl driver that can be layered on top of Vulkan, as well as various native drivers; however, it may not be mature enough yet to support llama.cpp (though that may be changing [1]). Also, as far as I know, mapping memory between GPUs in a multi-GPU configuration is still under active development there.

[1] https://archive.fosdem.org/2024/events/attachments/fosdem-2024-3364-why-not-run-opencl-accelerated-llm-on-your-phone-/slides/22383/Why_not_run_OpenCL-accelerated_LLM_on_your_phon_nK2DudB.pdf

Motivation

This would be really helpful: it is now not unreasonable to want to ditch NVIDIA's proprietary drivers for the open-source NVK Vulkan driver, and AMD's cards are MUCH better supported by Vulkan on the RADV driver than by AMD's spotty-to-nonexistent ROCm/HIP support. Vulkan is also more universally supported, so this could let someone split a model over, e.g., an AMD and an NVIDIA GPU if that's the hardware they have.

Possible Implementation

N/A
