
rpc : use backend registry, support dl backends #13304


Merged
merged 4 commits into master from sl/rpc-dl-backend on May 4, 2025

Conversation

slaren
Member

@slaren slaren commented May 4, 2025

  • Adds support for GGML_BACKEND_DL
  • Adds -d, --device option to select the device to use with the RPC server (see the sketch below)
  • Moves CPU memory detection code to CPU backend
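
For reference, a minimal sketch of how these pieces fit together, using the public backend registry API from ggml-backend.h. The helper function, the fall-back to the first registered device, and the exact log output are illustrative assumptions, not the actual rpc-server implementation:

```cpp
// Sketch: resolve a "-d/--device" argument through the ggml backend registry.
// Registry calls (ggml_backend_load_all, ggml_backend_dev_*) are the public
// API from ggml-backend.h; everything around them is assumed for illustration.
#include "ggml-backend.h"
#include <cstdio>

static ggml_backend_t init_backend_for_device(const char * dev_name) {
    // with GGML_BACKEND_DL, backends are shipped as dynamic libraries and
    // discovered/loaded at runtime instead of being linked in
    ggml_backend_load_all();

    ggml_backend_dev_t dev = nullptr;
    if (dev_name == nullptr) {
        // no -d given: fall back to the first registered device (assumption)
        dev = ggml_backend_dev_get(0);
    } else {
        dev = ggml_backend_dev_by_name(dev_name);
        if (dev == nullptr) {
            fprintf(stderr, "unknown device '%s', available devices:\n", dev_name);
            for (size_t i = 0; i < ggml_backend_dev_count(); i++) {
                ggml_backend_dev_t d = ggml_backend_dev_get(i);
                fprintf(stderr, "  %s: %s\n",
                        ggml_backend_dev_name(d), ggml_backend_dev_description(d));
            }
            return nullptr;
        }
    }

    // memory is reported by the selected device, so the RPC server no longer
    // needs its own CPU memory detection code
    size_t free_mem = 0, total_mem = 0;
    ggml_backend_dev_memory(dev, &free_mem, &total_mem);
    printf("using device %s (%zu MB free / %zu MB total)\n",
           ggml_backend_dev_name(dev), free_mem / (1024*1024), total_mem / (1024*1024));

    return ggml_backend_dev_init(dev, /*params =*/ nullptr);
}
```

Device names come from the registry (typically strings such as CPU or CUDA0), so on a multi-GPU host each RPC server instance can be pinned to a single device by name.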

@github-actions github-actions bot added the examples and ggml (changes relating to the ggml tensor library for machine learning) labels on May 4, 2025
@slaren slaren force-pushed the sl/rpc-dl-backend branch 3 times, most recently from 314ccd7 to afa429a on May 4, 2025 at 15:02
@slaren slaren force-pushed the sl/rpc-dl-backend branch from afa429a to 07da432 on May 4, 2025 at 15:04
Collaborator

@rgerganov rgerganov left a comment

I am not able to test right now, but the changes look fine.

Note that we should move the "Starting RPC server vX.Y.Z" message into ggml_backend_rpc_start_server(), as we no longer know in main() which version we are starting. I can fix this in a follow-up patch.
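
A minimal sketch of that follow-up, under the assumption that the RPC code exposes protocol version macros (the RPC_PROTO_*_VERSION names below are placeholders for illustration): the banner is printed by the server-side code called from ggml_backend_rpc_start_server() rather than by main(), so it always matches the server implementation that is actually linked in.

```cpp
// Sketch only: move the "Starting RPC server vX.Y.Z" banner into the server code.
#include <cstddef>
#include <cstdio>

// placeholder version macros so the sketch compiles standalone; the real
// values would come from the RPC backend headers
#ifndef RPC_PROTO_MAJOR_VERSION
#define RPC_PROTO_MAJOR_VERSION 1
#define RPC_PROTO_MINOR_VERSION 0
#define RPC_PROTO_PATCH_VERSION 0
#endif

// intended to be called from inside ggml_backend_rpc_start_server(), not from
// main(), since only the server knows which protocol version it implements
static void print_server_banner(const char * endpoint, size_t free_mem, size_t total_mem) {
    printf("Starting RPC server v%d.%d.%d\n",
           RPC_PROTO_MAJOR_VERSION, RPC_PROTO_MINOR_VERSION, RPC_PROTO_PATCH_VERSION);
    printf("  endpoint     : %s\n", endpoint);
    printf("  device memory: %zu MB free, %zu MB total\n",
           free_mem / (1024*1024), total_mem / (1024*1024));
}
```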

@slaren slaren merged commit 9fdfcda into master May 4, 2025
45 checks passed
@slaren slaren deleted the sl/rpc-dl-backend branch May 4, 2025 19:25
gabe-l-hart added a commit to gabe-l-hart/llama.cpp that referenced this pull request May 6, 2025
* origin/master: (27 commits)
llama : fix build_ffn without gate (ggml-org#13336)
CUDA: fix bad asserts for partial offload (ggml-org#13337)
convert : qwen2/3moe : set yarn metadata if present (ggml-org#13331)
CUDA: fix --split-mode row for MMQ (ggml-org#13323)
gguf-py : avoid requiring pyside6 for other scripts (ggml-org#13036)
CUDA: fix logic for clearing padding with -ngl 0 (ggml-org#13320)
sampling : Integrate Top-nσ into main sampling chain (and add it to the server) (ggml-org#13264)
server : Webui - change setText command from parent window to also send the message. (ggml-org#13309)
mtmd : rename llava directory to mtmd (ggml-org#13311)
clip : fix confused naming ffn_up and ffn_down (ggml-org#13290)
convert : bailingmoe : set yarn metadata if present (ggml-org#13312)
SYCL: Disable mul_mat kernels for noncontiguous tensor b (ggml-org#13308)
mtmd : add C public API (ggml-org#13184)
rpc : use backend registry, support dl backends (ggml-org#13304)
ggml : activate s390x simd for Q3_K (ggml-org#13301)
llava/mtmd : fixes to fully support dl backends (ggml-org#13303)
llama : build windows releases with dl backends (ggml-org#13220)
CUDA: fix race condition in MMQ stream-k fixup (ggml-org#13299)
CUDA: fix race condition in MMQ ids_dst (ggml-org#13294)
vulkan: Additional type support for unary, binary, and copy (ggml-org#13266)
...
@segmond

segmond commented May 12, 2025

I'm guessing this will be a huge change, but what would it take to make -d behave like it does in llama-cli and llama-server, so that instead of running N servers for N devices on a remote host, we run one server per remote node?

Labels
examples, ggml (changes relating to the ggml tensor library for machine learning)
3 participants