llama : update worst-case graph for unified cache #17379
base: sl/realloc-error
Conversation
```cpp
        throw std::runtime_error("failed to initialize memory context");
    }

    const uint32_t n_seqs = cparams.kv_unified ? 1 : cparams.n_seq_max;
```
I can't recall why this cparams.kv_unified check was added in #14363. It seems unnecessary now, and removing it gives a better estimate of the worst-case graph.
There are two remaining issues with the CI:
The CUDA failure is caused by different pp and tg graphs. This happens because this model has some weights in the input layer that are loaded on the CPU but copied to CUDA with large batches. A solution to that would be to disable op offloading for these tests with
Could this be the same issue I described at #17033 (comment)? I haven't had a chance to get back to this.

I can test this tomorrow to confirm.
Yes, if I disable the graph optimization logic in the vulkan backend:

```diff
diff --git a/ggml/src/ggml-vulkan/ggml-vulkan.cpp b/ggml/src/ggml-vulkan/ggml-vulkan.cpp
index bb3eb977c..ca12d2d1f 100644
--- a/ggml/src/ggml-vulkan/ggml-vulkan.cpp
+++ b/ggml/src/ggml-vulkan/ggml-vulkan.cpp
@@ -12925,6 +12925,7 @@ static ggml_status ggml_backend_vk_graph_compute(ggml_backend_t backend, ggml_cg
 // Sort the graph for improved parallelism.
 static void ggml_vk_graph_optimize(ggml_backend_t backend, struct ggml_cgraph * graph)
 {
+    return;
     VK_LOG_DEBUG("ggml_vk_graph_optimize(" << graph->n_nodes << " nodes)");
     ggml_backend_vk_context * ctx = (ggml_backend_vk_context *)backend->context;
```
Think we can merge this into #17276 and then figure out what to do with the remaining issue of graph optimization causing reallocations. (btw, I'm still not sure I understand in which cases this happens)
target #17276
Fixes https://github.com/ggml-org/llama.cpp/actions/runs/19443607071/job/55632677520?pr=17276#step:3:4553