
ggml : add ggml_set_rows #14274


Open · wants to merge 13 commits into base: master

Conversation

rgerganov (Collaborator)

Add ggml_set_rows(a, b, c) which copies rows from 'b' into 'a' using indices from 'c'.

ref: #8366
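The operation's semantics can be sketched on flat arrays as follows; `set_rows_ref` and the packed row-major layout are illustrative assumptions, not the ggml implementation:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

// Reference sketch of the ggml_set_rows(a, b, c) semantics:
// row r of src ('b') is copied into dst ('a') at row idx[r] ('c').
// Flat, packed row-major storage is assumed for illustration only.
static void set_rows_ref(float * dst, const float * src,
                         const int64_t * idx, int64_t n_rows, int64_t n_cols) {
    for (int64_t r = 0; r < n_rows; r++) {
        memcpy(dst + idx[r]*n_cols, src + r*n_cols, (size_t) n_cols*sizeof(float));
    }
}
```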

@github-actions github-actions bot added the ggml changes relating to the ggml tensor library for machine learning label Jun 19, 2025
@ggerganov ggerganov mentioned this pull request Jun 19, 2025
@ggerganov (Member)

So far so good: #14285

I think ggml_set_rows() alone could be a very useful addition, since this mechanism would allow llama_kv_cache_unified::find_slot() to search not just for contiguous slots of KV cells, but to effectively "scatter" the ubatch. This would be a useful improvement regardless of whether the graph reuse works, so I think we should proceed to implement this operator.

ggml/src/ggml.c Outdated
Comment on lines 3400 to 3421
struct ggml_tensor * ggml_set_rows(
        struct ggml_context * ctx,
        struct ggml_tensor  * a,
        struct ggml_tensor  * b,
        struct ggml_tensor  * c) {
    GGML_ASSERT(b->ne[2] == c->ne[1]);
    GGML_ASSERT(c->ne[3] == 1);
    GGML_ASSERT(a->type == GGML_TYPE_F16);
    GGML_ASSERT(b->type == GGML_TYPE_F32);
    GGML_ASSERT(c->type == GGML_TYPE_I64);
Member

We might want to allow broadcasting c into b. It would avoid this ggml_repeat_4d here:

v_cur = ggml_cont_3d(ctx, v_cur, 1, v_cur->ne[0], v_cur->ne[1]);
kv_idxs = ggml_repeat_4d(ctx, kv_idxs, v_cur->ne[1], v_cur->ne[2], 1, 1);
return ggml_set_rows(ctx, v_view, v_cur, kv_idxs);

@ggerganov (Member)

I think we want to support broadcasting like this:

    // a TD  [n_embd, ne01,   ne01_2, ne01_3]
    // b TS  [n_embd, n_rows, ne01_2, ne01_3]
    // c I64 [n_rows, ne21,   ne22,   1]
    //
    // broadcast:
    //   ne01_2 % ne21 == 0
    //   ne01_3 % ne22 == 0
    GGML_API struct ggml_tensor * ggml_set_rows(
            struct ggml_context * ctx,
            struct ggml_tensor  * a,  // destination
            struct ggml_tensor  * b,  // source
            struct ggml_tensor  * c); // row indices

Will try to implement this and open a PR to this branch.
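Under these broadcast rules, the index lookup for a given source slice could be sketched as below; `bcast_row_idx` and the packed layout of `c` are hypothetical helpers for illustration, not ggml code:

```c
#include <assert.h>
#include <stdint.h>

// Sketch of the proposed c-into-b broadcast: the row-index vector used for
// the source slice at (i2, i3) comes from c at (i2 % ne21, i3 % ne22).
// Dimension names follow the API comment above; packed I64 storage of c
// is an illustrative assumption.
static int64_t bcast_row_idx(const int64_t * c, int64_t n_rows,
                             int64_t ne21, int64_t ne22,
                             int64_t ir, int64_t i2, int64_t i3) {
    const int64_t c2 = i2 % ne21;
    const int64_t c3 = i3 % ne22;
    return c[(c3*ne21 + c2)*n_rows + ir];
}
```

With ne21 == ne22 == 1 every slice of b reuses the same index vector, which is exactly the case the ggml_repeat_4d call above works around today.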

@ggerganov
Copy link
Member

> Will try to implement this and open a PR to this branch.

Opened rgerganov#3

@github-actions github-actions bot added testing Everything test related examples Apple Metal https://en.wikipedia.org/wiki/Metal_(API) labels Jun 23, 2025
@JohannesGaessler (Collaborator)

Question: why do we need a new ggml op GGML_SET_ROWS? Wouldn't it be possible to create the tensor with GGML_GET_ROWS where, instead of a new tensor, the output tensor is a view of a?

@ggerganov (Member)

We want to be able to set rows at arbitrary positions, not necessarily contiguously. For example, we might want to set rows 2, 5 and 13. I don't see how this can be achieved with GGML_GET_ROWS.
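The gather/scatter distinction can be sketched on flat arrays; both helper names and the packed layout are illustrative assumptions rather than the ggml kernels:

```c
#include <assert.h>
#include <stdint.h>

// GGML_GET_ROWS gathers: it READS rows of src at the given indices.
static void gather_rows(float * dst, const float * src, const int64_t * idx,
                        int64_t n_rows, int64_t n_cols) {
    for (int64_t r = 0; r < n_rows; r++)
        for (int64_t j = 0; j < n_cols; j++)
            dst[r*n_cols + j] = src[idx[r]*n_cols + j];
}

// The proposed GGML_SET_ROWS scatters: it WRITES rows of dst at the given
// indices, e.g. rows 2, 5 and 13 of the destination.
static void scatter_rows(float * dst, const float * src, const int64_t * idx,
                         int64_t n_rows, int64_t n_cols) {
    for (int64_t r = 0; r < n_rows; r++)
        for (int64_t j = 0; j < n_cols; j++)
            dst[idx[r]*n_cols + j] = src[r*n_cols + j];
}
```

A view of a fixes the destination region at graph-build time, so a gather into a view cannot place rows at arbitrary, non-contiguous destination offsets the way the scatter does.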

@ggerganov ggerganov marked this pull request as ready for review June 26, 2025 07:30
@ggerganov (Member)

This operator seems quite useful to me, and after the recent experiments in #14285 and #14363 I think we should proceed with adding it. I suggest merging this PR for now and continuing to add the rest of the backend implementations towards master.

@ggerganov ggerganov requested a review from slaren June 26, 2025 07:35
Comment on lines +691 to +693
// true if the elements in dimension 0 are contiguous, or there is just 1 block of elements
GGML_API bool ggml_is_contiguous_rows(const struct ggml_tensor * tensor);

Member

Attention here
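A minimal restatement of the condition described in the code comment above, assuming a ggml-like ne/nb layout; the struct and helper are sketches, not the actual ggml_is_contiguous_rows implementation:

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

// Sketch of a "contiguous rows" check: the elements along dimension 0 are
// packed (stride nb[0] equals the element size), or each row holds at most
// one element. Field names mirror ggml's ne/nb convention for illustration.
struct tensor_sketch {
    int64_t ne[4];     // number of elements per dimension
    size_t  nb[4];     // stride in bytes per dimension
    size_t  elem_size; // size of one element in bytes
};

static int is_contiguous_rows_sketch(const struct tensor_sketch * t) {
    return t->ne[0] <= 1 || t->nb[0] == t->elem_size;
}
```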

@@ -192,6 +192,7 @@ typedef pthread_t ggml_thread_t;

static const struct ggml_type_traits_cpu type_traits_cpu[GGML_TYPE_COUNT] = {
[GGML_TYPE_F32] = {
.from_float = (ggml_from_float_t) ggml_cpu_fp32_to_fp32,
Member

Attention here

Comment on lines +2269 to +2273
static void ggml_compute_forward_repeat_i64(
const ggml_compute_params * params,
ggml_tensor * dst) {

const ggml_tensor * src0 = dst->src[0];
Member
After adding the broadcast support to ggml_set_rows() this is not really needed anymore, but I think it's nice to have either way.
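For reference, the repeat (tile) semantics that this forward function extends to I64 tensors can be sketched in one dimension; `repeat_i64_1d` is an illustrative reduction, not the actual multi-dimensional kernel:

```c
#include <assert.h>
#include <stdint.h>

// 1-D sketch of repeat for int64 data: src is copied n_repeat times
// back-to-back into dst, so dst must hold n_src * n_repeat elements.
static void repeat_i64_1d(int64_t * dst, const int64_t * src,
                          int64_t n_src, int64_t n_repeat) {
    for (int64_t r = 0; r < n_repeat; r++)
        for (int64_t i = 0; i < n_src; i++)
            dst[r*n_src + i] = src[i];
}
```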

@rgerganov (Collaborator, Author)

It looks like we are hitting actions/runner-images#12435 when building on Windows:

FAILED: [code=1] ggml/src/CMakeFiles/ggml-base.dir/Release/ggml.cpp.obj 
ccache C:\PROGRA~1\LLVM\bin\CLANG_~1.EXE --target=arm64-pc-windows-msvc -DGGML_BUILD -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_CRT_SECURE_NO_WARNINGS -D_XOPEN_SOURCE=600 -Dggml_base_EXPORTS -DCMAKE_INTDIR=\"Release\" -ID:/a/llama.cpp/llama.cpp/ggml/src/. -ID:/a/llama.cpp/llama.cpp/ggml/src/../include -march=armv8.7-a -fvectorize -ffp-model=fast -fno-finite-math-only -Wno-format -Wno-unused-variable -Wno-unused-function -Wno-gnu-zero-variadic-macro-arguments -O3 -DNDEBUG -D_DLL -D_MT -Xclang --dependent-lib=msvcrt -std=gnu++17 -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wunreachable-code-break -Wunreachable-code-return -Wmissing-prototypes -Wextra-semi -MD -MT ggml/src/CMakeFiles/ggml-base.dir/Release/ggml.cpp.obj -MF ggml\src\CMakeFiles\ggml-base.dir\Release\ggml.cpp.obj.d -o ggml/src/CMakeFiles/ggml-base.dir/Release/ggml.cpp.obj -c D:/a/llama.cpp/llama.cpp/ggml/src/ggml.cpp
In file included from D:/a/llama.cpp/llama.cpp/ggml/src/ggml.cpp:1:
In file included from D:/a/llama.cpp/llama.cpp/ggml/src\ggml-impl.h:475:
In file included from C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Tools\MSVC\14.44.35207\include\vector:8:
C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Tools\MSVC\14.44.35207\include\yvals_core.h:908:1: error: static assertion failed: error STL1000: Unexpected compiler version, expected Clang 19.0.0 or newer.
  908 | _EMIT_STL_ERROR(STL1000, "Unexpected compiler version, expected Clang 19.0.0 or newer.");
      | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Tools\MSVC\14.44.35207\include\yvals_core.h:534:58: note: expanded from macro '_EMIT_STL_ERROR'
  534 | #define _EMIT_STL_ERROR(NUMBER, MESSAGE)   static_assert(false, "error " #NUMBER ": " MESSAGE)
      |                                                          ^~~~~
1 error generated.

3 participants