ggml : add ggml_set_rows #14274
So far so good: #14285 I think the |
ggml/src/ggml.c
struct ggml_tensor * ggml_set_rows(
        struct ggml_context * ctx,
        struct ggml_tensor * a,
        struct ggml_tensor * b,
        struct ggml_tensor * c) {
    GGML_ASSERT(b->ne[2] == c->ne[1]);
    GGML_ASSERT(c->ne[3] == 1);
    GGML_ASSERT(a->type == GGML_TYPE_F16);
    GGML_ASSERT(b->type == GGML_TYPE_F32);
    GGML_ASSERT(c->type == GGML_TYPE_I64);
We might want to allow broadcasting c into b. It would avoid this ggml_repeat_4d here:
llama.cpp/src/llama-kv-cache-unified.cpp
Lines 795 to 799 in a0c0fb6
v_cur = ggml_cont_3d(ctx, v_cur, 1, v_cur->ne[0], v_cur->ne[1]);
kv_idxs = ggml_repeat_4d(ctx, kv_idxs, v_cur->ne[1], v_cur->ne[2], 1, 1);
return ggml_set_rows(ctx, v_view, v_cur, kv_idxs);
I think we want to support broadcasting like this:

// a TD [n_embd, ne01, ne01_2, ne01_3]
// b TS [n_embd, n_rows, ne01_2, ne01_3]
// c I64 [n_rows, ne21, ne22, 1]
//
// broadcast:
// ne01_2 % ne21 == 0
// ne01_3 % ne22 == 0
GGML_API struct ggml_tensor * ggml_set_rows(
        struct ggml_context * ctx,
        struct ggml_tensor * a,  // destination
        struct ggml_tensor * b,  // source
        struct ggml_tensor * c); // row indices

Will try to implement this and open a PR to this branch.
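For reference, the broadcast rule in the comment above can be sketched in plain C. This is only an illustration under stated assumptions: the helper name `set_rows_bcast_f32` is hypothetical, all buffers are contiguous floats (the PR itself writes into an F16 destination), and the modulo indexing is one reading of the `ne01_2 % ne21 == 0` and `ne01_3 % ne22 == 0` conditions, not the actual ggml kernel.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

// Hypothetical sketch, not the ggml kernel. Contiguous float buffers only.
// dst: [ne01_3][ne01_2][ne01][n_embd]    destination (a)
// src: [ne01_3][ne01_2][n_rows][n_embd]  source (b)
// idx: [ne22][ne21][n_rows]              row indices (c), broadcast over dims 2 and 3
static void set_rows_bcast_f32(float * dst, int64_t ne01,
                               const float * src, int64_t n_rows,
                               int64_t ne01_2, int64_t ne01_3,
                               const int64_t * idx, int64_t ne21, int64_t ne22,
                               int64_t n_embd) {
    // broadcast conditions taken from the comment above
    assert(ne01_2 % ne21 == 0);
    assert(ne01_3 % ne22 == 0);

    for (int64_t i3 = 0; i3 < ne01_3; ++i3) {
        for (int64_t i2 = 0; i2 < ne01_2; ++i2) {
            // assumed rule: the index tensor repeats along the broadcast dims
            const int64_t i21 = i2 % ne21;
            const int64_t i22 = i3 % ne22;
            for (int64_t i = 0; i < n_rows; ++i) {
                const int64_t r = idx[(i22*ne21 + i21)*n_rows + i];
                assert(r >= 0 && r < ne01); // must address a valid destination row

                float       * d = dst + ((i3*ne01_2 + i2)*ne01   + r)*n_embd;
                const float * s = src + ((i3*ne01_2 + i2)*n_rows + i)*n_embd;
                memcpy(d, s, (size_t) n_embd*sizeof(float));
            }
        }
    }
}
```

With `ne21 == 1` and `ne22 == 1`, a single list of row indices is reused for every slice of the source, which is the case where an explicit repeat of the index tensor would otherwise be needed.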
Opened rgerganov#3
Add ggml_set_rows(a, b, c) which copies rows from 'b' into 'a' using indices from 'c'. ref: ggml-org#8366
Question: why do we need a new ggml op |
We want to be able to set rows at arbitrary positions, not necessarily contiguous ones. For example, we might want to set rows 2, 5 and 13. Don't see how this can be achieved with |
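To make the scattered-write case concrete, here is a minimal sketch in plain C: each source row is copied to the destination row named by an index array. The helper name `set_rows_f32` and the flat float layout are assumptions for illustration only; it is not the ggml implementation, which per the asserts in this PR takes an F32 source, an F16 destination and I64 indices.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

// Hypothetical helper (not part of ggml): copy row i of src into row idx[i] of dst.
// dst: [n_dst_rows][n_embd], src: [n_rows][n_embd], both contiguous floats.
static void set_rows_f32(float * dst, int64_t n_dst_rows,
                         const float * src, int64_t n_rows,
                         const int64_t * idx, int64_t n_embd) {
    for (int64_t i = 0; i < n_rows; ++i) {
        const int64_t r = idx[i];
        assert(r >= 0 && r < n_dst_rows); // index must address a valid destination row
        memcpy(dst + r*n_embd, src + i*n_embd, (size_t) n_embd*sizeof(float));
    }
}
```

With `idx = {2, 5, 13}`, three source rows land in destination rows 2, 5 and 13, and every other destination row is left untouched.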
// true if the elements in dimension 0 are contiguous, or there is just 1 block of elements
GGML_API bool ggml_is_contiguous_rows(const struct ggml_tensor * tensor);
Attention here
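One way to read the `ggml_is_contiguous_rows` comment is sketched below. The `toy_tensor` struct and its field names are stand-ins invented for this example, not ggml's own definitions; the check simply mirrors the two conditions in the comment (dimension 0 is densely packed, or a row holds exactly one block).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Toy stand-in for a tensor descriptor; illustrative only, not ggml's struct.
struct toy_tensor {
    int64_t ne[4];      // number of elements per dimension
    size_t  nb[4];      // stride in bytes per dimension
    size_t  type_size;  // bytes per block of the element type
    int64_t blck_size;  // elements per block (1 for non-quantized types)
};

// True if elements along dimension 0 are packed back to back,
// or if each row consists of a single block.
static bool toy_is_contiguous_rows(const struct toy_tensor * t) {
    return t->ne[0] == t->blck_size || t->nb[0] == t->type_size;
}
```

Under this reading, a transposed view (where `nb[0]` is a full row stride) fails the check unless its rows hold a single block.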
@@ -192,6 +192,7 @@ typedef pthread_t ggml_thread_t;

static const struct ggml_type_traits_cpu type_traits_cpu[GGML_TYPE_COUNT] = {
    [GGML_TYPE_F32] = {
        .from_float = (ggml_from_float_t) ggml_cpu_fp32_to_fp32,
Attention here
static void ggml_compute_forward_repeat_i64(
        const ggml_compute_params * params,
        ggml_tensor * dst) {

    const ggml_tensor * src0 = dst->src[0];
After adding the broadcast support to ggml_set_rows() this is not really needed anymore, but I think it's nice to have either way.
It looks like we are hitting actions/runner-images#12435 when building on Windows: