Checking llama's defrag_graph size #6019

Closed
Xarbirus opened this issue Mar 12, 2024 · 5 comments

@Xarbirus
Contributor

I would like to clarify whether this check

if (6*(n_moves + nh)*n_layer >= LLAMA_MAX_NODES) {
    // the graph is too big, we cannot move more cells
    break;
}

is correct in llama_kv_cache_defrag_internal.

Right now, if there is one large hole in the kv_cache (a huge nh), the condition will be met and the loop will stop, even though a single move might be enough to fill the hole.

Wouldn't it be more correct to replace the check with just 6*n_moves*n_layer >= LLAMA_MAX_NODES?
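
To make the concern concrete, here is a minimal sketch of the scenario (only the check itself comes from llama_kv_cache_defrag_internal; LLAMA_MAX_NODES, n_layer and the hole size are assumed values for illustration):

#include <cstdio>

int main() {
    const int LLAMA_MAX_NODES = 8192; // assumed limit, for illustration
    const int n_layer = 32;           // assumed layer count

    int n_moves = 0;   // no moves scheduled yet
    int nh      = 100; // one large hole of 100 cells

    // 6*(0 + 100)*32 = 19200 >= 8192, so the loop stops immediately,
    // even though a single contiguous chunk at the end of the cache
    // could fill the whole hole with one move
    if (6*(n_moves + nh)*n_layer >= LLAMA_MAX_NODES) {
        printf("defrag loop stops: worst-case graph too big\n");
    }
    return 0;
}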

@ggerganov
Member

> although only one move will be enough to fill it.

Not exactly - to fill a hole, we don't always make just one move. It depends on whether we can find a big enough chunk at the end of the cache. If we cannot, we need to make multiple moves.

This check uses the worst-case scenario, where the end of the cache is so fragmented that each cell in the hole would need a separate move. The check could be made exact, but I didn't want to make the implementation too convoluted.
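
To illustrate (a hedged sketch, not the actual llama.cpp code): the number of moves needed depends on how the cells available at the end of the cache are chunked.

#include <cstdio>
#include <vector>

// moves needed to fill a hole of nh cells, given the lengths of the
// contiguous chunks available for relocation at the end of the cache
static int moves_needed(int nh, const std::vector<int> & chunks) {
    int moves = 0;
    for (int len : chunks) {
        if (nh <= 0) break;
        nh -= len; // each contiguous chunk costs one move
        moves++;
    }
    return moves;
}

int main() {
    // one big chunk at the end: a 40-cell hole is filled with 1 move
    printf("%d\n", moves_needed(40, {64}));
    // fully fragmented end: every cell is its own chunk, so 40 moves -
    // the worst case that 6*(n_moves + nh)*n_layer accounts for
    printf("%d\n", moves_needed(40, std::vector<int>(64, 1)));
    return 0;
}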

If we figure out how to implement this in a neat way, we can fix it.

@Xarbirus
Contributor Author

Oh, I think I get what you mean.

I tried to improve the check a little without complicating it too much, and this is what I ended up with. I'll be glad to get your feedback.

@Xarbirus
Contributor Author

And one more question: isn't n_tokens already included in the kv_self.used variable at this point?

const float fragmentation = kv_self.n >= 128 ? 1.0f - float(kv_self.used + n_tokens)/float(kv_self.n) : 0.0f;

@ggerganov
Member

ggerganov commented Mar 13, 2024

Thanks for looking into this!

  • I think the PR is good
  • Yes, you are correct - this is overestimating the number of used tokens. We should fix it (see the sketch below).
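
For reference, a minimal sketch of the corrected measure (a hypothetical helper; in llama.cpp this is an inline expression, and we assume kv_self.used already counts the current batch's tokens):

#include <cstdint>

// fragmentation of the KV cache, without the double-counted n_tokens term
static float kv_fragmentation(uint32_t kv_n, uint32_t kv_used) {
    // same guard as the original line: only meaningful for kv_n >= 128
    return kv_n >= 128 ? 1.0f - float(kv_used)/float(kv_n) : 0.0f;
}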

@Xarbirus
Contributor Author

Great! I fixed the fragmentation calculation and opened the PR for your review.
