Checking llama's defrag_graph size #6019

Closed
Xarbirus opened this issue Mar 12, 2024 · 5 comments

@Xarbirus
Contributor

I would like to clarify whether this check

if (6*(n_moves + nh)*n_layer >= LLAMA_MAX_NODES) {
    // the graph is too big, we cannot move more cells
    break;
}

is correct in llama_kv_cache_defrag_internal.

Right now, if there is one large hole in the kv_cache (a huge nh), the condition will be met and the loop will stop, even though a single move might be enough to fill the hole.

Wouldn't it be more correct to replace the check with just 6*n_moves*n_layer >= LLAMA_MAX_NODES?
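
To make the concern concrete, here is a minimal sketch of the scenario (only the check itself comes from llama_kv_cache_defrag_internal; LLAMA_MAX_NODES, n_layer and the hole size are assumed values for illustration):

#include <cstdio>

int main() {
    const int LLAMA_MAX_NODES = 8192; // assumed limit, for illustration
    const int n_layer = 32;           // assumed layer count

    int n_moves = 0;   // no moves scheduled yet
    int nh      = 100; // one large hole of 100 cells

    // 6*(0 + 100)*32 = 19200 >= 8192, so the loop stops immediately,
    // even though a single contiguous chunk at the end of the cache
    // could fill the whole hole with one move
    if (6*(n_moves + nh)*n_layer >= LLAMA_MAX_NODES) {
        printf("defrag loop stops: worst-case graph too big\n");
    }
    return 0;
}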

@ggerganov
Member

> although only one move will be enough to fill it.

Not exactly - to fill a hole, we don't always make just one move. It depends on whether we can find a big enough chunk at the end of the cache. If we cannot, we need to make multiple moves.

This check uses the worst-case scenario, where the end of the cache is so fragmented that each cell in the hole would need a separate move. The check could be made exact, but I didn't want to make the implementation too convoluted.
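
To illustrate (a hedged sketch, not the actual llama.cpp code): the number of moves needed depends on how the cells available at the end of the cache are chunked.

#include <cstdio>
#include <vector>

// moves needed to fill a hole of nh cells, given the lengths of the
// contiguous chunks available for relocation at the end of the cache
static int moves_needed(int nh, const std::vector<int> & chunks) {
    int moves = 0;
    for (int len : chunks) {
        if (nh <= 0) break;
        nh -= len; // each contiguous chunk costs one move
        moves++;
    }
    return moves;
}

int main() {
    // one big chunk at the end: a 40-cell hole is filled with 1 move
    printf("%d\n", moves_needed(40, {64}));
    // fully fragmented end: every cell is its own chunk, so 40 moves -
    // the worst case that 6*(n_moves + nh)*n_layer accounts for
    printf("%d\n", moves_needed(40, std::vector<int>(64, 1)));
    return 0;
}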

If we figure out how to implement this in a neat way, we can fix it.

@Xarbirus
Contributor Author

Oh, I think I get what you mean.

I tried to improve the check a little without complicating it too much, and this is what I ended up with. I'll be glad to get your feedback.

@Xarbirus
Contributor Author

And one more question: isn't n_tokens already included in the kv_self.used variable at this point?

const float fragmentation = kv_self.n >= 128 ? 1.0f - float(kv_self.used + n_tokens)/float(kv_self.n) : 0.0f;

@ggerganov
Member

ggerganov commented Mar 13, 2024

Thanks for looking into this!

  • I think the PR is good
  • Yes, you are correct - this is overestimating the number of used tokens. We should fix it (see the sketch below).
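
For reference, a minimal sketch of the corrected measure (a hypothetical helper; in llama.cpp this is an inline expression, and we assume kv_self.used already counts the current batch's tokens):

#include <cstdint>

// fragmentation of the KV cache, without the double-counted n_tokens term
static float kv_fragmentation(uint32_t kv_n, uint32_t kv_used) {
    // same guard as the original line: only meaningful for kv_n >= 128
    return kv_n >= 128 ? 1.0f - float(kv_used)/float(kv_n) : 0.0f;
}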

@Xarbirus
Contributor Author

Great! I fixed the fragmentation calculation and opened the PR for your review.
