-
Notifications
You must be signed in to change notification settings - Fork 12k
Checking llama's defrag_graph size #6019
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Not exactly - to fill a hole, we don't always make 1 move. It depends if we can find a big enough chunk at the end of the cache. If we cannot find, then we would need to make multiple moves This check is using the worst-case scenario, where the end of the cache is so fragmented that each cell in the hole would need a separate move. The check can be improved to be exact, but I didn't want to make the implementation too convoluted If we figure out how to implement this in a neat way, we can fix it |
Oh, I think I get what you mean. I tried to improve the check a little without complicating it too much, and this is what I ended up with. I'll be glad to get your feedback. |
And one more question, isn’t
|
Thanks for looking into this!
|
Great! I fixed fragmentation calculation and opened the PR for your review. |
I would like to clarify whether this check
is correct in
llama_kv_cache_defrag_internal?
Now, if there is one large hole (huge
nh
variable) in the kv_cache, the condition will not be met and the cycle will be stopped, although only one move will be enough to fill it.Wouldn't it be more correct to replace the check with just
6*n_moves*n_layer >= LLAMA_MAX_NODES
?The text was updated successfully, but these errors were encountered: