-
May I ask if there are any plans to support KV cache copying across contexts? This would make it easier to implement Prefill/Decode Disaggregation. At the moment, I only see that KV cache copying is supported within a single context.
-
Another question: is it safe for multiple contexts to share the same model object? I’d like to use multiple contexts but avoid loading the model repeatedly.
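For reference, llama.cpp does allow several contexts to be created from one loaded model, with each context holding its own KV cache over the shared (read-only) weights. A minimal sketch, assuming the current llama.h function names (the model path here is illustrative; older builds spell these llama_load_model_from_file / llama_new_context_with_model / llama_free_model):

```cpp
#include "llama.h"

int main() {
    // Load the weights once.
    llama_model_params mparams = llama_model_default_params();
    llama_model * model = llama_model_load_from_file("model.gguf", mparams); // path is illustrative

    // Create two independent contexts (each with its own KV cache) over the same model.
    llama_context_params cparams = llama_context_default_params();
    llama_context * ctx_a = llama_init_from_model(model, cparams);
    llama_context * ctx_b = llama_init_from_model(model, cparams);

    // ... use ctx_a and ctx_b; check the library's threading guarantees before
    // driving both contexts from different threads concurrently ...

    llama_free(ctx_a);
    llama_free(ctx_b);
    llama_model_free(model);
    return 0;
}
```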
To copy the memory (e.g. a KV cache) from one context to another, you can use the llama_state_ API.
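A sketch of what that copy could look like, assuming the llama_state_seq_* functions declared in llama.h (the context variables, sequence id, and helper name here are illustrative, not part of the library):

```cpp
#include <cstdint>
#include <vector>
#include "llama.h"

// Hypothetical helper: serialize the state (KV cache) of one sequence from
// a source context and restore it into a destination context.
bool copy_seq_state(llama_context * src, llama_context * dst, llama_seq_id seq) {
    // Query how many bytes are needed to serialize this sequence's state.
    const size_t size = llama_state_seq_get_size(src, seq);
    std::vector<uint8_t> buf(size);

    // Serialize the sequence state out of the source context ...
    if (llama_state_seq_get_data(src, buf.data(), buf.size(), seq) == 0) {
        return false;
    }
    // ... and load it into the destination context under the same sequence id.
    return llama_state_seq_set_data(dst, buf.data(), buf.size(), seq) != 0;
}
```

For a prefill/decode split, the prefill context would run llama_decode over the prompt, this copy would move the resulting KV cache into the decode context, and generation would continue there.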