-
May I ask if there are any plans to support KV cache copying across contexts? This would make it easier to implement Prefill/Decode Disaggregation. At the moment, I only see that KV cache copying is supported within a single context.
-
Another question: is it safe for multiple contexts to share the same model object? I’d like to use multiple contexts but avoid loading the model repeatedly.
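For reference, llama.cpp does allow several contexts to be created from one loaded model, with each context holding its own KV cache over the shared (read-only) weights. A minimal sketch, assuming the current llama.h function names (the model path here is illustrative; older builds spell these llama_load_model_from_file / llama_new_context_with_model / llama_free_model):

```cpp
#include "llama.h"

int main() {
    // Load the weights once.
    llama_model_params mparams = llama_model_default_params();
    llama_model * model = llama_model_load_from_file("model.gguf", mparams); // path is illustrative

    // Create two independent contexts (each with its own KV cache) over the same model.
    llama_context_params cparams = llama_context_default_params();
    llama_context * ctx_a = llama_init_from_model(model, cparams);
    llama_context * ctx_b = llama_init_from_model(model, cparams);

    // ... use ctx_a and ctx_b; check the library's threading guarantees before
    // driving both contexts from different threads concurrently ...

    llama_free(ctx_a);
    llama_free(ctx_b);
    llama_model_free(model);
    return 0;
}
```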
To copy the memory (e.g. a KV cache) from one context to another, you can use the llama_state_ API.
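A sketch of what that copy could look like, assuming the llama_state_seq_* functions declared in llama.h (the context variables, sequence id, and helper name here are illustrative, not part of the library):

```cpp
#include <cstdint>
#include <vector>
#include "llama.h"

// Hypothetical helper: serialize the state (KV cache) of one sequence from
// a source context and restore it into a destination context.
bool copy_seq_state(llama_context * src, llama_context * dst, llama_seq_id seq) {
    // Query how many bytes are needed to serialize this sequence's state.
    const size_t size = llama_state_seq_get_size(src, seq);
    std::vector<uint8_t> buf(size);

    // Serialize the sequence state out of the source context ...
    if (llama_state_seq_get_data(src, buf.data(), buf.size(), seq) == 0) {
        return false;
    }
    // ... and load it into the destination context under the same sequence id.
    return llama_state_seq_set_data(dst, buf.data(), buf.size(), seq) != 0;
}
```

For a prefill/decode split, the prefill context would run llama_decode over the prompt, this copy would move the resulting KV cache into the decode context, and generation would continue there.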