
Update PatchedVLLMKVCache for deepseek performance #2165


Merged: merged 1 commit into r1-woq on Apr 5, 2025

Conversation

mengniwang95 (Contributor)

Type of Change

workaround

Description

Update PatchedVLLMKVCache for deepseek performance

xuechendi pushed a commit to HabanaAI/vllm-fork that referenced this pull request Apr 4, 2025
Previously, when we used INC to convert the DeepSeek FP8 model, we needed this commit (intel/neural-compressor@7c0a3e2) to remove the extra converts in KVCache, but theoretically GC can remove them during graph optimization.
Furthermore, the change in that commit is not aligned with the design of the INC patched module, which keeps the returned tensor in BF16 because we cannot know the user's next operation.
So, I updated the modeling file so that GC can work for the patched KVCache pattern of the DeepSeek model.
Since the next release is very close and GC currently does not work as expected during the decode stage, this is still a workaround. We will root-cause and fix it at the source in the next release.

This PR should work together with intel/neural-compressor#2165.

Signed-off-by: Mengni Wang <[email protected]>
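
To illustrate the design described above, here is a minimal sketch of the patched-KVCache idea: store the cache in FP8, but always dequantize the returned tensor back to BF16 because the caller's next operation is unknown. This is not INC's actual PatchedVLLMKVCache code; the class name and the quant_to_fp8/dequant_to_bf16 callables are hypothetical placeholders.

```python
# Minimal sketch (assumed names, not INC's actual PatchedVLLMKVCache code).
import torch

class PatchedKVCacheSketch(torch.nn.Module):
    def __init__(self, quant_to_fp8, dequant_to_bf16):
        super().__init__()
        self.quant = quant_to_fp8        # hypothetical FP8 quantize op
        self.dequant = dequant_to_bf16   # hypothetical BF16 dequantize op

    def forward(self, key_bf16: torch.Tensor) -> torch.Tensor:
        fp8_cache = self.quant(key_bf16)   # cache is kept in FP8
        return self.dequant(fp8_cache)     # always hand back BF16 to the caller

# Usage sketch: stand in for quant/dequant with plain dtype casts
# (requires a PyTorch build that provides torch.float8_e4m3fn).
cache = PatchedKVCacheSketch(
    quant_to_fp8=lambda t: t.to(torch.float8_e4m3fn),
    dequant_to_bf16=lambda t: t.to(torch.bfloat16),
)
out = cache(torch.randn(2, 4, dtype=torch.bfloat16))
```

When the consumer of this output immediately converts it back to FP8, the dequant/quant pair is back-to-back and redundant; in principle GC can eliminate that pair during graph optimization, and the modeling-file change in this PR exposes the DeepSeek KVCache pattern in a form GC can optimize.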
@yiliu30 yiliu30 merged commit fcf3031 into r1-woq Apr 5, 2025
7 of 9 checks passed
@yiliu30 yiliu30 deleted the dev/mengni/kv branch April 5, 2025 07:55