Conversation

rnwang04
github-actions bot added the "category: continuous batching" label on Sep 24, 2025
const char* env_p = std::getenv("OV_GPU_XATTN_BLOCK_SIZE");
size_t gpu_block_size = 256;
if (env_p != nullptr && std::stoi(env_p) == 1) {
    gpu_block_size = 16;
Contributor

ceciliapeng2011, Sep 24, 2025

The ticket https://jira.devtools.intel.com/browse/CVS-161326 makes GenAI query the actual KV cache precision from the plugin after compilation. Could you follow the same approach so that GenAI also gets kv_block_size from the plugin?
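
A rough sketch of the idea, in case it helps: the block size comes from whatever the plugin reports after compilation, and the hard-coded default is used only when nothing is reported. Everything here is hypothetical and for illustration only. `PluginProperties` is a stand-in for the compiled model's property store, and the `KV_BLOCK_SIZE` key and `resolve_kv_block_size` helper are assumed names, not the real OpenVINO or GenAI API.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical stand-in for the properties a plugin exposes after compilation.
// This is NOT the OpenVINO API; it only models "ask the plugin for a value".
struct PluginProperties {
    std::map<std::string, std::size_t> values;
};

// Assumed property name, chosen for this sketch only.
static const char* const kKvBlockSizeKey = "KV_BLOCK_SIZE";

// Return the block size reported by the plugin, or `fallback` if the plugin
// does not report one. No environment variable is consulted.
std::size_t resolve_kv_block_size(const PluginProperties& props, std::size_t fallback) {
    const auto it = props.values.find(kKvBlockSizeKey);
    return it != props.values.end() ? it->second : fallback;
}
```

With this shape, the caller never needs to know which attention kernel the plugin selected; it just uses the value the plugin returns.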

Author

Thanks for the comment! I have updated the related code. Would you mind taking another look?

ceciliapeng2011 marked this pull request as draft on September 24, 2025, 05:32
github-actions bot added the "category: llm_bench" label (Label for tool/llm_bench folder) on Sep 26, 2025