Commit 1137a05 (parent: 0578e5a)

Use the optimized block sizes after tuning the kernel

Signed-off-by: Xiongfei Wei <[email protected]>

File tree: 1 file changed, +2 −2 lines


vllm/v1/attention/backends/pallas.py

Lines changed: 2 additions & 2 deletions
@@ -12,8 +12,8 @@
 from vllm.attention.backends.utils import CommonAttentionState

 # These are the 2 tunable parameters of the paged attention Pallas kernel.
-NUM_QUERIES_PER_BLOCK = 32
-NUM_KV_PAGES_PER_BLOCK = 128
+NUM_QUERIES_PER_BLOCK = 16
+NUM_KV_PAGES_PER_BLOCK = 256


 class PallasAttentionBackend(AttentionBackend):
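For context, these two constants control how the paged attention Pallas kernel tiles its work: queries are processed in blocks of NUM_QUERIES_PER_BLOCK, and KV-cache pages in groups of NUM_KV_PAGES_PER_BLOCK, so the tuned values trade a smaller query tile for a larger KV tile. The sketch below shows how block sizes like these typically determine a blocked kernel's launch grid; the `grid_shape` helper and the example sizes are illustrative assumptions, not vLLM's actual code.

```python
import math

# Tuned values from this commit (illustrative use only).
NUM_QUERIES_PER_BLOCK = 16
NUM_KV_PAGES_PER_BLOCK = 256


def grid_shape(num_queries: int, num_kv_pages: int) -> tuple[int, int]:
    """Hypothetical helper: how many (query-block, kv-page-block) tiles
    a blocked attention kernel would launch for the given problem size."""
    return (
        math.ceil(num_queries / NUM_QUERIES_PER_BLOCK),
        math.ceil(num_kv_pages / NUM_KV_PAGES_PER_BLOCK),
    )


# e.g. 1024 queries attending over 2048 KV pages
print(grid_shape(1024, 2048))  # -> (64, 8)
```

Halving the query block while doubling the KV-page block keeps the per-tile work comparable but shifts it toward reading more of the KV cache per grid step, which is the kind of trade-off kernel autotuning typically settles.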

0 commit comments