Support various block sizes #38

WoosukKwon · 2023-04-15T09:45:50Z

No description provided.

…pes; Using attn_fwd triton kernel from ROCm/triton main_perf that does not cause triton compiler to hang (vllm-project#38)

* Add indexer_k_quant_and_cache_kernel
  Signed-off-by: Barry Kang <[email protected]>
* Accept 3D kv_cache buffer
  Signed-off-by: Barry Kang <[email protected]>
* Address review comments
  Signed-off-by: Barry Kang <[email protected]>

Signed-off-by: Barry Kang <[email protected]>

WoosukKwon added 10 commits April 15, 2023 01:12

Add support for block size 1, 2, 4

e8ed722

Add block-size to dir name

4fe90e9

Use max and min

510c46e

Merge branch 'block-ablation' of github.com:WoosukKwon/cacheflow into block-ablation

7bf8a56

Support block size 64, 128, 256

10807c5

bugfix

90d9910

Minor

f548459

Change default block size to 16

32ff328

Comment out multi-query cached kv attention

7213b98

Enforce FCFS

6337f35

WoosukKwon merged commit 0f4b321 into main Apr 15, 2023

WoosukKwon deleted the block-ablation branch April 15, 2023 16:03

shanshanpt mentioned this pull request Nov 17, 2023

Run long context error: CUDA error: an illegal memory access was encountered #1700

Closed

junior-zsy mentioned this pull request Nov 20, 2023

Error with 32k Long Text in chatglm2-6b-32k Model #1725

Closed

hongxiayang pushed a commit to hongxiayang/vllm that referenced this pull request Feb 13, 2024

Support various block sizes & Change default block size to 16 (vllm-project#38)

bc5caa4

Venkat2811 mentioned this pull request May 15, 2024

Mistral7b takes 4 times its size in VRAM on A100 huggingface/text-generation-inference#1863

Closed

tianyil1 pushed a commit to tianyil1/vllm that referenced this pull request Jun 5, 2024

Fix error with high-level profiler in multi-card scenario (vllm-project#38)

3c827b3

yukavio pushed a commit to yukavio/vllm that referenced this pull request Jul 3, 2024

Rs/bump main to v0.3.2 (vllm-project#38)

fdb3cbd

ZHJ19970917 mentioned this pull request Jul 14, 2024

[Bug]: When using qwen-32b-chat-awq with multi-threaded access, errors occur after approximately several hundred visits. "vllm.engine.async_llm_engine.AsyncEngineDeadError: Background loop has errored already." #6421

Closed

alixiaodi mentioned this pull request Aug 2, 2024

[Bug]: #7072

Closed

markmc mentioned this pull request May 21, 2025

[Bug][Failing Test]: Distributed Comm Ops - distributed/test_shm_broadcast.py #18492

Closed

1 task

zerosurplus mentioned this pull request Jun 16, 2025

[Bug]: torch.distributed.DistNetworkError: The client socket has timed out after 600000ms while trying to connect to (172.17.0.9, 46229). #19670

Open

1 task

xiaomofang mentioned this pull request Jul 31, 2025

[Bug]: There is an issue with speculative inference in Eagle mode, where the context length of vLLM inference is constrained by the draft model. #21986

Open

1 task
