[P/D][V1] KV Connector API V1 #15960
Merged
+1,377
−83
Changes from 3 commits
Commits
123 commits
6a12481
Fixing DCO issue and format checker issue
ApostaC 34bea75
fixing pre-commit conflicts
ApostaC 20ef2ac
[fix] fix the runtime error when no kv cache config is provided
ApostaC 430e402
[fix] compatibility with v0 and address review comments
ApostaC 300ddac
[fix] format checker issue and [disable] connector during profile run
ApostaC 519dd3e
Merge remote-tracking branch 'upstream/main' into v1-disagg
robertgshaw2-redhat 7e0695b
updated to remove torch.load
robertgshaw2-redhat 553f416
updated
robertgshaw2-redhat b18bd8f
Merge pull request #3 from ApostaC/v1-disagg
robertgshaw2-redhat 55d1b5b
updated
robertgshaw2-redhat c50e620
fixed typo
robertgshaw2-redhat b22fe38
updated
robertgshaw2-redhat da257aa
updated
robertgshaw2-redhat 9751e0b
update comments
robertgshaw2-redhat 2b77bcd
updated
robertgshaw2-redhat 5cbd434
updated
robertgshaw2-redhat 4c6a93e
comment
robertgshaw2-redhat d8ec5a6
updared
robertgshaw2-redhat fcd2dc9
updated
robertgshaw2-redhat 1f9c252
format
robertgshaw2-redhat 7350244
[fix] typo to pass format checker
ApostaC 1586d58
updated
robertgshaw2-redhat e2ecc14
Merge branch 'local-dev/v1-disagg' of https://github.com/ApostaC/vllm…
robertgshaw2-redhat 5accb53
stash
robertgshaw2-redhat 31d807e
stash
robertgshaw2-redhat a73721a
updated
robertgshaw2-redhat 00df670
updated
robertgshaw2-redhat 4ebcc3e
updated
robertgshaw2-redhat da019df
updated
robertgshaw2-redhat 90e8c53
updated
robertgshaw2-redhat 8b3f606
updated
robertgshaw2-redhat 0163070
Merge pull request #4 from robertgshaw2-redhat/rob-changes
robertgshaw2-redhat de1e487
fix nit
robertgshaw2-redhat 48c2eb2
updated
robertgshaw2-redhat e72e5e4
updated
robertgshaw2-redhat 7833645
updared
robertgshaw2-redhat 1881aa5
updated
robertgshaw2-redhat eca7a49
cleaning
robertgshaw2-redhat b0629bd
updated
robertgshaw2-redhat 7766ca5
updated
robertgshaw2-redhat 7b64acb
clean up code
robertgshaw2-redhat b1310fd
updated
robertgshaw2-redhat 689379e
updaed
robertgshaw2-redhat 62e1421
updated
robertgshaw2-redhat 5145566
updated
robertgshaw2-redhat 20decdf
updated
robertgshaw2-redhat fc58dd5
updated
robertgshaw2-redhat 25c9592
updated
robertgshaw2-redhat 40e5d81
refactor
robertgshaw2-redhat e64f745
updated
robertgshaw2-redhat 74af233
done with nits
robertgshaw2-redhat 7c31e29
nits
robertgshaw2-redhat 7f57f3c
update lifecycle
robertgshaw2-redhat 05349a5
updated
robertgshaw2-redhat 8e1eadc
updated
robertgshaw2-redhat 54e1491
updated
robertgshaw2-redhat 9c4159c
updated
robertgshaw2-redhat 1d8415d
rename
robertgshaw2-redhat 406d6bf
Add MLA support for v1 disagg connector (#6)
Flechman 3a24897
[Fix] memory leak problem by proper clean up
ApostaC c6c4368
fixed test failures
robertgshaw2-redhat 5dff6e9
merge
robertgshaw2-redhat 4afa50e
stash
robertgshaw2-redhat 09be260
clean up typing
robertgshaw2-redhat 3f7844d
cleanup nits
robertgshaw2-redhat d44f699
updated
robertgshaw2-redhat 329f2e7
updated
robertgshaw2-redhat 72041ca
finish docstring
robertgshaw2-redhat f9f87f2
updated
robertgshaw2-redhat 33f6e60
make pr easier to read
robertgshaw2-redhat db28310
updated
robertgshaw2-redhat 3701b5d
stash
robertgshaw2-redhat deb1323
type checking is wrong for ReqMeta
robertgshaw2-redhat be789bf
add todo for the morning
robertgshaw2-redhat a3e5762
revery by id
robertgshaw2-redhat a03d707
revery by id
robertgshaw2-redhat f696000
revery by id
robertgshaw2-redhat 1d85e63
readabilty
robertgshaw2-redhat 521ed14
updared
robertgshaw2-redhat 6709943
nits
robertgshaw2-redhat 44ea156
cleanup
robertgshaw2-redhat 8180101
cleaning
robertgshaw2-redhat c3a2cc6
fix bug
robertgshaw2-redhat 5273e24
updated
robertgshaw2-redhat 913325f
update name
robertgshaw2-redhat 75c24d3
cleanup
robertgshaw2-redhat 17a3618
updated
robertgshaw2-redhat b4bd117
updated
robertgshaw2-redhat 01caf61
updated
robertgshaw2-redhat d8549cb
updated
robertgshaw2-redhat b362ef1
trying to fix mm, added tests
robertgshaw2-redhat 485b22e
Merge remote-tracking branch 'upstream/main' into local-dev/v1-disagg
robertgshaw2-redhat 78d523e
update comment
robertgshaw2-redhat 4c38138
updated
robertgshaw2-redhat 7af6ce2
commit test improvements
robertgshaw2-redhat 1ad993b
remove disaggregated tests
robertgshaw2-redhat 3a08dda
updated
robertgshaw2-redhat e49874d
update comment
robertgshaw2-redhat dd7969a
fix test case
robertgshaw2-redhat e1f130e
improve test code quality
robertgshaw2-redhat 611b782
added better testing
robertgshaw2-redhat f6b8bff
update comments
robertgshaw2-redhat 9609115
updated
robertgshaw2-redhat 7ce3bd6
updated
robertgshaw2-redhat c3f38d7
cleanup
robertgshaw2-redhat 6dfda44
updated
robertgshaw2-redhat 81d008a
cosmetic
robertgshaw2-redhat 6d35884
clean up
robertgshaw2-redhat 79fe730
updated
robertgshaw2-redhat ad18a3b
update nits
robertgshaw2-redhat edefdff
Merge remote-tracking branch 'upstream/main' into local-dev/v1-disagg
robertgshaw2-redhat c1a1169
updated
robertgshaw2-redhat 1b8ec0b
updated
robertgshaw2-redhat ff4b98f
updated
robertgshaw2-redhat 17b61fb
updated
robertgshaw2-redhat ac0660d
updated
robertgshaw2-redhat ecfb4ea
updated
robertgshaw2-redhat abdddf0
cleanup
robertgshaw2-redhat 8695d96
cleanup
robertgshaw2-redhat 7b5ba2c
updated
robertgshaw2-redhat 6be9cf9
fixed preemption
robertgshaw2-redhat 5363ed0
Update vllm/distributed/kv_transfer/kv_connector/factory.py
robertgshaw2-redhat 247195d
fix pre-commit
robertgshaw2-redhat
36 changes: 36 additions & 0 deletions
examples/offline_inference/disaggrated-prefill-v1/decode_example.py
| Diff line change |
|---|
| `@@ -0,0 +1,36 @@` |
| `# SPDX-License-Identifier: Apache-2.0` |
| |
| `from vllm import LLM, SamplingParams` |
| `from vllm.config import KVTransferConfig` |
| |
| `# Read prompts from output.txt` |
| `prompts = []` |
| `try:` |
| `    with open("output.txt") as f:` |
| `        for line in f:` |
| `            prompts.append(line.strip())` |
| `    print(f"Loaded {len(prompts)} prompts from output.txt")` |
| `except FileNotFoundError:` |
| `    print("Error: output.txt file not found")` |
| `    exit(-1)` |
| |
| `sampling_params = SamplingParams(temperature=0, top_p=0.95, max_tokens=10)` |
| |
| `llm = LLM(` |
| `    model="meta-llama/llama-3.1-8b-instruct",` |
| `    enforce_eager=True,` |
| `    gpu_memory_utilization=0.8,` |
| `    kv_transfer_config=KVTransferConfig.from_cli(` |
| `        '{"kv_connector":"SharedStorageConnector","kv_role":"kv_both",'` |
| `        '"kv_connector_extra_config": {"shared_storage_path": "local_storage"}}'` |
| `    ))  #, max_model_len=2048, max_num_batched_tokens=2048)` |
| |
| `# Second generation (decode instance)` |
| `outputs = llm.generate(prompts, sampling_params)` |
| |
| `new_prompts = []` |
| `for output in outputs:` |
| `    prompt = output.prompt` |
| `    generated_text = output.outputs[0].text` |
| `    new_prompts.append(prompt + generated_text)` |
| `    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")` |
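Both example scripts pass the connector settings to `KVTransferConfig.from_cli` as a hand-concatenated JSON string, so the two copies can drift apart. A minimal sketch of one way to avoid that is to build the string once with `json.dumps`. The helper name `build_kv_transfer_json` is hypothetical and not part of this PR; the key names are copied from the decode example above.

```python
import json


def build_kv_transfer_json(shared_storage_path: str, role: str = "kv_both") -> str:
    """Serialize the connector settings used by the examples to a JSON string.

    Hypothetical helper: the key names mirror the decode example in this PR.
    """
    config = {
        "kv_connector": "SharedStorageConnector",
        "kv_role": role,
        "kv_connector_extra_config": {"shared_storage_path": shared_storage_path},
    }
    return json.dumps(config)


# Build the string once and reuse it in both the prefill and decode scripts.
KV_TRANSFER_JSON = build_kv_transfer_json("local_storage")
print(KV_TRANSFER_JSON)
```

The resulting string could then be handed to `KVTransferConfig.from_cli(...)` in place of the literal, assuming both scripts import the same helper.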
42 changes: 42 additions & 0 deletions
examples/offline_inference/disaggrated-prefill-v1/prefill_example.py
| Diff line change |
|---|
| `@@ -0,0 +1,42 @@` |
| `# SPDX-License-Identifier: Apache-2.0` |
| |
| `from vllm import LLM, SamplingParams` |
| `from vllm.config import KVTransferConfig` |
| |
| `context = "Hi " * 1000` |
| `context2 = "Hey " * 500` |
| `prompts = [` |
| `    context + "Hello, my name is",` |
| `    context + "The capital of France is",` |
| `    context2 + "Your name is",` |
| `    context2 + "The capital of China is",` |
| `]` |
| |
| `sampling_params = SamplingParams(temperature=0, top_p=0.95, max_tokens=1)` |
| |
| `llm = LLM(model="meta-llama/llama-3.1-8b-instruct",` |
| `          enforce_eager=True,` |
| `          gpu_memory_utilization=0.8,` |
| `          kv_transfer_config=KVTransferConfig.from_cli(` |
| `              '{"kv_connector":"SharedStorageConnector","kv_role":"kv_both", '` |
| `              '"kv_connector_extra_config": {"shared_storage_path": "local_storage"}}')` |
| `          )  #, max_model_len=2048, max_num_batched_tokens=2048)` |
| |
| `# 1ST generation (prefill instance)` |
| `outputs = llm.generate(` |
| `    prompts,` |
| `    sampling_params,` |
| `)` |
| |
| `new_prompts = []` |
| `for output in outputs:` |
| `    prompt = output.prompt` |
| `    generated_text = output.outputs[0].text` |
| `    new_prompts.append(prompt + generated_text)` |
| `    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")` |
| |
| `# Write new_prompts to output.txt` |
| `with open("output.txt", "w") as f:` |
| `    for prompt in new_prompts:` |
| `        f.write(prompt + "\n")` |
| `print(f"Saved {len(new_prompts)} prompts to output.txt")` |
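The prefill script hands prompts to the decode script as one line of text per prompt, and the decode script strips each line on load. That round trip is lossy if a prompt ever contains a newline or ends in whitespace, which would presumably change the prompt the decode instance sees. A rough sketch of a lossless alternative using one JSON-encoded string per line follows; `save_prompts`, `load_prompts`, and the `output.jsonl` file name are illustrative assumptions, not code from this PR.

```python
import json
import os
import tempfile


def save_prompts(path, prompts):
    """Write each prompt as one JSON string per line (newlines get escaped)."""
    with open(path, "w", encoding="utf-8") as f:
        for prompt in prompts:
            f.write(json.dumps(prompt) + "\n")


def load_prompts(path):
    """Read the prompts back exactly as they were written."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Round trip with prompts that the plain-text format would alter.
prompts = ["Hi Hi Hello, my name is Bob ", "line one\nline two"]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "output.jsonl")
    save_prompts(path, prompts)
    restored = load_prompts(path)
print(restored == prompts)
```

For the fixed prompts used in the example scripts the plain-text format is sufficient; the JSON variant only matters once prompts come from less controlled input.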
| Diff line change |
|---|
| `@@ -0,0 +1,4 @@` |
| `rm -rf local_storage/` |
| |
| `VLLM_ENABLE_V1_MULTIPROCESSING=0 CUDA_VISIBLE_DEVICES=1 python3 prefill_example.py` |
| `VLLM_ENABLE_V1_MULTIPROCESSING=0 CUDA_VISIBLE_DEVICES=1 python3 decode_example.py` |
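The script above clears the shared storage directory and then runs the prefill and decode examples back to back with the same environment variables. As a sketch only, the same sequence could be driven from Python so that a failed prefill run stops the pipeline before decode starts. The function names `build_env` and `run_pipeline` and the injectable `runner` parameter are assumptions made for illustration; only the environment variables, script names, and directory name come from the script.

```python
import os
import shutil
import subprocess
import sys
import tempfile

SCRIPTS = ["prefill_example.py", "decode_example.py"]


def build_env(gpu="1"):
    """Copy the current environment and add the two variables the script sets."""
    env = dict(os.environ)
    env["VLLM_ENABLE_V1_MULTIPROCESSING"] = "0"
    env["CUDA_VISIBLE_DEVICES"] = gpu
    return env


def run_pipeline(storage_dir="local_storage", gpu="1", runner=subprocess.run):
    """Clear the shared storage, then run prefill followed by decode."""
    shutil.rmtree(storage_dir, ignore_errors=True)
    env = build_env(gpu)
    for script in SCRIPTS:
        # check=True raises if a script exits non-zero, so decode never
        # runs against a missing or partial prefill result.
        runner([sys.executable, script], env=env, check=True)


# Dry run: record the commands instead of launching vLLM.
recorded = []
dry_run_dir = os.path.join(tempfile.gettempdir(), "kv_demo_local_storage")
run_pipeline(storage_dir=dry_run_dir, runner=lambda cmd, **kwargs: recorded.append(cmd))
print([cmd[1] for cmd in recorded])
```

Calling `run_pipeline()` with the default `runner` would launch the two example scripts for real, which requires vLLM, the model weights, and a GPU.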
| Diff line change |
|---|
| `@@ -0,0 +1,11 @@` |
| `# SPDX-License-Identifier: Apache-2.0` |
| `# yapf: disable` |
| `from vllm.distributed.kv_transfer.kv_connector.v1.base import (` |
| `    KVConnectorBase_V1, KVConnectorRole)` |
| |
| `# yapf: enable` |
| |
| `__all__ = [` |
| `    "KVConnectorRole",` |
| `    "KVConnectorBase_V1",` |
| `]` |
@robertgshaw2-redhat Just realized there is a typo in this folder's name: disaggrated-prefill-v1 -> disaggregated-prefill-v1