[Model] Jamba support #4115
Merged
Commits (changes shown from 103 of 117 commits):
- 337f67a: Merged in jurassic-2.5 (pull request #1)
- 0330e14 (ErezSC42): Merged in jamba-3 (pull request #4)
- 07cc899: Jamba mamba (#3)
- 6d336f6 (mzusman): Cuda graph (#5)
- 00bce1f (mzusman): dtype (#6)
- 39c27b7 (mzusman): N support (#8)
- 7c75868 (mzusman): Tensor parallelism (#7)
- 30e6dcd (mzusman): After merge fixes
- 5c0efdc: Clean up
- 19f11f3: Add release mamba cache to executor_base
- 1fb817a: Add jamba modifications
- 30ae4a1: Add minimun 1 attention layer
- 7bd9c0a: More fixes
- d5ac8e8: Delete mamba cache
- 60b49b5: Jamba padding to the left
- c583fe8: Clean up
- c951b7d: Add import
- da6d0f2: Another clean up
- eb79923: Align to main
- 919edba: Fix reduce
- 4668566: Another fix
- 11a0737: Black format for jamba
- 7e3415e: Formatting
- adbd2ae: Formatting with format.sh
- 6daf2a2: Adding to docs and more
- 7ee927b: Add to readme
- 87fa299: Adding comments for prefill mamba
- 8bca3b6: Formating
- b421877: Remove mamba-ssm and conv1d from the build system requirements
- d9c3319: Remove autoconfig for jamba
- 1c0fad8: Move get_attention_num_layers to model_config
- e831cfc: Merge branch 'main' of https://github.com/vllm-project/vllm into jamb…
- 1033904: Fix in model config
- 2b93182: Formatting
- b2f86f8: Add layers_block_type_support to model config
- 7061df7: Update Jamba to support changes from main
- 054faf1: Take Jamba config off since its now in transformers
- fb3fc83: Take jamba config off
- 6d8765d: Format
- 10896ae: Refactor the model runner a little , make it more readable and chage
- d1dc26f: rename release mamba to release seqlen agnostic
- 07c8cd2: Move requirements of mamba to its own requirements
- 5c11285: Remove mamba metadata since its mamba specific
- 2bb3360: Align with master
- a235c44: Change comment
- af7a4ac: (1) implement contains_seqlen_agnostic_layers with use of self.get_nu…
- 988718e (tomeras91): Jamba official hf (#14)
- 49ce3df (tomeras91): fixes missed in merge with official Jamba HF format
- 4fa065f (tomeras91): Merge with main
- 7add09a (zhuohan123): fix merge error
- 14fbab5 (zhuohan123): Fix bug where seqlen agnostic cache wasn't copied for non driver workers
- e3dec15: WIP - encapsulate Jamba cache managemnt inside the modeling file
- 92778c4: Cleanup
- db36427: Typos and cleanup
- 7f6edfc: Another typo
- ee5f058: Keep the finished requests ids after in the scheduler after each step…
- 2d42367: Cleanup
- 6a6378c: clean up requests after profile
- 1a8e2f9: Update mamba requirements
- eb89987: Renaming
- 1cb8c1c: Renaming
- feca5d5: Rename and docs
- 72c31cc: Format
- ddeb689: Add mamba to Dockerfile
- 5d5a3be: Mamba disable prompt batching
- 85715fe: Format
- 84aa88f: Merge branch 'gh-main' into jamba-support-pr
- 8c6d82d: Fix jamba bug
- 628eec7: Renaming
- 30030ce: WIP - Merge with main adaptations
- 3fba9bc: Fix
- 45f3d96: Formating
- 794f1c3: deploy the finihsed request ids inside the modelinputs instead of worker
- 33eb405: fix
- 25c03e7: Renaming
- 94d40a8: Format
- 976166f: Typing and format
- 8181821: Cleanup
- 4fdc35b: Remove requirements-common and cuda from requirements-mamba
- aadeca2: Fix
- fee775e: set finished requests ids as none on default
- 668f3d9: get attr to get num hidden layers
- 10a44dc: Add jamba test
- cd9ba35: Ignore jamba test in cpu
- 6df4f69: Cleanup
- 75dd84e: Format and rename
- 577f678: Format
- 7bb332e: change num_layers to num_attention_layers and add comment
- c051758: Extended the finished reqeusts ids comment
- b6dc237: Format and make the jamba code more readable, adding comments and
- 24b4bf2: Merge branch 'gh-main' into jamba-support-pr
- b0b0836: Format
- e52e4d7: Resolve conflicts and format
- b4d49e0: Add finished requests ids to the prepare model spec decoding
- 68e27de: Format
- 670ff3a: Test cleanup
- b7e31e3: Add message to test
- 571f63d: Add docstring in vllm/config.py
- 49da326: rename flush to get_and_reset
- 688732e: Add comments
- 4a6b170: Change to private and check finished through all of the queue
- 2047a91: CI
- f2c407f: Merge branch 'gh-main' into jamba-support-pr
- 5d932a4: Pipeline Parallelism
- 3c15001: Make scheduler use pipeline parallel size directly
- 1ff2cdb: Change ABC defaults for prepare_model_input
- 548f4e8: Add basic comm ops tests with TP and PP.
- 5a4b323: Fix phi3v for test
- c92257c: Address Nick nits and fix CUDAGraph correctness
- 60bb1a7: Merge branch 'pipeline-parallel' into jamba-support-pr
- 2ea2b80: Merge branch 'gh-main' into jamba-support-pr
- 10d8f3c: Formating and fixing llm engine
- 1331a8f: Align with main and format
- 21c92b4: Fix bug
- 726ccad: Format
- 4b6a491: Add intermediate tensors
- da5d94a: Format
New file with the Mamba-specific dependencies (diff header `@@ -0,0 +1,3 @@`):

```
# Mamba dependencies
mamba-ssm>=1.2.2
causal-conv1d>=1.2.0
```
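Both pins use the simple `name>=version` form. As a quick illustration only (a hypothetical helper, not part of vLLM or pip), such pins can be split with a small stdlib parser; note this is far from a full PEP 508 parser:

```python
import re


def parse_requirement(line: str) -> tuple[str, str, str]:
    """Split a simple 'name OP version' pin into (name, operator, version).

    Only handles the basic comparison operators; extras, markers, and
    version ranges are deliberately unsupported in this sketch.
    """
    match = re.fullmatch(
        r"([A-Za-z0-9_.-]+)\s*(>=|==|<=|>|<)\s*([\d.]+)", line)
    if match is None:
        raise ValueError(f"unsupported requirement: {line!r}")
    return match.group(1), match.group(2), match.group(3)


# The two pins from the requirements file above:
pins = ["mamba-ssm>=1.2.2", "causal-conv1d>=1.2.0"]
parsed = [parse_requirement(p) for p in pins]
# parsed[0] == ("mamba-ssm", ">=", "1.2.2")
```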
New test file for Jamba (diff header `@@ -0,0 +1,65 @@`):

```python
import pytest

MODELS = ["ai21labs/Jamba-tiny-random"]


@pytest.mark.parametrize("model", MODELS)
@pytest.mark.parametrize("dtype", ["float"])
@pytest.mark.parametrize("max_tokens", [20])
def test_models(
    hf_runner,
    vllm_runner,
    example_prompts,
    model: str,
    dtype: str,
    max_tokens: int,
) -> None:
    # To pass the small model tests, we need full precision.
    assert dtype == "float"

    with hf_runner(model, dtype=dtype) as hf_model:
        hf_outputs = hf_model.generate_greedy(example_prompts, max_tokens)

    with vllm_runner(model, dtype=dtype) as vllm_model:
        vllm_outputs = vllm_model.generate_greedy(example_prompts, max_tokens)

    for i in range(len(example_prompts)):
        hf_output_ids, hf_output_str = hf_outputs[i]
        vllm_output_ids, vllm_output_str = vllm_outputs[i]
        assert hf_output_str == vllm_output_str, (
            f"Test{i}:\nHF: {hf_output_str!r}\nvLLM: {vllm_output_str!r}")
        assert hf_output_ids == vllm_output_ids, (
            f"Test{i}:\nHF: {hf_output_ids}\nvLLM: {vllm_output_ids}")


@pytest.mark.parametrize("model", MODELS)
@pytest.mark.parametrize("dtype", ["float"])
def test_state_cleanup(
    vllm_runner,
    model: str,
    dtype: str,
    example_prompts,
) -> None:
    # This test verifies that the Jamba state is cleaned up between
    # steps; if it is not, an error is expected.
    try:
        with vllm_runner(model, dtype=dtype) as vllm_model:
            for _ in range(10):
                vllm_model.generate_greedy([example_prompts[0]] * 100, 1)
    except ValueError:
        pytest.fail("Jamba inner state wasn't cleaned up between steps, "
                    "could be related to finished_requests_ids")


@pytest.mark.parametrize("model", MODELS)
@pytest.mark.parametrize("dtype", ["float"])
def test_model_print(
    vllm_runner,
    model: str,
    dtype: str,
) -> None:
    with vllm_runner(model, dtype=dtype) as vllm_model:
        # This test verifies that the model's extra_repr
        # can be printed correctly.
        print(vllm_model.model.llm_engine.model_executor.driver_worker.
              model_runner.model)
```
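The `test_state_cleanup` test above guards against stale per-request Mamba state. As a toy illustration of why that matters (this is NOT vLLM's actual implementation; the class and method names below are made up, loosely echoing the "seqlen agnostic" and "rename flush to get_and_reset" commit messages), consider a fixed-capacity cache that only survives repeated batches if finished requests are released:

```python
class SeqlenAgnosticCache:
    """Toy per-request state cache, illustrative only."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._states: dict[str, object] = {}
        self._finished: list[str] = []

    def get_state(self, request_id: str) -> object:
        # Allocate a placeholder "Mamba state" slot on first use.
        if request_id not in self._states:
            if len(self._states) >= self.capacity:
                raise ValueError("cache exhausted: finished requests "
                                 "were not released")
            self._states[request_id] = object()
        return self._states[request_id]

    def mark_finished(self, request_id: str) -> None:
        self._finished.append(request_id)

    def get_and_reset_finished(self) -> list[str]:
        # Return the accumulated finished request ids and clear the
        # internal list in one step.
        finished, self._finished = self._finished, []
        return finished

    def release_finished(self) -> None:
        # Free the state slots of every finished request.
        for rid in self.get_and_reset_finished():
            self._states.pop(rid, None)


# Ten rounds of 100 fresh requests against a capacity-100 cache: this
# only works because each round's states are released at the end.
cache = SeqlenAgnosticCache(capacity=100)
for r in range(10):
    for i in range(100):
        cache.get_state(f"round{r}-req{i}")
        cache.mark_finished(f"round{r}-req{i}")
    cache.release_finished()
```

Skipping `release_finished()` would make the second round overflow the cache and raise `ValueError`, which is essentially the failure mode `test_state_cleanup` checks for.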