feat: Enable engine-level arguments with speculators models #25250
Conversation
Changed to match vllm
Looks reasonable to me, just a nit
```python
# Build base vLLM speculative configuration
vllm_config = {
    "method": config_dict.get("speculators_model_type"),
    "num_lookahead_tokens": num_lookahead_tokens,
    "num_speculative_tokens": num_speculative_tokens,
    "target_model": spec_config.get("verifier")["name_or_path"]
}

# Merge transformer layer configuration if present
transformer_config = config_dict.get("transformer_layer_config", {})
vllm_config.update(transformer_config)
```
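Pulled out as a standalone sketch, the merged logic in the diff above might look like the following (the function name and arguments mirror the diff; this is illustrative, not the exact vLLM source). The point of the `.get(..., {})` form is that `transformer_layer_config` may be absent, so a direct `config_dict["transformer_layer_config"]` would raise a `KeyError`:

```python
# Illustrative sketch of the diff above (not the exact vLLM source).
# The transformer_layer_config key may be absent, so .get(..., {}) avoids
# a KeyError while still merging the layer settings when they are present.
def build_vllm_speculative_config(config_dict, spec_config,
                                  num_lookahead_tokens,
                                  num_speculative_tokens):
    vllm_config = {
        "method": config_dict.get("speculators_model_type"),
        "num_lookahead_tokens": num_lookahead_tokens,
        "num_speculative_tokens": num_speculative_tokens,
        "target_model": spec_config.get("verifier")["name_or_path"],
    }
    # Merge transformer layer configuration only if present
    vllm_config.update(config_dict.get("transformer_layer_config", {}))
    return vllm_config
```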
nit: we should validate that this is a valid SpeculativeConfig after construction
We can validate at the engine-args level, let me add that!
On taking a deeper look, it will be non-trivial and somewhat hacky to validate that here, since the create_speculative_config method adds the target model config and other things before initializing the SpeculativeConfig. I think it's fine to fail at that level for now?
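For reference, the kind of early sanity check being discussed might look like this minimal sketch (field names follow the config dict built earlier in this thread; the function itself is hypothetical, not vLLM's actual API):

```python
# Hypothetical pre-validation sketch: check the minimal invariants of the
# extracted speculative dict before it is handed to SpeculativeConfig,
# failing fast with a clear message instead of deep inside config init.
def validate_speculative_dict(spec):
    if spec.get("method") is None:
        raise ValueError("missing 'speculators_model_type' in speculators config")
    tokens = spec.get("num_speculative_tokens")
    if not isinstance(tokens, int) or tokens < 1:
        raise ValueError(
            f"num_speculative_tokens must be a positive int, got {tokens!r}")
    if not spec.get("target_model"):
        raise ValueError("speculators config did not resolve a target model")
```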
This pull request has merge conflicts that must be resolved before it can be merged.
```diff
@@ -19,7 +19,6 @@ def test_llama(vllm_runner, example_prompts, model_path, monkeypatch):
```
Do you think we can add a better non-smoke test to this PR to prevent something like this in the future?
Updated this test itself to check for the speculative config and other things; they should be fine now, and would catch errors like these in the future!
This change makes sense to me. Thanks for this.
@rahul-tuli there are merge conflicts; can you address this as well?
Force-pushed from d8df1d1 to b91f7df
This commit enables users to combine engine-level arguments (like `--tensor-parallel-size`, `--seed`, `--max-model-len`) with speculators models using simplified command syntax.

Changes:
- Move speculators detection from ModelConfig to EngineArgs for earlier processing
- Refactor speculators config extraction with improved function naming:
  - convert_speculators_to_vllm → build_vllm_speculative_config
  - get_vllm_config_dict → extract_vllm_speculative_config
  - maybe_override_with_speculators_target_model → maybe_override_with_speculators
- Enhance test coverage to verify speculative config initialization
- Add comprehensive documentation and error handling
- Remove debug logging from production code
- Apply consistent code formatting per project standards

Users can now use simplified syntax like:
`vllm serve --seed 42 --tensor-parallel-size 4 "speculators-model-name"`
instead of the verbose explicit configuration.

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <[email protected]>
Signed-off-by: Rahul Tuli <[email protected]>
Force-pushed from aa6b3ec to 125a424
- Replace corrupted config/__init__.py with clean version from main
- Combine test_llama and test_qwen into single parameterized test
- Add descriptive test IDs: llama3-eagle3-speculator, qwen3-eagle3-speculator
- Fix inconsistent property access and enhance test documentation
- Verify speculative config initialization and text generation
- Apply formatting fixes from pre-commit hooks

Signed-off-by: Rahul Tuli <[email protected]>
Force-pushed from 125a424 to 8e10e4b
Addressed! @aarnphm
This PR fixes two critical issues with speculators models that were running without speculative decoding enabled, and enables users to combine engine-level arguments (like `--tensor-parallel-size`, `--seed`, `--max-model-len`) with speculators models using simplified command syntax.

Problem

Previously, there were two issues with speculators models. The speculators model detection was happening too late in the configuration pipeline (in `ModelConfig`), which meant speculative decoding could be silently disabled and engine-level arguments could not be combined with the simplified model syntax.

Solution
This PR fixes both issues, so the following command works as expected:

```shell
vllm serve --seed 42 --tensor-parallel-size 4 "nm-testing/SpeculatorLlama3-1-8B-Eagle3-converted-0717-quantized"
```

The model will now correctly detect the speculators format, initialize the speculative config, and apply the engine-level arguments.
Implementation Details

Core Changes

- Moved speculators detection from `ModelConfig` to `EngineArgs` to ensure proper argument processing
- Renamed `convert_speculators_to_vllm` → `build_vllm_speculative_config`
- Renamed `get_vllm_config_dict` → `extract_vllm_speculative_config`, adding proper validation and algorithm-level updates
- Renamed `maybe_override_with_speculators_target_model` → `maybe_override_with_speculators`
Technical Implementation

- Speculators models are identified by the `speculators_config` section in their configuration
- Detection now runs in `EngineArgs.create_engine_config()`, before the rest of the engine configuration is built
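As a rough illustration of this flow (the function name mirrors the PR's `maybe_override_with_speculators`, but the body is a simplified sketch, not the actual implementation):

```python
# Simplified sketch of the early-detection step: before the engine config
# is built, inspect the model's config dict; if it carries a
# speculators_config, swap in the verifier as the target model and return
# the speculative settings for later use.
def maybe_override_with_speculators(model, config_dict):
    spec_config = config_dict.get("speculators_config")
    if spec_config is None:
        # Not a speculators-format model: leave everything untouched
        return model, None
    target_model = spec_config["verifier"]["name_or_path"]
    return target_model, spec_config
```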
Usage Examples
Basic Serve Command
Test Request
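A minimal pair of examples for these two subsections, assuming vLLM's standard OpenAI-compatible server on the default port (the serve command is from the PR text; the endpoint path and request body are assumptions):

```shell
# Basic serve command: engine-level arguments with a speculators model
vllm serve --seed 42 --tensor-parallel-size 4 \
    "nm-testing/SpeculatorLlama3-1-8B-Eagle3-converted-0717-quantized"

# Test request against the OpenAI-compatible endpoint (default port 8000)
curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
          "model": "nm-testing/SpeculatorLlama3-1-8B-Eagle3-converted-0717-quantized",
          "prompt": "The capital of France is",
          "max_tokens": 32
        }'
```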
Benefits

- Speculative decoding acceleration works out of the box for speculators models
- Engine-level arguments can be combined with the simplified model syntax
Files Modified

- `vllm/engine/arg_utils.py`: Moved speculators detection logic
- `vllm/transformers_utils/config.py`: Enhanced configuration resolution
- `vllm/transformers_utils/configs/speculators/base.py`: Improved function naming and documentation
- `vllm/config/__init__.py`: Removed old detection logic
- `tests/speculative_decoding/speculators/test_eagle3.py`: Parameterized and expanded the test to check for the speculative config; it should catch future errors like these