[Core/DBO][2/N] Dual-Batch Overlap add DeepEP High Throughput support and Prefill support #24845
Merged: tlrmchlsmth merged 287 commits into vllm-project:main from neuralmagic:lwilkinson/dbo-prefill on Sep 23, 2025
92e0cc7
format
SageMoore 44a595f
config format
SageMoore d4b502a
mla format
SageMoore 8332924
dp format
SageMoore 243eac5
forward context format
SageMoore d463976
pplx format
SageMoore e34e441
fa format
SageMoore 919eef9
temporarily remove enable_microbatching
SageMoore 2731e8c
temporarily remove enable_microbatching
SageMoore 18e7d6c
Merge branch 'main' of https://github.com/neuralmagic/vllm into lwilk…
SageMoore 539c0c3
first round of fixes
SageMoore 5f4a501
more fixes
SageMoore e080e06
fix pplx a2a
SageMoore 2e3484c
debugging
SageMoore f8848bb
misc fixes. lm_eval still gets a wrong answer but it no longer hangs
SageMoore 8a75b3a
added support for ubatch padding. not working
SageMoore a8675b7
ubatch padding should work now
SageMoore a00dabc
more padding work. still gets the wrong answer
SageMoore 05ddc34
misc padding fixes
SageMoore 60499f6
padding is getting correctness but there are still some edgecases tri…
SageMoore e6e3407
fix ubatch padding to account for the case where the padding would re…
SageMoore 642bf2d
Merge branch 'main' of https://github.com/neuralmagic/vllm into lwilk…
SageMoore ef3c01c
fix using the same buffer across ubatches
LucasWilkinson d682f5e
wip cudagraphs
SageMoore b74c731
more hacking
SageMoore 1d112d9
misc changes
SageMoore 0889f66
Merge branch 'main' of https://github.com/neuralmagic/vllm into lwilk…
SageMoore ff2dd13
more fixes
SageMoore a4def24
setup deepepll for ubatching
SageMoore 930efd0
yields now work with deepep_ll
SageMoore 96c0c4e
added initial code for cuda graph capturing ubatches
SageMoore 97dbafa
fix correctness issue with full-cudagraphs + attn splitting
SageMoore 144b148
initial full cudagraphs support. normal runs are working. ubatching d…
SageMoore 44a2b34
add attention splitting to dummy runs
SageMoore e2ba707
factored out some of the context creation code along with misc commet…
SageMoore 0e2b4bd
more refactoring
SageMoore 54deb61
delete any notion of dummy_ubatch
SageMoore 78228a6
refactor a bunch of misc parameters into a UbatchMetadata class
SageMoore af68574
reintegrate full cudagraphs
SageMoore 4672c72
capture works replay does not
SageMoore d833982
random push
SageMoore 57d404b
misc
SageMoore f7a3ee0
Merge remote-tracking branch 'origin/main' into lwilkinson/attn-slicing
LucasWilkinson c0efbbb
misc changes
SageMoore 0767d98
fix data_parallel.py
SageMoore 0e499c4
first round of cleanups
SageMoore 3d833aa
cleanup
SageMoore 18f7bfb
ubatching fix
SageMoore ce3ef95
turn yields on for pplx
SageMoore be2e163
delete basic-ub.py
SageMoore 9b7edc0
cleanup data_parallel.py
SageMoore 0c03d15
cleanup config.py
SageMoore 3112714
cleanup logger.py
SageMoore 1ca6541
cleanup backends/utils.py
SageMoore fc562e2
cleanup gpu_worker.py
SageMoore a9d47e8
remove always_microbatch_if_enabled
SageMoore 631be12
refactoring pplx_prepare_finalize.py
SageMoore 6e2a3c0
minor changes
SageMoore 17a7cee
cleanup deepep ll
SageMoore 1d75a02
remove cudagraph logic from flashmla.py
SageMoore 7e2ff26
cleanup flashmla.py
SageMoore 2f3461a
cleanup flashmla.py
SageMoore 83caef8
cleanups for ubatching.py
SageMoore 7cc5a54
cleanup some of the should_ubatch logic
SageMoore 0056be2
less ARs
SageMoore f7b6e60
gpu_model_runner cleanup
SageMoore 510e839
more cleanup
SageMoore bb0645c
separate ubatch and normal runs
SageMoore 3a41a3d
cleanup
SageMoore 06cc133
cleanup
SageMoore 908e9f8
cleanup
SageMoore 10ca263
split some of the ubatching logic out of _run_model
SageMoore 82ae694
comments cleanup etc
SageMoore 1a0e711
_prepare_inputs cleanup
SageMoore 716b032
should_ubatch improvements
SageMoore dc1b6af
format
SageMoore bfa828f
format
SageMoore 462c6b0
remove some dummy_run logic
SageMoore 9033056
remove FA changes
SageMoore 376e7eb
minor change
SageMoore 9b5913e
Merge branch 'main' of https://github.com/neuralmagic/vllm into sage/…
SageMoore b53450e
fix deep ep ll teardown
SageMoore 29a5ac1
remove previous fix
SageMoore 6d83b5e
cache comm stream
SageMoore 1ba3ae8
Merge branch 'main' of https://github.com/neuralmagic/vllm into sage/…
SageMoore ee70ce0
added splitting
SageMoore b9ad5e4
misc merge fixes
SageMoore 1c41175
full cudagraphs
SageMoore 582d301
add support for splitting dispatch/combine deepep ll kernels
SageMoore ba17d95
Merge branch 'main' of https://github.com/neuralmagic/vllm into sage/…
SageMoore e283eff
Merge branch 'main' of https://github.com/neuralmagic/vllm into sage/…
SageMoore 4819bb8
fix eager mode
SageMoore 6b0c303
misc fixes
SageMoore 5bbfd95
add support for multiple builders in the model runner
SageMoore 2cf200c
remove debug logging
SageMoore 28e7c30
Fix pre-commit error
yewentao256 44ead56
fix set forward context error
yewentao256 e526b1c
fix num_tokens_across_dp sizing issue
SageMoore dd2a94f
fix assert error num_tokens_across_dp is None
yewentao256 5215c80
Merge commit '6e8d8c4afbddf725b34ef938616701869f5b3462' into sage/dbo…
yewentao256 9e16220
fix ubatch datatype issue
yewentao256 6d76bd0
revert kv connector fix
SageMoore 090f485
add support for cutlass mla full cudagraphs
SageMoore 143b09e
fix full cudagraphs for cutlass mla
SageMoore 32de502
fixed acc issue
yewentao256 fc0aca4
Merge branch 'main' of https://github.com/neuralmagic/vllm into sage/…
SageMoore 31ba624
fix full cudagraphs for DBO
SageMoore 85ee541
Merge branch 'sage/dbo-full-cudagraphs' of https://github.com/neuralm…
SageMoore 9ac75b5
refactor model input slicing
SageMoore 34f0057
get eager mode ubatching working with UBatchWrapper
SageMoore ac6e221
get eager mode ubatching working with UBatchWrapper
SageMoore c8fdd62
get cudagraphs working with UBatchWrapper
SageMoore bca8aa9
cudagraphs should generally work now
SageMoore a35416e
gpu model runner cleanup
SageMoore 4126a89
misc cleanup
SageMoore 52fd4c1
cleanup
SageMoore 717163a
misc cleanup
SageMoore 197dad1
misc cleanup
SageMoore 7813e15
single alloc buffer
LucasWilkinson 968647a
use hooks for ll overlap; better perf; multinode fixes
LucasWilkinson 57423ee
ht support partially working
LucasWilkinson a3c2d62
got rid of one all reduce
SageMoore bff1216
only one AR remains
SageMoore ee00620
Merge branch 'main' of https://github.com/neuralmagic/vllm into sage/…
SageMoore 6b9bda2
minor merge fix
SageMoore a5bda74
plumb microbatchin_token_threshold
SageMoore 8f63ba9
fix HT handle issue
yewentao256 d62286f
temp logging
SageMoore fe19b91
temp workaround for cudagraph dispatching bug
SageMoore 64457a2
minor log update
SageMoore a762835
Merge branch 'main' of https://github.com/neuralmagic/vllm into sage/…
SageMoore 528be37
misc ubatch_wrapper updates
SageMoore e104dfa
comment updates
SageMoore df6ed10
Merge branch 'main' of https://github.com/neuralmagic/vllm into sage/…
SageMoore 9390dcb
comment updates
SageMoore 6660171
lint
SageMoore 01c70b4
fix attn splitting in dummy run
SageMoore d464b9e
reenable torch.compile for grouped_topk
SageMoore 53f5071
remove deepep ht changes
SageMoore 76f3c96
remove pplx changes
SageMoore 307ecf0
mla cleanup
SageMoore 21b0f16
remove deepep ht changes
SageMoore c1c003f
mla cleanup
SageMoore aebacdc
pplx cleanup
SageMoore 4718a2d
pplx cleanup
SageMoore 0c54343
remove enable_async_comms
SageMoore b6d162f
padding bugfix
SageMoore 44124af
simplify a2a kernel dispatching
SageMoore 7427b2d
simplify ubatch padding
SageMoore 756d721
dp metadata refactor
SageMoore 9602070
debug cruft
SageMoore 32fb038
fix piecewise compilation in the ubatch wrapper
SageMoore e42c0e7
moves types around
SageMoore 10518bd
add check to assert we are using deepep_low_latency
SageMoore 9e1f1af
misc gpu model runner refactoring
SageMoore b2ed6c3
misc gpu model runner refactoring
SageMoore ba00047
Merge branch 'main' of https://github.com/neuralmagic/vllm into sage/…
SageMoore 49cdc3d
Merge branch 'main' of https://github.com/neuralmagic/vllm into sage/…
SageMoore ec9f13d
misc review comments
SageMoore 6d31123
move files
SageMoore 7077c02
use_ht
yewentao256 2def98d
add maybe_run_recv_hook to __exit__
SageMoore 25c6a14
fix interface bug
yewentao256 9e75f3e
prefill metadata splitting
LucasWilkinson 1258a9a
cleanup
LucasWilkinson ba7ac76
cleanup
LucasWilkinson 64d6b7d
fix unit tests
LucasWilkinson 1ffc32c
fix unit tests
LucasWilkinson 3f0f1e4
support dbo for HT
yewentao256 278d727
add deepgemm sms
yewentao256 e278416
fix boot
LucasWilkinson 11a3b47
wip
LucasWilkinson bb571b5
clean up
LucasWilkinson a38ed84
Merge remote-tracking branch 'nm/wye-dbo-full-cudagraph-ht' into lwil…
LucasWilkinson 9ac6f2b
wip; running bad accuracy
LucasWilkinson 9da3928
padding refactor (doesnt work)
SageMoore 6b6358a
padding fix
SageMoore 87d300e
remove old ubatch splitting code
SageMoore 46895f3
move splitting code into its own file
SageMoore 4114f5c
remove logging
SageMoore 2276ac6
remove context offset
SageMoore 178ec20
minor yield fix
SageMoore ef313e5
remove flash attention metadata
SageMoore 813ba08
modular kernel refactoring
SageMoore 9e08d5d
eagle fixes
SageMoore 1e3a145
second ubatch empty
SageMoore fc18cf4
misc review comments
SageMoore b99ea7d
comments
SageMoore 0d36d13
comments
SageMoore d3ec67b
comments
SageMoore ac8fbb7
fix acc issue
yewentao256 d69834c
merge sage dbo branch into this
yewentao256 a51e9a0
chill out warmup
LucasWilkinson 5b4a96c
working!
LucasWilkinson 517d3ad
Merge branch 'lwilkinson/ht-plain' into lwilkinson/dbo-prefill
LucasWilkinson 120569a
padding refactor
SageMoore 49ac0e7
clean-up
LucasWilkinson 0e479cb
padding refactor
SageMoore 880783e
Merge branch 'main' of https://github.com/neuralmagic/vllm into sage/…
SageMoore 39b4780
fix ll accuracy
LucasWilkinson da40b45
fix accuracy
LucasWilkinson 2f65c8f
cleanup
LucasWilkinson 73848ab
fix cpu model runner
SageMoore 7b239fd
Merge branch 'main' of https://github.com/neuralmagic/vllm into sage/…
SageMoore 9c6c6fd
lint
SageMoore ff75a86
lint
SageMoore fa30304
lint
SageMoore bbec31e
lint
SageMoore 9185ffc
lint
SageMoore 911dbe7
lint
SageMoore bce1898
lint
SageMoore 588e79a
minor typename change
SageMoore 52b94fd
Merge remote-tracking branch 'nm/sage/dbo-full-cudagraphs' into lwilk…
LucasWilkinson a3d9969
moe layer refactoring
SageMoore 1d135f5
cleanup
LucasWilkinson 92081eb
should_ubatch_across_dp refactor
SageMoore 462d035
config option name change
SageMoore 8c0afae
passing but accuracy drop
LucasWilkinson 4fba0fe
Merge branch 'main' of https://github.com/neuralmagic/vllm into sage/…
SageMoore 71640a8
clean up SM controls
LucasWilkinson 191cf91
Merge remote-tracking branch 'nm/sage/dbo-full-cudagraphs' into lwilk…
LucasWilkinson b0d5a48
fix precommit
LucasWilkinson fccbd35
cleanup
LucasWilkinson eb650bf
Merge remote-tracking branch 'origin/main' into lwilkinson/dbo-prefill
LucasWilkinson ef8842d
fix merge issue
LucasWilkinson 0337654
clean-up
LucasWilkinson 34721ef
pre-commit fix
LucasWilkinson aeee2ba
minor cleanup
LucasWilkinson 991a840
more minor cleanup
LucasWilkinson bd04165
cleanup
LucasWilkinson 1f84203
add auto cg fallback
LucasWilkinson 2ad6282
Merge remote-tracking branch 'origin/main' into lwilkinson/dbo-prefill
LucasWilkinson aeb8a47
fix precommit
LucasWilkinson 23389a1
Merge remote-tracking branch 'origin/main' into lwilkinson/dbo-prefill
LucasWilkinson 3a1c628
review comments
LucasWilkinson 3a3383d
review comments
LucasWilkinson 47f2a3b
review comments
LucasWilkinson 10b9487
Merge remote-tracking branch 'origin/main' into lwilkinson/dbo-prefill
LucasWilkinson 670a8ce
fix deepgemm warmup
LucasWilkinson 52d324d
better HT schedule
LucasWilkinson 9a72580
fix assert
LucasWilkinson 68ebc6b
Merge remote-tracking branch 'origin/main' into lwilkinson/dbo-prefill
LucasWilkinson 8dd1bb0
Merge remote-tracking branch 'origin/main' into lwilkinson/dbo-prefill
LucasWilkinson 4781120
remove unrelated change
LucasWilkinson 99f6c06
Update vllm/model_executor/layers/fused_moe/deepep_ht_prepare_finaliz…
LucasWilkinson 9416c69
review comments
LucasWilkinson 4e9e2ce
Merge branch 'main' into lwilkinson/dbo-prefill
tlrmchlsmth 356ddcb
fix pre-commit
LucasWilkinson 4c631db
Merge branch 'main' into lwilkinson/dbo-prefill
tlrmchlsmth f0bc569
precommit fix
LucasWilkinson
@@ -330,6 +330,8 @@ class EngineArgs:
     enable_dbo: bool = ParallelConfig.enable_dbo
     dbo_decode_token_threshold: int = \
         ParallelConfig.dbo_decode_token_threshold
+    dbo_prefill_token_threshold: int = \
+        ParallelConfig.dbo_prefill_token_threshold
     eplb_config: EPLBConfig = get_field(ParallelConfig, "eplb_config")
     enable_eplb: bool = ParallelConfig.enable_eplb
     expert_placement_strategy: ExpertPlacementStrategy = \
@@ -698,6 +700,9 @@ def add_cli_args(parser: FlexibleArgumentParser) -> FlexibleArgumentParser:
         parallel_group.add_argument(
             "--dbo-decode-token-threshold",
             **parallel_kwargs["dbo_decode_token_threshold"])
+        parallel_group.add_argument(
+            "--dbo-prefill-token-threshold",
+            **parallel_kwargs["dbo_prefill_token_threshold"])
Review comment on lines +703 to +705: Please add comments for the new arg.
         parallel_group.add_argument("--enable-eplb",
                                     **parallel_kwargs["enable_eplb"])
         parallel_group.add_argument("--eplb-config",
@@ -1316,6 +1321,7 @@ def create_engine_config(
             enable_expert_parallel=self.enable_expert_parallel,
             enable_dbo=self.enable_dbo,
             dbo_decode_token_threshold=self.dbo_decode_token_threshold,
+            dbo_prefill_token_threshold=self.dbo_prefill_token_threshold,
             enable_eplb=self.enable_eplb,
             eplb_config=self.eplb_config,
             expert_placement_strategy=self.expert_placement_strategy,
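
The three hunks above follow one pattern: a config default is mirrored on the engine-args class, exposed as a CLI flag, and passed back into the config at construction time. The sketch below reproduces that plumbing with only the standard library so it can be run in isolation. It is not vLLM code: `ParallelConfigSketch`, `EngineArgsSketch`, `create_parallel_config`, and `from_cli` are hypothetical names, plain `argparse` stands in for `FlexibleArgumentParser` and `parallel_kwargs`, and the default values and help strings are placeholders rather than the project's actual ones.

```python
import argparse
from dataclasses import dataclass


@dataclass
class ParallelConfigSketch:
    """Hypothetical stand-in for ParallelConfig; defaults are placeholders."""
    enable_dbo: bool = False
    dbo_decode_token_threshold: int = 32    # placeholder, not vLLM's value
    dbo_prefill_token_threshold: int = 512  # placeholder, not vLLM's value


@dataclass
class EngineArgsSketch:
    """Mirrors the config defaults, as the first hunk does on EngineArgs."""
    enable_dbo: bool = ParallelConfigSketch.enable_dbo
    dbo_decode_token_threshold: int = \
        ParallelConfigSketch.dbo_decode_token_threshold
    dbo_prefill_token_threshold: int = \
        ParallelConfigSketch.dbo_prefill_token_threshold

    @staticmethod
    def add_cli_args(parser: argparse.ArgumentParser) -> argparse.ArgumentParser:
        # One flag per field, as in the second hunk. The help strings are
        # illustrative wording only (an assumption, not the project's text).
        group = parser.add_argument_group("parallel")
        group.add_argument("--enable-dbo", action="store_true",
                           default=EngineArgsSketch.enable_dbo,
                           help="Enable dual-batch overlap.")
        group.add_argument("--dbo-decode-token-threshold", type=int,
                           default=EngineArgsSketch.dbo_decode_token_threshold,
                           help="Token threshold applied to decode batches.")
        group.add_argument("--dbo-prefill-token-threshold", type=int,
                           default=EngineArgsSketch.dbo_prefill_token_threshold,
                           help="Token threshold applied to prefill batches.")
        return parser

    def create_parallel_config(self) -> ParallelConfigSketch:
        # Forward every field explicitly, as in the third hunk.
        return ParallelConfigSketch(
            enable_dbo=self.enable_dbo,
            dbo_decode_token_threshold=self.dbo_decode_token_threshold,
            dbo_prefill_token_threshold=self.dbo_prefill_token_threshold,
        )


def from_cli(argv: list[str]) -> EngineArgsSketch:
    """Parse argv and build the args object from the resulting namespace."""
    parser = EngineArgsSketch.add_cli_args(argparse.ArgumentParser())
    return EngineArgsSketch(**vars(parser.parse_args(argv)))
```

Calling `from_cli(["--dbo-prefill-token-threshold", "1024"]).create_parallel_config()` yields a config whose prefill threshold is overridden while the other fields keep their defaults. Forgetting any one of the three steps leaves the flag silently ignored, which is why the PR touches all three places.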
Review comment: Would -1 be better?