
Conversation

Contributor

@PeaBrane PeaBrane commented Jun 5, 2025

Overview:

Taking just the Rust bits from #1285

Summary by CodeRabbit

  • New Features

    • Added support for tracking and exposing data parallel rank (dp_rank) alongside worker IDs throughout routing, event publishing, metrics, and Python bindings.
    • Introduced a new Python class and Rust struct representing workers with optional data parallel rank.
    • Enhanced event and metrics data structures to include dp_rank information.
    • Added the ability to create multiple event publishers per data parallel rank.
  • Improvements

    • Refactored internal routing, scheduling, and event handling to use richer worker identifiers, improving type safety and flexibility.
    • Improved logging and error messages with more detailed worker and rank context.
  • Bug Fixes

    • Prevented creation of empty endpoint lists in metrics aggregation.
  • Documentation

    • Updated method docstrings and comments for clarity on new parameters and behaviors.
  • Breaking Changes

    • Several public APIs now use worker identifiers with optional data parallel rank instead of simple IDs. Existing integrations may require updates.


copy-pr-bot bot commented Jun 5, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions github-actions bot added the feat label Jun 5, 2025
@PeaBrane PeaBrane marked this pull request as ready for review June 5, 2025 00:30
Comment on lines +63 to +65
pub struct WorkerSelectionResult<T: WorkerGeneral> {
    /// The worker id of the selected worker
    pub worker_id: i64,
    pub worker: T,
Contributor

@rmccorm4 rmccorm4 Jun 5, 2025


For a first pass, I expected to see something simpler like this:

pub struct WorkerSelectionResult {
    /// The worker id of the selected worker
    pub worker_id: i64,
    // The data parallel attention rank of the selected worker, if applicable
    pub dp_rank: Option<u32>,
    ...

Rather than a template specialization being updated everywhere for WorkerSelectionResult<WorkerDp>, KvHitRateEvent<WorkerDp>, etc.

I figure if we add more and more independent fields for different specializations, then maybe we go the template/generics route or re-think it a bit.

What do others think? @ryanolson @paulhendricks @GuanLuo @alec-flowers

Contributor Author

@PeaBrane PeaBrane Jun 5, 2025


I went with the generics route because I don't think it made much sense to "restrict" the Indexer down to a specific worker type (be it just id, or id with dp rank, or any extension in the future). And I believe this generalization is zero-cost.

But agree that it adds some bloat. Open to hearing what others think.
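
To make the trade-off concrete, here is a minimal, self-contained sketch of the generics route being described; the WorkerId/WorkerDp definitions and derives below are illustrative stand-ins, not the PR's exact code:

use serde::Serialize;
use std::fmt::Debug;
use std::hash::Hash;

// Trait bounds as quoted elsewhere in this thread; Deserialize is intentionally absent.
pub trait WorkerGeneral:
    Hash + Eq + Debug + Clone + Send + Sync + Default + 'static + Serialize
{
}

// Illustrative id-only worker.
#[derive(Serialize, Hash, PartialEq, Eq, Debug, Clone, Default)]
pub struct WorkerId(pub i64);
impl WorkerGeneral for WorkerId {}

// Illustrative worker that also carries an optional data parallel rank.
#[derive(Serialize, Hash, PartialEq, Eq, Debug, Clone, Default)]
pub struct WorkerDp {
    pub worker_id: i64,
    pub dp_rank: Option<u32>,
}
impl WorkerGeneral for WorkerDp {}

// Router/indexer types stay generic over the worker representation; each
// instantiation is monomorphized, so the abstraction costs nothing at runtime.
pub struct WorkerSelectionResult<T: WorkerGeneral> {
    /// The worker id of the selected worker
    pub worker_id: i64,
    pub worker: T,
}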

Contributor


I do like the change that worker_id is generic with respect to the trait, but I still need to fully understand why we need both.

My expectation is that for workers that belong to a strong scaling cohort, e.g. an application with multiple workers performing dp/tp/pp parallelism, each worker would know its logical "rank" in that cohort and, in turn, would know the mapping of the dynamo worker_ids/lease_ids for each of the other ranks in the cohort.

My assumption is that each dp parallel rank used for attention would be in its own worker process, have its own dynamo runtime, and have its own worker_id.
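
As a rough illustration of that expectation (the type and field names here are hypothetical, not part of this PR):

use std::collections::HashMap;

// Each worker process in a dp/tp/pp cohort knows its own logical rank plus the
// rank -> dynamo worker_id/lease_id mapping for the other members of the cohort.
pub struct CohortMembership {
    pub my_rank: u32,
    pub rank_to_worker_id: HashMap<u32, i64>,
}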

Contributor


We discussed this with the vLLM team
vllm-project/vllm#17546
https://docs.google.com/document/d/10jhCNxJYvsUhtMtiMAaW2MxU5LU8HVje2pGDnj49gH4/edit?pli=1&tab=t.0#heading=h.wsk0hlrf3cp2

This is currently the design of their driver workers and engine core setup. Doing something like scaleout 2 would enable exactly what you have in mind for DP, @ryanolson.

[image: vLLM data parallel scaleout design diagram]

However, it's currently scaleout 4 that is implemented.

Contributor


In the above case the Launcher would be dynamo and we can drop the APIServer.

@rmccorm4 rmccorm4 mentioned this pull request Jun 5, 2025
Comment on lines 46 to 52
/// Represents a single cache event with an ID and associated data.
#[derive(Serialize, Deserialize, Debug, Clone)]
pub struct KvCacheEventWithDp {
    pub kv_cache_event: KvCacheEvent,
    pub dp_rank: Option<DpRank>,
}

Contributor


Is it possible to add dp_rank to KVCacheEvent and avoid creating a wrapper on top of a wrapper for KVCacheData? My concern comes from potential extensibility: this one, judging by the name, is very DP-oriented. What if in the future we want to add another field?
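
Roughly, the two shapes under discussion look like this (field bodies elided; KvCacheEventFlat is only an illustrative name for the suggested alternative):

use serde::{Deserialize, Serialize};

type DpRank = u32; // assumed alias for illustration

// Placeholder for the existing, already nested event payload.
#[derive(Serialize, Deserialize, Debug, Clone)]
pub struct KvCacheEvent { /* event_id, data, ... */ }

// Shape in this PR: a wrapper pairing an event with the publisher's dp_rank.
#[derive(Serialize, Deserialize, Debug, Clone)]
pub struct KvCacheEventWithDp {
    pub kv_cache_event: KvCacheEvent,
    pub dp_rank: Option<DpRank>,
}

// Alternative raised here: fold the optional rank into the event itself, so
// future per-event fields do not require another wrapper layer.
#[derive(Serialize, Deserialize, Debug, Clone)]
pub struct KvCacheEventFlat {
    // event_id, data, ...
    pub dp_rank: Option<DpRank>,
}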

Contributor


I guess this question is also a +1 to Ryan M's.

Contributor Author

@PeaBrane PeaBrane Jun 5, 2025


Hmm, I need to give this some thought. Since this is tied directly to the publishers, we needed to include dp_rank directly. I do want to keep the original KVCacheEvent as-is to keep it atomic (even though it's already super nested).

But it's probably a good idea to rename KVCacheEventWithDp to something else for future-proofing; do you have a name in mind?

Contributor

@ryanolson ryanolson left a comment


Is this needed because we need to route requests first to the leader and then to the dp rank?


// Cannot add DeserializeOwned otherwise compiler will complain
pub trait WorkerGeneral:
    Hash + Eq + Debug + Clone + Send + Sync + Default + 'static + Serialize
Contributor


for<'de> Deserialize<'de>?

Not sure if this would be sufficient.

Contributor Author


For some reason the compiler would complain with

type annotations needed: cannot satisfy T: Deserialize<'_>

on the Deserialize derivation of any generic type (e.g. RouterResponse) that uses type T.
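
For context on that error, a common serde workaround when the derive's inferred T: Deserialize<'de> bound interacts badly with a higher-ranked supertrait is to spell the deserialize bound out explicitly; whether this would resolve the ambiguity reported above is untested, so treat it as a sketch (RouterResponse's fields are assumed):

use serde::{Deserialize, Serialize};
use std::fmt::Debug;
use std::hash::Hash;

// Variant of the trait with the suggested higher-ranked Deserialize supertrait.
pub trait WorkerGeneral:
    Hash + Eq + Debug + Clone + Send + Sync + Default + 'static + Serialize
    + for<'de> Deserialize<'de>
{
}

// The explicit bound stops the derive from emitting its own T: Deserialize<'de>
// requirement and relies on the supertrait instead.
#[derive(Serialize, Deserialize, Debug, Clone)]
#[serde(bound(deserialize = "T: WorkerGeneral"))]
pub struct RouterResponse<T: WorkerGeneral> {
    pub worker: T,
}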


github-actions bot commented Jul 7, 2025

This PR is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions bot added the Stale label Jul 7, 2025
@PeaBrane PeaBrane closed this Jul 12, 2025