
Conversation

Contributor

@PeaBrane PeaBrane commented Jun 5, 2025

Overview:

Taking just the Rust bits from #1285

Summary by CodeRabbit

  • New Features

    • Added support for tracking and exposing data parallel rank (dp_rank) alongside worker IDs throughout routing, event publishing, metrics, and Python bindings.
    • Introduced a new Python class and Rust struct representing workers with optional data parallel rank.
    • Enhanced event and metrics data structures to include dp_rank information.
    • Added the ability to create multiple event publishers per data parallel rank.
  • Improvements

    • Refactored internal routing, scheduling, and event handling to use richer worker identifiers, improving type safety and flexibility.
    • Improved logging and error messages with more detailed worker and rank context.
  • Bug Fixes

    • Prevented creation of empty endpoint lists in metrics aggregation.
  • Documentation

    • Updated method docstrings and comments for clarity on new parameters and behaviors.
  • Breaking Changes

    • Several public APIs now use worker identifiers with optional data parallel rank instead of simple IDs. Existing integrations may require updates.


copy-pr-bot bot commented Jun 5, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions github-actions bot added the feat label Jun 5, 2025
@PeaBrane PeaBrane marked this pull request as ready for review June 5, 2025 00:30
Comment on lines +63 to +65
pub struct WorkerSelectionResult<T: WorkerGeneral> {
    /// The worker id of the selected worker
    pub worker_id: i64,
    pub worker: T,
Contributor

@rmccorm4 rmccorm4 Jun 5, 2025


For a first pass, I expected to see something simpler like this:

pub struct WorkerSelectionResult {
    /// The worker id of the selected worker
    pub worker_id: i64,
    // The data parallel attention rank of the selected worker, if applicable
    pub dp_rank: Option<u32>,
    ...

Rather than a template specialization being updated everywhere for WorkerSelectionResult<WorkerDp>, KvHitRateEvent<WorkerDp>, etc.

I figure if we add more and more independent fields for different specializations, then maybe we go the template/generics route or re-think it a bit.

What do others think? @ryanolson @paulhendricks @GuanLuo @alec-flowers

Contributor Author

@PeaBrane PeaBrane Jun 5, 2025


I went with the generics route because I don't think it made much sense to "restrict" the Indexer down to a specific worker type (be it just id, or id with dp rank, or any extension in the future). And I believe this generalization is zero-cost.

But agree that it adds some bloat. Open to hearing what others think.
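
To make the trade-off concrete, here is a minimal, self-contained sketch of the generics route being described; the WorkerId/WorkerDp definitions and derives below are illustrative stand-ins, not the PR's exact code:

use serde::Serialize;
use std::fmt::Debug;
use std::hash::Hash;

// Trait bounds as quoted elsewhere in this thread; Deserialize is intentionally absent.
pub trait WorkerGeneral:
    Hash + Eq + Debug + Clone + Send + Sync + Default + 'static + Serialize
{
}

// Illustrative id-only worker.
#[derive(Serialize, Hash, PartialEq, Eq, Debug, Clone, Default)]
pub struct WorkerId(pub i64);
impl WorkerGeneral for WorkerId {}

// Illustrative worker that also carries an optional data parallel rank.
#[derive(Serialize, Hash, PartialEq, Eq, Debug, Clone, Default)]
pub struct WorkerDp {
    pub worker_id: i64,
    pub dp_rank: Option<u32>,
}
impl WorkerGeneral for WorkerDp {}

// Router/indexer types stay generic over the worker representation; each
// instantiation is monomorphized, so the abstraction costs nothing at runtime.
pub struct WorkerSelectionResult<T: WorkerGeneral> {
    /// The worker id of the selected worker
    pub worker_id: i64,
    pub worker: T,
}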

Contributor


I do like the change that worker_id is generic with respect to the trait, but I still need to fully understand why we need both.

My expectation is that for workers that belong to a strong scaling cohort, e.g. an application with multiple workers performing dp/tp/pp parallelism, each worker would know its logical "rank" in that cohort and, in turn, would know the mapping of the dynamo worker_ids/lease_ids for each of the other ranks in the cohort.

My assumption is that each dp parallel rank used for attention would be in its own worker process, have its own dynamo runtime, and have its own worker_id.
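
As a rough illustration of that expectation (the type and field names here are hypothetical, not part of this PR):

use std::collections::HashMap;

// Each worker process in a dp/tp/pp cohort knows its own logical rank plus the
// rank -> dynamo worker_id/lease_id mapping for the other members of the cohort.
pub struct CohortMembership {
    pub my_rank: u32,
    pub rank_to_worker_id: HashMap<u32, i64>,
}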

Contributor


We discussed this with the vLLM team
vllm-project/vllm#17546
https://docs.google.com/document/d/10jhCNxJYvsUhtMtiMAaW2MxU5LU8HVje2pGDnj49gH4/edit?pli=1&tab=t.0#heading=h.wsk0hlrf3cp2

This is currently the design of their driver workers and engine core setup. Doing something like scaleout 2 would enable exactly what you have in mind for DP, @ryanolson.

[image: vLLM data parallel scaleout design diagram]

However, it's currently scaleout 4 that is implemented.

Contributor


In the above case the Launcher would be dynamo and we can drop the APIServer.

@rmccorm4 rmccorm4 mentioned this pull request Jun 5, 2025
Comment on lines 46 to 52
/// Represents a single cache event with an ID and associated data.
#[derive(Serialize, Deserialize, Debug, Clone)]
pub struct KvCacheEventWithDp {
    pub kv_cache_event: KvCacheEvent,
    pub dp_rank: Option<DpRank>,
}

Contributor


Is it possible to add dp_rank to KVCacheEvent and avoid creating a wrapper on top of a wrapper for KVCacheData? My concern comes from potential extensibility: this one, judging by the name, is very DP-oriented. What if in the future we want to add another field?
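
Roughly, the two shapes under discussion look like this (field bodies elided; KvCacheEventFlat is only an illustrative name for the suggested alternative):

use serde::{Deserialize, Serialize};

type DpRank = u32; // assumed alias for illustration

// Placeholder for the existing, already nested event payload.
#[derive(Serialize, Deserialize, Debug, Clone)]
pub struct KvCacheEvent { /* event_id, data, ... */ }

// Shape in this PR: a wrapper pairing an event with the publisher's dp_rank.
#[derive(Serialize, Deserialize, Debug, Clone)]
pub struct KvCacheEventWithDp {
    pub kv_cache_event: KvCacheEvent,
    pub dp_rank: Option<DpRank>,
}

// Alternative raised here: fold the optional rank into the event itself, so
// future per-event fields do not require another wrapper layer.
#[derive(Serialize, Deserialize, Debug, Clone)]
pub struct KvCacheEventFlat {
    // event_id, data, ...
    pub dp_rank: Option<DpRank>,
}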

Contributor


I guess this question is also a +1 to Ryan M's.

Contributor Author

@PeaBrane PeaBrane Jun 5, 2025


Hmm, I need to give this some thought. Since this is tied directly to the publishers, we needed to include dp_rank directly. I do want to keep the original KVCacheEvent as-is to keep it atomic (even though it's already super nested).

But it's probably a good idea to rename KVCacheEventWithDp to something else for future-proofing; do you have a name in mind?

Contributor

@ryanolson ryanolson left a comment


Is this needed because we need to route requests first to the leader and then to the dp rank?


// Cannot add DeserializeOwned otherwise compiler will complain
pub trait WorkerGeneral:
    Hash + Eq + Debug + Clone + Send + Sync + Default + 'static + Serialize
Contributor


for<'de> Deserialize<'de>?

Not sure if this would be sufficient.

Contributor Author


For some reason the compiler would complain with

type annotations needed: cannot satisfy T: Deserialize<'_>

on the Deserialize derivation of any generic type (e.g. RouterResponse) that uses type T.
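
For context on that error, a common serde workaround when the derive's inferred T: Deserialize<'de> bound interacts badly with a higher-ranked supertrait is to spell the deserialize bound out explicitly; whether this would resolve the ambiguity reported above is untested, so treat it as a sketch (RouterResponse's fields are assumed):

use serde::{Deserialize, Serialize};
use std::fmt::Debug;
use std::hash::Hash;

// Variant of the trait with the suggested higher-ranked Deserialize supertrait.
pub trait WorkerGeneral:
    Hash + Eq + Debug + Clone + Send + Sync + Default + 'static + Serialize
    + for<'de> Deserialize<'de>
{
}

// The explicit bound stops the derive from emitting its own T: Deserialize<'de>
// requirement and relies on the supertrait instead.
#[derive(Serialize, Deserialize, Debug, Clone)]
#[serde(bound(deserialize = "T: WorkerGeneral"))]
pub struct RouterResponse<T: WorkerGeneral> {
    pub worker: T,
}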


github-actions bot commented Jul 7, 2025

This PR is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions bot added the Stale label Jul 7, 2025
@PeaBrane PeaBrane closed this Jul 12, 2025