feat: support worker with dp rank in rust #1392
Conversation
pub struct WorkerSelectionResult<T: WorkerGeneral> {
    /// The worker id of the selected worker
    pub worker_id: i64,
    pub worker: T,
For a first pass, I expected to see something simpler like this:
pub struct WorkerSelectionResult {
    /// The worker id of the selected worker
    pub worker_id: i64,
    // The data parallel attention rank of the selected worker, if applicable
    pub dp_rank: Option<u32>,
    ...

Rather than a template specialization being updated everywhere for WorkerSelectionResult<WorkerDp>, KvHitRateEvent<WorkerDp>, etc.
I figure if we add more and more independent fields for different specializations, then maybe we go the template/generics route or re-think it a bit then.
What do others think? @ryanolson @paulhendricks @GuanLuo @alec-flowers
I went with the generics route because I don't think it made much sense to "restrict" the Indexer down to a specific worker type (be it just id, or id with dp rank, or any extension in the future). And I believe this generalization is zero-cost.
But agree that it adds some bloat. Open to hearing what others think.
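For reference, a minimal sketch of what the generics route looks like: the WorkerGeneral bounds and the WorkerSelectionResult shape are taken from this PR, while WorkerPlain and the exact WorkerDp fields are illustrative assumptions, not the PR's actual definitions.

use serde::{Deserialize, Serialize};
use std::fmt::Debug;
use std::hash::Hash;

// Bounds as written in this PR (deserialization is handled separately, see below).
pub trait WorkerGeneral:
    Hash + Eq + Debug + Clone + Send + Sync + Default + 'static + Serialize
{
}

// Worker identified by its id alone (illustrative).
#[derive(Serialize, Deserialize, Debug, Clone, Default, Hash, PartialEq, Eq)]
pub struct WorkerPlain;

// Worker carrying a data parallel attention rank (illustrative shape).
#[derive(Serialize, Deserialize, Debug, Clone, Default, Hash, PartialEq, Eq)]
pub struct WorkerDp {
    pub dp_rank: Option<u32>,
}

impl WorkerGeneral for WorkerPlain {}
impl WorkerGeneral for WorkerDp {}

// Indexer/router types are then generic over the worker flavour; monomorphization
// means the extra generality costs nothing at runtime.
pub struct WorkerSelectionResult<T: WorkerGeneral> {
    /// The worker id of the selected worker
    pub worker_id: i64,
    pub worker: T,
}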
I do like the change that worker_id is generic with respect to the trait, but I still need to fully understand why we need both.
My expectation is that workers belonging to a strong scaling cohort, e.g. an application with multiple workers performing dp/tp/pp parallelism, would each know their logical "rank" in that cohort and, in turn, would know the mapping of all the dynamo worker_ids/lease_ids for each of the other ranks in the cohort.
My assumption is that each dp parallel rank used for attention would be in its own worker process, have its own dynamo runtime and have its own worker_id.
We discussed this with the vLLM team:
vllm-project/vllm#17546
https://docs.google.com/document/d/10jhCNxJYvsUhtMtiMAaW2MxU5LU8HVje2pGDnj49gH4/edit?pli=1&tab=t.0#heading=h.wsk0hlrf3cp2
This is currently the design of their driver workers and engine core setup. Doing something like scaleout 2 would enable exactly what you have in mind @ryanolson for DP.

However, it's currently scaleout 4 that is implemented.
In the above case the Launcher would be dynamo and we can drop the APIServer.
/// Represents a single cache event with an ID and associated data.
#[derive(Serialize, Deserialize, Debug, Clone)]
pub struct KvCacheEventWithDp {
    pub kv_cache_event: KvCacheEvent,
    pub dp_rank: Option<DpRank>,
}
Is it possible to add dp_rank to KVCacheEvent and avoid creating a wrapper on top of a wrapper for KVCacheData? My concern comes from potential extensibility. This one, based on the name, is very DP oriented. What if in the future we want to add another field?
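For comparison, folding the field in directly would look roughly like this. A sketch only: the existing fields shown are illustrative (per the doc comment above, the real struct carries an id and associated event data).

#[derive(Serialize, Deserialize, Debug, Clone)]
pub struct KvCacheEvent {
    pub event_id: u64,           // illustrative: the event id
    pub data: KvCacheEventData,  // illustrative: the associated event data
    pub dp_rank: Option<DpRank>, // proposed: carried on the event itself
}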
I guess this question is also a +1 to Ryan M's
hmm, I need to think about this. Since this is tied directly to the publishers, we needed to include dp_rank directly. I do want to keep the original KVCacheEvent as-is to keep it atomic (even though it's already super nested).
But it's probably a good idea to rename KVCacheEventWithDp to something else for future-proofing, do you have a name in mind?
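To illustrate the publisher coupling being described: each dp rank's publisher would tag its own events before they reach the indexer, along these lines. A sketch only; wrap_events and the exact DpRank type are assumptions, not this PR's actual API.

// Hypothetical glue on the publisher side: a publisher that knows its own
// dp rank wraps each KvCacheEvent it emits with that rank.
fn wrap_events(events: Vec<KvCacheEvent>, dp_rank: Option<DpRank>) -> Vec<KvCacheEventWithDp> {
    events
        .into_iter()
        .map(|kv_cache_event| KvCacheEventWithDp {
            kv_cache_event,
            dp_rank: dp_rank.clone(),
        })
        .collect()
}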
is this needed because we need to route requests first to the leader then to the dp rank?
// Cannot add DeserializeOwned otherwise the compiler will complain
pub trait WorkerGeneral:
    Hash + Eq + Debug + Clone + Send + Sync + Default + 'static + Serialize
for<'de> Deserialize<'de>
?
not sure if this would be sufficient
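i.e. making it a supertrait, roughly (a sketch of the suggestion above):

pub trait WorkerGeneral:
    Hash + Eq + Debug + Clone + Send + Sync + Default + 'static
    + Serialize
    + for<'de> Deserialize<'de>
{
}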
for some reason the compiler would complain:
type annotations needed: cannot satisfy T: Deserialize<'_>
on the Deserialize derivation of any generics (e.g. RouterResponse) using type T
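For reference, one pattern that sometimes sidesteps this is overriding the bounds serde's derive generates on the generic container with a serde bound attribute. A sketch of that commonly used serde pattern, not necessarily applicable here; the RouterResponse fields are illustrative.

use serde::de::DeserializeOwned;
use serde::{Deserialize, Serialize};

// The bound attribute replaces the `T: Deserialize<'de>` where-clause the
// derive would otherwise add, which is what triggers the ambiguity above.
#[derive(Serialize, Deserialize, Debug)]
#[serde(bound(deserialize = "T: DeserializeOwned"))]
pub struct RouterResponse<T> {
    pub worker_id: i64, // illustrative field
    pub worker: T,
}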
This PR is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.
Overview:
Taking just the rust bits from #1285
Summary by CodeRabbit

New Features
- Data parallel rank (dp_rank) is now carried alongside worker IDs throughout routing, event publishing, metrics, and Python bindings.
- Cache events can include dp_rank information.

Improvements
Bug Fixes
Documentation
Breaking Changes