[Misc][Metrics] expose requests preemptions in logger #25303
Conversation
Summary: When no new blocks are available at a step, we already record this as [request events](https://fburl.com/code/rsiolx07) and send it back to the engine client via EngineCoreResponses, where it is later [aggregated](https://fburl.com/code/82r3x1lw) into the [iteration stats](https://fburl.com/code/lw96wgom). This diff simply exposes that count to ODS via MetaStatLoggerV1, so the counter is exported in the background. We want this counter to measure the number of request preemptions when the KV cache is saturated. Test Plan: Ran locally and saturated cache usage to 100%; the "llm.vllm.request.preemptions" counter showed up as expected. {F1982066617} Differential Revision: D82650207
Code Review
This pull request aims to expose request preemption counts by tracking them in LoggingStatLogger. While the counter num_preempted_reqs is correctly added and updated, a critical issue exists: this value is reset within the log method before it is ever used. This bug prevents the preemption count from being exposed, defeating the purpose of the change. My review includes a comment detailing this issue.
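To illustrate the reported pattern, here is a minimal sketch with hypothetical names (not the actual vLLM code): if `log()` zeroes the counter before reading it, the logged value is always 0; the fix is to capture the value first and reset afterwards.

```python
class StatLoggerSketch:
    """Toy logger illustrating a reset-before-use bug in log()."""

    def __init__(self):
        self.num_preempted_reqs = 0

    def record(self, preempted: int) -> None:
        # Accumulate preemptions observed during the logging interval.
        self.num_preempted_reqs += preempted

    def log_buggy(self) -> int:
        # BUG: the counter is reset before its value is read,
        # so the "logged" value is always 0.
        self.num_preempted_reqs = 0
        return self.num_preempted_reqs

    def log_fixed(self) -> int:
        # Fix: read the value first, then reset for the next interval.
        value = self.num_preempted_reqs
        self.num_preempted_reqs = 0
        return value
```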
We probably need a better name for this PR, and add some unittest.
Please fix the links in the description to be public. Also, let's add tests in https://github.com/vllm-project/vllm/blob/9607d5eb449711b349d4c2bee0a9c94afcc7ed14/tests/v1/metrics/test_engine_logger_apis.py
Addressed comments, thanks for the review!
```python
# Save tracked stats for token counters.
self.num_prompt_tokens += iteration_stats.num_prompt_tokens
self.num_generation_tokens += iteration_stats.num_generation_tokens
self.num_preempted_reqs += iteration_stats.num_preempted_reqs
```
we seem to already have counter_num_preempted_reqs? can we use that? cc: @markmc
@yeqcharlotte Good point, but currently counter_num_preempted_reqs lives in the PrometheusStatLogger, and our predictor uses our own loggers.
Since this change is relatively small, it's OK to let it through. We can follow up to see if we can reuse more.
I don't think we should add data into LoggingStatLogger if it is not used by LoggingStatLogger itself - there's no reason to incur this overhead in the upstream logger. Something like this would be equivalent:
```python
class PreemptionTrackingLogger(LoggingStatLogger):
    def __init__(self, vllm_config: VllmConfig, engine_index: int = 0):
        super().__init__(vllm_config, engine_index)
        self.total_preempted_reqs = 0

    def record(self,
               scheduler_stats: Optional[SchedulerStats],
               iteration_stats: Optional[IterationStats],
               engine_idx: int = 0):
        # Call parent record logic
        super().record(scheduler_stats, iteration_stats, engine_idx)
        # Track preempted requests
        if iteration_stats is not None:
            self.total_preempted_reqs += iteration_stats.num_preempted_reqs

    def log(self):
        # Run base logging first
        super().log()
        # Add preempted requests info
        logger.info(
            "Engine %03d: Total preempted requests so far: %d",
            self.engine_index,
            self.total_preempted_reqs,
        )
```
Summary
The purpose of this PR is to store the number of preemptions per step locally in the logger class so that it can be leveraged in child classes.
Currently, when no new blocks are available at a step, we already record this as request events and send it back to the engine client via EngineCoreResponses, where it is later aggregated into the iteration stats.
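As a rough sketch of the event-to-stats flow described above (hypothetical, heavily simplified class shapes; the real vLLM EngineCoreEvent and IterationStats carry many more fields): per-request events emitted by the engine core are folded into an iteration-level counter that loggers then consume.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EngineCoreEvent:
    # Event kind for one request at one engine step,
    # e.g. "PREEMPTED" or "SCHEDULED".
    kind: str


@dataclass
class IterationStats:
    num_preempted_reqs: int = 0

    def update_from_events(self, events: List[EngineCoreEvent]) -> None:
        # Aggregate per-request events from one engine step into the
        # iteration-level counter that stat loggers later read.
        self.num_preempted_reqs += sum(
            1 for e in events if e.kind == "PREEMPTED")
```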
Test Plan
Differential Revision: D82650207