[Misc][Metrics] expose requests preemptions in logger #25303

kingsmad · 2025-09-20T05:22:36Z

Summary

This PR stores the number of preemptions per step locally in the logger class so that child classes can use it.

Currently, when no new blocks are available in a step, we already record the preemption as a request event and send it back to the engine client via EngineCoreResponses, where it is later aggregated into the iteration stats.
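The flow described above can be sketched with stand-in types. This is a minimal illustration, not vLLM's actual code: `EventType`, `RequestEvent`, `IterationStats.update_from_events`, and `aggregate_step` are simplified, hypothetical stand-ins, and only the field name `num_preempted_reqs` is taken from the diff in this PR.

```python
from dataclasses import dataclass
from enum import Enum, auto


class EventType(Enum):
    # Hypothetical stand-in for the engine's request event types.
    QUEUED = auto()
    SCHEDULED = auto()
    PREEMPTED = auto()


@dataclass
class RequestEvent:
    # Hypothetical stand-in: one event emitted for one request in a step.
    request_id: str
    type: EventType


@dataclass
class IterationStats:
    # Stand-in holding only the field relevant to this PR.
    num_preempted_reqs: int = 0

    def update_from_events(self, events: list[RequestEvent]) -> None:
        """Count the PREEMPTED events reported for one engine step."""
        for event in events:
            if event.type is EventType.PREEMPTED:
                self.num_preempted_reqs += 1


def aggregate_step(events: list[RequestEvent]) -> IterationStats:
    """Build the per-iteration stats from the events of a single step."""
    stats = IterationStats()
    stats.update_from_events(events)
    return stats
```

A logger that receives the resulting stats object once per step can then add `num_preempted_reqs` to its own running total.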

Test Plan

Added a simple unit test for iteration stats.
Ran locally with cache usage saturated to 100%; the "llm.vllm.request.preemptions" counter shows up.

Differential Revision: D82650207

Summary: Currently, when no new blocks are available in a step, we already record this as [request events](https://fburl.com/code/rsiolx07) and send it back to the engine client via EngineCoreResponses, where it is later [aggregated](https://fburl.com/code/82r3x1lw) in the [iteration stats](https://fburl.com/code/lw96wgom). In this diff, we just expose this to ODS via MetaStatLoggerV1 so that the counter is exported in the background. We want this counter to measure the number of request preemptions when the KV cache is saturated.

Test Plan: Ran locally with cache usage saturated to 100%; the "llm.vllm.request.preemptions" counter shows up.

Differential Revision: D82650207

facebook-github-bot · 2025-09-20T05:22:53Z

@kingsmad has exported this pull request. If you are a Meta employee, you can view the originating diff in D82650207.

gemini-code-assist

Code Review

This pull request aims to expose request preemption counts by tracking them in LoggingStatLogger. While the counter num_preempted_reqs is correctly added and updated, a critical issue exists where this value is reset within the log method before it is ever used. This bug prevents the preemption count from being exposed, defeating the purpose of the change. My review includes a comment detailing this issue.
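The ordering problem the review describes can be illustrated with a toy logger. This is a sketch under assumptions: `ToyStatLogger` and its methods are hypothetical and do not reproduce `LoggingStatLogger`. The only point is that a counter reset inside the log step before it is read will always report zero.

```python
class ToyStatLogger:
    """Hypothetical logger that accumulates a counter between log calls."""

    def __init__(self) -> None:
        self.num_preempted_reqs = 0

    def record(self, num_preempted_reqs: int) -> None:
        # Accumulate the per-step count, as the diff does with +=.
        self.num_preempted_reqs += num_preempted_reqs

    def _reset(self) -> None:
        self.num_preempted_reqs = 0

    def log_buggy(self) -> int:
        # Bug pattern: the counter is cleared before anyone reads it.
        self._reset()
        return self.num_preempted_reqs

    def log_fixed(self) -> int:
        # Read the value first, then reset for the next logging interval.
        value = self.num_preempted_reqs
        self._reset()
        return value
```

With the buggy ordering, a subclass that reads the counter after calling the parent's log method would see zero regardless of how many preemptions were recorded.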

vllm/v1/metrics/loggers.py

houseroad

We probably need a better name for this PR, and should add some unit tests.

yeqcharlotte

Please make the links in the description public. Also, let's add tests in https://github.com/vllm-project/vllm/blob/9607d5eb449711b349d4c2bee0a9c94afcc7ed14/tests/v1/metrics/test_engine_logger_apis.py

kingsmad · 2025-09-22T06:50:18Z

Addressed comments, thanks for the review!

Changed the PR title.
Added a simple unit test for recording iteration stats.
Updated the summary to use public code links.

yeqcharlotte · 2025-09-27T00:20:48Z

vllm/v1/metrics/loggers.py

        # Save tracked stats for token counters.
        self.num_prompt_tokens += iteration_stats.num_prompt_tokens
        self.num_generation_tokens += iteration_stats.num_generation_tokens
+        self.num_preempted_reqs += iteration_stats.num_preempted_reqs


we seem to already have counter_num_preempted_reqs? can we use that? cc: @markmc

kingsmad

@yeqcharlotte Good point, but currently counter_num_preempted_reqs lives in PrometheusStatLogger, and our predictor uses its own loggers.

yeqcharlotte

Since this change is relatively small, it's ok to let it through. We can follow up to see if we can reuse more.

markmc

I don't think we should add data into LoggingStatLogger if it is not used by LoggingStatLogger itself - there's no reason to incur this overhead in the upstream logger

Something like this would be equivalent

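# Imports assumed for this snippet (module paths as used in vllm/v1/metrics/loggers.py).
from typing import Optional

from vllm.config import VllmConfig
from vllm.logger import init_logger
from vllm.v1.metrics.loggers import LoggingStatLogger
from vllm.v1.metrics.stats import IterationStats, SchedulerStats

logger = init_logger(__name__)


class PreemptionTrackingLogger(LoggingStatLogger):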
    def __init__(self, vllm_config: VllmConfig, engine_index: int = 0):
        super().__init__(vllm_config, engine_index)
        self.total_preempted_reqs = 0

    def record(self,
               scheduler_stats: Optional[SchedulerStats],
               iteration_stats: Optional[IterationStats],
               engine_idx: int = 0):
        # Call parent record logic
        super().record(scheduler_stats, iteration_stats, engine_idx)

        # Track preempted requests
        if iteration_stats is not None:
            self.total_preempted_reqs += iteration_stats.num_preempted_reqs

    def log(self):
        # Run base logging first
        super().log()

        # Add preempted requests info
        logger.info(
            "Engine %03d: Total preempted requests so far: %d",
            self.engine_index,
            self.total_preempted_reqs,
        )
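To see the suggested subclassing pattern run without a vLLM installation, here is a sketch with a stand-in base class. `BaseStatLogger` and this `IterationStats` are hypothetical simplifications introduced for illustration; the real `LoggingStatLogger` takes a `VllmConfig` and does more work in `record` and `log`.

```python
import logging
from dataclasses import dataclass
from typing import Optional

logger = logging.getLogger("preemption_demo")


@dataclass
class IterationStats:
    # Stand-in: only the field this example needs.
    num_preempted_reqs: int = 0


class BaseStatLogger:
    """Hypothetical stand-in for the upstream logger; it tracks nothing extra."""

    def __init__(self, engine_index: int = 0) -> None:
        self.engine_index = engine_index

    def record(self, iteration_stats: Optional[IterationStats]) -> None:
        pass

    def log(self) -> None:
        pass


class PreemptionTrackingLogger(BaseStatLogger):
    """Keeps the preemption total in the subclass, not in the base logger."""

    def __init__(self, engine_index: int = 0) -> None:
        super().__init__(engine_index)
        self.total_preempted_reqs = 0

    def record(self, iteration_stats: Optional[IterationStats]) -> None:
        super().record(iteration_stats)
        # Steps without iteration stats contribute nothing.
        if iteration_stats is not None:
            self.total_preempted_reqs += iteration_stats.num_preempted_reqs

    def log(self) -> None:
        super().log()
        logger.info(
            "Engine %03d: Total preempted requests so far: %d",
            self.engine_index,
            self.total_preempted_reqs,
        )
```

The design choice here is that the base class carries no extra state or per-step work, so only deployments that install the subclass pay for the additional counter.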

kingsmad requested review from WoosukKwon, alexm-redhat, comaniac, njhill, robertgshaw2-redhat and ywang96 as code owners September 20, 2025 05:22

mergify bot added the v1 label Sep 20, 2025

gemini-code-assist bot reviewed Sep 20, 2025

houseroad reviewed Sep 20, 2025

yeqcharlotte changed the title from "expose requests preemptions to ods" to "[Misc][Metrics] expose requests preemptions in logger" Sep 20, 2025

yeqcharlotte requested changes Sep 20, 2025

kingsmad and others added 2 commits September 20, 2025 15:31

Merge branch 'vllm-project:main' into export-D82650207

3bdd7f4

Merge remote-tracking branch 'upstream/main' into export-D82650207

5ab6dc4

kingsmad marked this pull request as draft September 22, 2025 05:33

ywang96 and others added 14 commits September 21, 2025 22:58

[MM][Perf] Minor Optimization on Qwen3-VL fast_pos_embed_interpolate (vllm-project#25337)

c3f7ed3

Signed-off-by: Roger Wang <[email protected]>

[Bugfix] Typos in error message for missing model config file (vllm-project#25339)

a2c21a2

Signed-off-by: simondanielsson <[email protected]>

[Optimization] Cache chat template result when processor fails to be loaded (vllm-project#25341)

6d949db

Signed-off-by: DarkLight1337 <[email protected]>

[V0 Deprecation] Remove V0 Sequence class & Sampler (vllm-project#25332)

b2e5dc1

Signed-off-by: Woosuk Kwon <[email protected]>

[V0 Deprecation] Remove async_output_proc, preemption mode, delay factor (vllm-project#25334)

66b1e08

Signed-off-by: Woosuk Kwon <[email protected]>

feat: Enable engine-level arguments with speculators models (vllm-project#25250)

0de3fac

Signed-off-by: Rahul Tuli <[email protected]> Co-authored-by: Claude <[email protected]>

[V0 Deprecation] Remove V0 sampling metadata (vllm-project#25345)

69a7601

Signed-off-by: Woosuk Kwon <[email protected]>

[Perf] Further optimization for Qwen3-VL fast_pos_embed_interpolate (vllm-project#25347)

9f092a0

Signed-off-by: Isotr0py <[email protected]>

Remove V0 attention backends (vllm-project#25351)

6217239

Signed-off-by: Woosuk Kwon <[email protected]>

[Bugfix][V0 Deprecation][CI] use async mock and await for async method (vllm-project#25325)

a271abf

Signed-off-by: Yang <[email protected]>

Multimodal - audio tests (vllm-project#25285)

73f2bef

Signed-off-by: Debolina Roy <[email protected]>

[Model] Support Dots OCR (vllm-project#24645)

1ffb412

Signed-off-by: Roger Wang <[email protected]> Co-authored-by: yinz-aizip <[email protected]>

[Docs] GSM8K Accuracy Evaluation doc update (vllm-project#25360)

b608cb4

Signed-off-by: David Chen <[email protected]>

[Bugfix] Fix hermes tool parser handling of non-string argument types (vllm-project#22002)

b012cf6

Signed-off-by: wangzi <[email protected]> Signed-off-by: David Chen <[email protected]> Co-authored-by: wangzi <[email protected]> Co-authored-by: Chauncey <[email protected]>

mergify bot added the deepseek, frontend, llama, multi-modality, new-model, qwen, gpt-oss, and rocm labels Sep 22, 2025

github-project-automation bot added this to gpt-oss Issues & Enhancements Sep 22, 2025

mergify bot added the speculative-decoding label Sep 22, 2025

github-project-automation bot moved this to To Triage in gpt-oss Issues & Enhancements Sep 22, 2025

mergify bot added the tpu, tool-calling, and kv-connector labels Sep 22, 2025

github-project-automation bot added this to Tool Calling Sep 22, 2025

Merge branch 'vllm-project:main' into export-D82650207

afc679e

mergify bot removed the tpu label Sep 22, 2025

format code

170ba7d

Signed-off-by: Juechen Liu <[email protected]>

kingsmad marked this pull request as ready for review September 22, 2025 06:50

kingsmad requested a review from yeqcharlotte September 22, 2025 06:51

yeqcharlotte reviewed Sep 27, 2025

yeqcharlotte approved these changes Sep 28, 2025

github-project-automation bot moved this from To Triage to Ready in gpt-oss Issues & Enhancements Sep 28, 2025

yeqcharlotte enabled auto-merge (squash) September 28, 2025 01:59

github-actions bot added the ready label Sep 28, 2025

kingsmad and others added 2 commits September 28, 2025 21:13

Merge remote-tracking branch 'upstream/main' into export-D82650207

a436358

Merge branch 'vllm-project:main' into export-D82650207

8e75688

markmc suggested changes Sep 29, 2025

github-project-automation bot moved this from Ready to In progress in gpt-oss Issues & Enhancements Sep 29, 2025

14 participants
