Skip to content

Conversation

kingsmad
Copy link
Contributor

@kingsmad kingsmad commented Sep 20, 2025

Summary

The purpose of this PR is to store the num preemptions per step in the logger class locally thus it can be leveraged in children classes.

Currently when no new blocks available from each step, we already record this as request events and set it back to engine client by EngineCoreResponses which later got aggregated in the iteration stats.

Test Plan

  1. Added a simple unit tests for iteration stats.
  2. Run locally, saturate cache usage to 100%, we are able to see "llm.vllm.request.preemptions" popped up image

Differential Revision: D82650207

Summary:
Currently when no new blocks available from each step, we already record this as [request events](https://fburl.com/code/rsiolx07) and set it back to engine client by EngineCoreResponses which later got [aggregated](https://fburl.com/code/82r3x1lw) in the [iteration stats](https://fburl.com/code/lw96wgom). 

In this diff, we just expose this to ODS via MetaStatLoggerV1 thus we get the counter exposed in the background. The reason we want this counter is to measure num requests preemptions when kv cache is saturated.

Test Plan: run locally, saturate cache usage to 100%,  we are able to see "llm.vllm.request.preemptions" popped up {F1982066617}

Differential Revision: D82650207
@facebook-github-bot
Copy link

@kingsmad has exported this pull request. If you are a Meta employee, you can view the originating diff in D82650207.

@mergify mergify bot added the v1 label Sep 20, 2025
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request aims to expose request preemption counts by tracking them in LoggingStatLogger. While the counter num_preempted_reqs is correctly added and updated, a critical issue exists where this value is reset within the log method before it is ever used. This bug prevents the preemption count from being exposed, defeating the purpose of the change. My review includes a comment detailing this issue.

Copy link
Collaborator

@houseroad houseroad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably need a better name for this PR, and add some unittest.

@yeqcharlotte yeqcharlotte changed the title expose requests preemptions to ods [Misc][Metrics] expose requests preemptions in logger Sep 20, 2025
Copy link
Collaborator

@yeqcharlotte yeqcharlotte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kingsmad kingsmad marked this pull request as draft September 22, 2025 05:33
ywang96 and others added 14 commits September 21, 2025 22:58
Signed-off-by: Roger Wang <[email protected]>
Co-authored-by: yinz-aizip <[email protected]>
@mergify mergify bot added deepseek Related to DeepSeek models frontend llama Related to Llama models multi-modality Related to multi-modality (#4194) new-model Requests to new models qwen Related to Qwen models gpt-oss Related to GPT-OSS models rocm Related to AMD ROCm labels Sep 22, 2025
@mergify mergify bot added tpu Related to Google TPUs tool-calling kv-connector labels Sep 22, 2025
@mergify mergify bot removed the tpu Related to Google TPUs label Sep 22, 2025
Signed-off-by: Juechen Liu <[email protected]>
@kingsmad
Copy link
Contributor Author

Addressed comments, thanks for the review!

  1. Changed the PR title.
  2. Added a simple unit test for recording iteration stats.
  3. Updated the summary to use public code links.

@kingsmad kingsmad marked this pull request as ready for review September 22, 2025 06:50
# Save tracked stats for token counters.
self.num_prompt_tokens += iteration_stats.num_prompt_tokens
self.num_generation_tokens += iteration_stats.num_generation_tokens
self.num_preempted_reqs += iteration_stats.num_preempted_reqs
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we seem to already have counter_num_preempted_reqs? can we use that? cc: @markmc

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yeqcharlotte Good point but currently the counter_num_preempted_reqs is in the PrometheusStatLogger and our predictor use our own loggers.

Copy link
Collaborator

@yeqcharlotte yeqcharlotte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since this change is relatively small. it's ok to let it through. we can follow up to see if we can reuse more.

@github-project-automation github-project-automation bot moved this from To Triage to Ready in gpt-oss Issues & Enhancements Sep 28, 2025
@yeqcharlotte yeqcharlotte enabled auto-merge (squash) September 28, 2025 01:59
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Sep 28, 2025
Copy link
Member

@markmc markmc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should add data into LoggingStatLogger if it is not used by LoggingStatLogger itself - there's no reason to incur this overhead in the upstream logger

Something like this would be equivalent

class PreemptionTrackingLogger(LoggingStatLogger):
    def __init__(self, vllm_config: VllmConfig, engine_index: int = 0):
        super().__init__(vllm_config, engine_index)
        self.total_preempted_reqs = 0

    def record(self,
               scheduler_stats: Optional[SchedulerStats],
               iteration_stats: Optional[IterationStats],
               engine_idx: int = 0):
        # Call parent record logic
        super().record(scheduler_stats, iteration_stats, engine_idx)

        # Track preempted requests
        if iteration_stats is not None:
            self.total_preempted_reqs += iteration_stats.num_preempted_reqs

    def log(self):
        # Run base logging first
        super().log()

        # Add preempted requests info
        logger.info(
            "Engine %03d: Total preempted requests so far: %d",
            self.engine_index,
            self.total_preempted_reqs,
        )

@github-project-automation github-project-automation bot moved this from Ready to In progress in gpt-oss Issues & Enhancements Sep 29, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

deepseek Related to DeepSeek models documentation Improvements or additions to documentation frontend gpt-oss Related to GPT-OSS models kv-connector llama Related to Llama models multi-modality Related to multi-modality (#4194) new-model Requests to new models qwen Related to Qwen models ready ONLY add when PR is ready to merge/full CI is needed rocm Related to AMD ROCm speculative-decoding tool-calling v1

Projects

Status: No status
Status: In progress

Development

Successfully merging this pull request may close these issues.