
Conversation

RichardoMrMu
Contributor

@RichardoMrMu RichardoMrMu commented Jul 2, 2025

Due to a mistake while syncing my fork, I accidentally closed the previous PR #18069, so I have recreated it here.
Origin:
Following issue #17794 and issue #10582 ("[P2] Make it work with tracing too (there are some request-level stats that tracing needs, like queue time and TTFT). These request-level metrics should be possible to surface in v1 too."), this PR makes the V1 engine generate model tracing spans.
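For reference, a minimal usage sketch of what enabling these spans could look like, assuming the existing otlp_traces_endpoint engine argument is the switch that enables tracing for the V1 engine as well (the model name and collector endpoint are placeholders):

from vllm import LLM, SamplingParams

# Export llm_request spans to an OTLP gRPC collector (endpoint is illustrative).
llm = LLM(
    model="facebook/opt-125m",
    otlp_traces_endpoint="http://localhost:4317",
)
outputs = llm.generate(["Hello, world"], SamplingParams(max_tokens=16))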

Mu Huai and others added 2 commits July 2, 2025 20:19
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Summary of Changes

Hello @RichardoMrMu, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces comprehensive OpenTelemetry tracing capabilities to the vLLM V1 engine. It enables the capture and propagation of tracing context throughout the request lifecycle and generates detailed spans with granular performance metrics, providing deeper insights into request processing times, including queueing, prefill, decode, and overall inference.

Highlights

  • V1 Engine Tracing Enablement: The vLLM V1 engine now supports OpenTelemetry tracing, allowing for detailed observability of request lifecycles. This includes removing a previous restriction that explicitly prevented tracing in V1.
  • Trace Context Propagation: Trace headers are now propagated throughout the request handling pipeline, from the initial Request object through EngineCoreRequest and EngineCoreOutput, ensuring continuity of tracing context across different stages of processing.
  • Detailed Latency Metrics: New SpanAttributes have been introduced to capture granular latency metrics, specifically for prefill, decode, and overall inference times within the model execution, providing more precise performance insights.
  • OpenTelemetry Span Generation: The OutputProcessor now includes logic to initialize an OpenTelemetry tracer and, upon request completion, generates detailed llm_request spans. These spans include comprehensive performance metrics such as queue time, time-to-first-token (TTFT), end-to-end latency, and the newly added prefill, decode, and inference times, along with request and usage details.
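To make the trace-context propagation described in these highlights concrete, here is a small, self-contained OpenTelemetry sketch; the header value, tracer name, and attribute key are illustrative and not vLLM internals:

from opentelemetry import trace
from opentelemetry.propagate import extract

# W3C trace headers as they would arrive with an HTTP request (made-up value).
trace_headers = {
    "traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01",
}

# Extract the parent context from the carried headers, then start the
# llm_request span under that parent so the request joins the caller's trace.
parent_ctx = extract(trace_headers)
tracer = trace.get_tracer("vllm.v1.example")
with tracer.start_as_current_span("llm_request", context=parent_ctx) as span:
    span.set_attribute("gen_ai.request.id", "example-request-id")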

@mergify mergify bot added the v1 label Jul 2, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request adds OpenTelemetry tracing support to the v1 engine, which is a great feature for observability. The changes correctly plumb the trace_headers through the request lifecycle and introduce a do_tracing method to generate spans with relevant latency and usage metrics.

My review has identified two main issues:

  1. A critical issue where the new tracing functionality is effectively disabled because the OutputProcessor is not instantiated with the necessary observability_config.
  2. A high-severity issue in the do_tracing method where a latency metric is calculated incorrectly and set twice.

Addressing these points will ensure the new tracing feature works correctly and provides accurate metrics. Please see the detailed comments for suggestions.

Comment on lines 281 to 284
def __init__(self,
             tokenizer: TokenizerGroup,
             log_stats: bool,
             observability_config: Optional[ObservabilityConfig] = None):
Contributor

critical

The new observability_config parameter is not passed during OutputProcessor instantiation in its call sites, such as in vllm.v1.engine.async_llm.AsyncLLM and vllm.v1.engine.llm_engine.LLMEngine. This will result in self.tracer always being None, effectively disabling the new tracing functionality.

Please ensure that the vllm_config.observability_config is passed to the OutputProcessor constructor in all instantiation locations.

For example, in vllm/v1/engine/async_llm.py:

# In AsyncLLM.__init__
self.output_processor = OutputProcessor(
    self.tokenizer,
    log_stats=self.log_stats,
    observability_config=vllm_config.observability_config
)

This is a critical issue for the feature to work as intended.

Comment on lines 499 to 500
span.set_attribute(SpanAttributes.GEN_AI_LATENCY_TIME_IN_QUEUE,
                   metrics.queued_ts - metrics.arrival_time)
Contributor

high

The attribute SpanAttributes.GEN_AI_LATENCY_TIME_IN_QUEUE is set twice with different values.

  1. On line 500, it's set to metrics.queued_ts - metrics.arrival_time. This seems to represent the time from request arrival to being queued, not the time spent in the queue.
  2. On line 505, it's set to queued_time, which is calculated as metrics.scheduled_ts - metrics.queued_ts. This correctly represents the time spent in the queue, as per the comment on line 476.

The second assignment overwrites the first one. The first calculation appears to be incorrect for this specific attribute. I recommend removing the first assignment.

If metrics.queued_ts - metrics.arrival_time is a metric you want to track, it should be assigned to a different span attribute.
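A hedged sketch of the recommended change, reusing only names visible in the diff (the helper function itself is illustrative):

# Time the request actually spent waiting in the queue, per the comment on line 476.
def queue_time_seconds(queued_ts: float, scheduled_ts: float) -> float:
    return scheduled_ts - queued_ts

# Example: queue_time_seconds(queued_ts=0.1, scheduled_ts=0.4) -> 0.3 (modulo float rounding)
# ...and then set the attribute exactly once:
#   span.set_attribute(SpanAttributes.GEN_AI_LATENCY_TIME_IN_QUEUE,
#                      queue_time_seconds(metrics.queued_ts, metrics.scheduled_ts))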


github-actions bot commented Jul 2, 2025

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small and essential subset of tests to catch errors quickly. You can run additional CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run full CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@frzifus

frzifus commented Jul 2, 2025

cc @sallyom

@7d1-z

7d1-z commented Jul 31, 2025

Hi @RichardoMrMu. This PR is very helpful. Do you plan to continue developing it?

@wyhhyw123

I hope to use it, please go on!

@Nagaraj4775

Do we currently have tracing for V1? I am getting an error when enabling tracing with V1.
When can we expect this feature to be merged into master?

hcyezhang and others added 2 commits August 4, 2025 20:07
fix: merge error by accident

mergify bot commented Aug 4, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @RichardoMrMu.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Aug 4, 2025
@mergify mergify bot removed the needs-rebase label Aug 5, 2025
@bbartels
Contributor

bbartels commented Aug 6, 2025

@RichardoMrMu Added some suggestions that should fix the most prominent build errors!

Co-authored-by: Benjamin Bartels <[email protected]>
Signed-off-by: RichardoMu <[email protected]>

# meta
span.set_attribute(SpanAttributes.GEN_AI_REQUEST_ID, req_state.request_id)
if req_state.parent_req and req_state.parent_req.sampling_params:


If the model is a pooling model or params.n is 1, then are the following attributes missing?

if is_pooling or params.n == 1:

Is it possible to add these attributes regardless of model type and params.n?

Contributor

Yeah, good point. I added fields in RequestState to hold these sampling_params from EngineCoreRequest. Please let me know if there's a better way to do it :)
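For illustration only, a self-contained sketch of that approach; the field and attribute names below are hypothetical placeholders, not necessarily what the PR uses:

from types import SimpleNamespace
from opentelemetry import trace

# Stand-in for a RequestState that keeps a copy of the sampling params taken
# from EngineCoreRequest, so span attributes can be emitted even for pooling
# models or when params.n == 1.
req_state = SimpleNamespace(
    request_id="req-0",
    sampling_params=SimpleNamespace(n=1, max_tokens=16, temperature=0.7),
)

tracer = trace.get_tracer("vllm.v1.example")
with tracer.start_as_current_span("llm_request") as span:
    span.set_attribute("gen_ai.request.id", req_state.request_id)
    params = req_state.sampling_params
    if params is not None:
        span.set_attribute("gen_ai.request.n", params.n)
        span.set_attribute("gen_ai.request.max_tokens", params.max_tokens)
        span.set_attribute("gen_ai.request.temperature", params.temperature)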

@mergify mergify bot added the ci/build label Sep 10, 2025
@ChrisYangAI
Contributor

ChrisYangAI commented Sep 10, 2025

@simon-mo Hi, the number of unit test checks has recently increased to over 70, and the runs keep failing for reasons unrelated to this PR. Could you please help check this?

@simon-mo simon-mo merged commit 40b6c91 into vllm-project:main Sep 12, 2025
66 of 71 checks passed
@simon-mo
Collaborator

Metrics tracing tests passed; the other tests seem to be fixed on main.

skyloevil pushed a commit to skyloevil/vllm that referenced this pull request Sep 13, 2025
Signed-off-by: Mu Huai <[email protected]>
Signed-off-by: Ye Zhang <[email protected]>
Signed-off-by: RichardoMu <[email protected]>
Signed-off-by: simon-mo <[email protected]>
Signed-off-by: Aaron Pham <[email protected]>
Signed-off-by: 22quinn <[email protected]>
Co-authored-by: Mu Huai <[email protected]>
Co-authored-by: Ye Zhang <[email protected]>
Co-authored-by: Benjamin Bartels <[email protected]>
Co-authored-by: simon-mo <[email protected]>
Co-authored-by: 瑜琮 <[email protected]>
Co-authored-by: Aaron Pham <[email protected]>
Co-authored-by: 22quinn <[email protected]>
dsxsteven pushed a commit to dsxsteven/vllm_splitPR that referenced this pull request Sep 15, 2025
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 10, 2025
Signed-off-by: xuebwang-amd <[email protected]>

Labels

ci/build, ready, v1
