[V1] feat: add engine v1 tracing #20372
Conversation
Signed-off-by: Mu Huai <[email protected]>
Summary of Changes
Hello @RichardoMrMu, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request introduces comprehensive OpenTelemetry tracing capabilities to the vLLM V1 engine. It enables the capture and propagation of tracing context throughout the request lifecycle and generates detailed spans with granular performance metrics, providing deeper insights into request processing times, including queueing, prefill, decode, and overall inference.
Highlights
- V1 Engine Tracing Enablement: The vLLM V1 engine now supports OpenTelemetry tracing, allowing for detailed observability of request lifecycles. This includes removing a previous restriction that explicitly prevented tracing in V1.
- Trace Context Propagation: Trace headers are now propagated throughout the request handling pipeline, from the initial Request object through EngineCoreRequest and EngineCoreOutput, ensuring continuity of tracing context across the different stages of processing.
- Detailed Latency Metrics: New SpanAttributes have been introduced to capture granular latency metrics, specifically prefill, decode, and overall inference times within model execution, providing more precise performance insights.
- OpenTelemetry Span Generation: The OutputProcessor now includes logic to initialize an OpenTelemetry tracer and, upon request completion, generates detailed llm_request spans. These spans include comprehensive performance metrics such as queue time, time-to-first-token (TTFT), end-to-end latency, and the newly added prefill, decode, and inference times, along with request and usage details. A hedged sketch of this span generation follows the list below.
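To make the span-generation highlight concrete, here is a minimal sketch assuming the standard opentelemetry-api surface. The emit_request_span helper, the raw gen_ai.* attribute strings, and the metrics fields (arrival_time, queued_ts, scheduled_ts, first_token_ts, finished_ts) are illustrative assumptions, not the PR's exact code, which uses vLLM's SpanAttributes constants.
from opentelemetry import trace
from opentelemetry.trace import SpanKind

tracer = trace.get_tracer("vllm.v1.engine")

def emit_request_span(request_id, metrics):
    # Anchor the span at the request's arrival time (OpenTelemetry expects
    # nanoseconds) so its duration covers the whole request lifetime.
    with tracer.start_as_current_span(
            "llm_request",
            kind=SpanKind.SERVER,
            start_time=int(metrics.arrival_time * 1e9)) as span:
        span.set_attribute("gen_ai.request.id", request_id)
        # Time spent waiting in the scheduler queue.
        span.set_attribute("gen_ai.latency.time_in_queue",
                           metrics.scheduled_ts - metrics.queued_ts)
        # Time to first token, measured from arrival.
        span.set_attribute("gen_ai.latency.time_to_first_token",
                           metrics.first_token_ts - metrics.arrival_time)
        # End-to-end request latency.
        span.set_attribute("gen_ai.latency.e2e",
                           metrics.finished_ts - metrics.arrival_time)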
Code Review
This pull request adds OpenTelemetry tracing support to the V1 engine, which is a great feature for observability. The changes correctly plumb the trace_headers through the request lifecycle and introduce a do_tracing method to generate spans with relevant latency and usage metrics.
My review has identified two main issues:
- A critical issue where the new tracing functionality is effectively disabled because the OutputProcessor is not instantiated with the necessary observability_config.
- A high-severity issue in the do_tracing method where a latency metric is calculated incorrectly and set twice.
Addressing these points will ensure the new tracing feature works correctly and provides accurate metrics. Please see the detailed comments for suggestions.
vllm/v1/engine/output_processor.py (outdated)
def __init__(self,
             tokenizer: TokenizerGroup,
             log_stats: bool,
             observability_config: Optional[ObservabilityConfig] = None):
The new observability_config parameter is not passed during OutputProcessor instantiation at its call sites, such as in vllm.v1.engine.async_llm.AsyncLLM and vllm.v1.engine.llm_engine.LLMEngine. This will result in self.tracer always being None, effectively disabling the new tracing functionality.
Please ensure that vllm_config.observability_config is passed to the OutputProcessor constructor in all instantiation locations.
For example, in vllm/v1/engine/async_llm.py:
# In AsyncLLM.__init__
self.output_processor = OutputProcessor(
    self.tokenizer,
    log_stats=self.log_stats,
    observability_config=vllm_config.observability_config,
)
This is a critical issue for the feature to work as intended.
vllm/v1/engine/output_processor.py (outdated)
span.set_attribute(SpanAttributes.GEN_AI_LATENCY_TIME_IN_QUEUE,
                   metrics.queued_ts - metrics.arrival_time)
The attribute SpanAttributes.GEN_AI_LATENCY_TIME_IN_QUEUE is set twice with different values.
- On line 500, it's set to metrics.queued_ts - metrics.arrival_time. This seems to represent the time from request arrival to being queued, not the time spent in the queue.
- On line 505, it's set to queued_time, which is calculated as metrics.scheduled_ts - metrics.queued_ts. This correctly represents the time spent in the queue, as per the comment on line 476.
The second assignment overwrites the first one. The first calculation appears to be incorrect for this specific attribute. I recommend removing the first assignment.
If metrics.queued_ts - metrics.arrival_time is a metric you want to track, it should be assigned to a different span attribute; a minimal sketch follows below.
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
Signed-off-by: Ye Zhang <[email protected]>
[vllm] v1 tracing with fixes based on vllm-project#20372
cc @sallyom
Hi @RichardoMrMu. This PR is very helpful. Do you plan to continue developing it?
I hope to use it, please go on!
Currently, do we have tracing for V1? I am getting an error while enabling tracing with V1.
Signed-off-by: Ye Zhang <[email protected]>
fix: merge error by accident
This pull request has merge conflicts that must be resolved before it can be merged.
@RichardoMrMu Added some suggestions that should fix the most prominent build errors!
Co-authored-by: Benjamin Bartels <[email protected]> Signed-off-by: RichardoMu <[email protected]>
vllm/v1/engine/output_processor.py (outdated)
# meta
span.set_attribute(SpanAttributes.GEN_AI_REQUEST_ID, req_state.request_id)
if req_state.parent_req and req_state.parent_req.sampling_params:
If it is a pooling model or params.n is 1, are the following attributes missing?
vllm/vllm/v1/engine/async_llm.py, line 255 in 82216dc:
if is_pooling or params.n == 1:
Is it possible to add these attributes regardless of model type and params.n?
Yeah, good point. I added fields in RequestState to hold these sampling_params from EngineCoreRequest. Or please let me know if there's a better way to do it :) A minimal sketch of this approach is shown below.
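A minimal sketch of that approach, with assumed names (TraceSamplingInfo and capture_sampling_params are illustrative, not the PR's actual RequestState changes):
from dataclasses import dataclass
from typing import Optional

@dataclass
class TraceSamplingInfo:
    # Holds only the sampling fields needed when the span is emitted.
    temperature: Optional[float] = None
    top_p: Optional[float] = None
    max_tokens: Optional[int] = None
    n: Optional[int] = None

def capture_sampling_params(engine_core_request):
    # Pooling requests carry no sampling params, so fall back to empty
    # values rather than dropping the span attributes entirely.
    params = getattr(engine_core_request, "sampling_params", None)
    if params is None:
        return TraceSamplingInfo()
    return TraceSamplingInfo(temperature=params.temperature,
                             top_p=params.top_p,
                             max_tokens=params.max_tokens,
                             n=params.n)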
Chris trace UT fix
…llm-project#24380) Signed-off-by: Aaron Pham <[email protected]>
…4404) Signed-off-by: 22quinn <[email protected]>
Fix guided decoding UT failure
fix trace test pipeline config
@simon-mo Hi, the number of unit test checks recently increased to over 70, and the runs keep failing for reasons unrelated to this PR. Could you please help check this?
Metrics tracing tests passed; the other tests appear to be fixed on main.
Signed-off-by: Mu Huai <[email protected]> Signed-off-by: Ye Zhang <[email protected]> Signed-off-by: RichardoMu <[email protected]> Signed-off-by: simon-mo <[email protected]> Signed-off-by: Aaron Pham <[email protected]> Signed-off-by: 22quinn <[email protected]> Co-authored-by: Mu Huai <[email protected]> Co-authored-by: Ye Zhang <[email protected]> Co-authored-by: Benjamin Bartels <[email protected]> Co-authored-by: simon-mo <[email protected]> Co-authored-by: 瑜琮 <[email protected]> Co-authored-by: Aaron Pham <[email protected]> Co-authored-by: 22quinn <[email protected]>
Signed-off-by: Mu Huai <[email protected]> Signed-off-by: Ye Zhang <[email protected]> Signed-off-by: RichardoMu <[email protected]> Signed-off-by: simon-mo <[email protected]> Signed-off-by: Aaron Pham <[email protected]> Signed-off-by: 22quinn <[email protected]> Co-authored-by: Mu Huai <[email protected]> Co-authored-by: Ye Zhang <[email protected]> Co-authored-by: Benjamin Bartels <[email protected]> Co-authored-by: simon-mo <[email protected]> Co-authored-by: 瑜琮 <[email protected]> Co-authored-by: Aaron Pham <[email protected]> Co-authored-by: 22quinn <[email protected]> Signed-off-by: xuebwang-amd <[email protected]>
Due to a mistake while syncing my fork, I accidentally closed the previous PR #18069, so I have recreated it here.
Original description:
Following issue #17794 and issue #10582 ("[P2] Make it work with tracing too (there's some request level stats that tracing needs, like queue time, ttft). These request level metrics should be possible to be surfaced in v1 too."), this PR makes the V1 engine generate model tracing spans. A hedged usage sketch for enabling tracing is included below.
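For reference, a hedged sketch of how tracing might be switched on, assuming the pre-existing otlp_traces_endpoint engine argument also gates V1 tracing; adjust names if the config surface differs.
from vllm.engine.arg_utils import EngineArgs

# Point the engine at an OTLP collector; request spans should then be
# exported once each request finishes.
engine_args = EngineArgs(
    model="facebook/opt-125m",
    otlp_traces_endpoint="http://localhost:4317",
)
vllm_config = engine_args.create_engine_config()
print(vllm_config.observability_config)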