[V1] feat:add engine v1 tracing #20372
Merged
Commits (52)
- e0bb716 feat:trace v1
- a7414f7 Merge pull request #1 from RichardoMrMu/feat-trace-v1-aftermerge (RichardoMrMu)
- 440ca59 fix: ttft calculation (hcyezhang)
- a30adc7 Merge pull request #2 from RichardoMrMu/main-ttft-fix (hcyezhang)
- 8afb03e fix: merge error by accident (hcyezhang)
- e0af39b Merge pull request #3 from hcyezhang/main (RichardoMrMu)
- a5462a1 Merge branch 'main' into fix_conflict (RichardoMrMu)
- b5c27ed Update vllm/v1/engine/async_llm.py (RichardoMrMu)
- 7b1de1c Update vllm/v1/engine/output_processor.py (RichardoMrMu)
- cdf0d9f fix: gen meta directly from enginecorequest.sampling_params (hcyezhang)
- 4661667 Merge pull request #4 from hcyezhang/main (hcyezhang)
- 8e3887c Update vllm/v1/engine/processor.py (RichardoMrMu)
- 3d65643 Merge branch 'main' into fix_conflict (RichardoMrMu)
- 6bea3fa fix:pre-commit
- 1a5af39 Merge pull request #5 from RichardoMrMu/fix_conflict_2 (RichardoMrMu)
- 4e623e3 Merge branch 'main' into fix_conflict (RichardoMrMu)
- dd8c2a0 fix:pre-commit
- 47bea22 Merge branch 'main' into fix_conflict (RichardoMrMu)
- 33d736e Merge branch 'main' into fix_conflict (RichardoMrMu)
- 6699296 remove v0 guard for tests (simon-mo)
- c182529 Merge branch 'main' into fix_conflict (simon-mo)
- 86e4321 Merge branch 'main' into fix_conflict (RichardoMrMu)
- 516b954 change: test_tracing.py gpu_memory_utilization=0.3 to avoid oom
- 71012c0 test: timeout to 10
- baa6b85 change: set env VLLM_USE_V1 1
- 05b2e69 test: set env VLLM_USE_V1 0
- 5f51aa1 Merge branch 'main' into fix_conflict (RichardoMrMu)
- e1113e9 test: set env VLLM_USE_V1 1
- 81decbd fix: tracing ut - tracer not initialized (hcyezhang)
- b0f85e6 Merge branch 'fix_conflict' into main (hcyezhang)
- 38434cd Merge pull request #6 from hcyezhang/main (RichardoMrMu)
- eedf207 test:
- 28c0de7 Merge remote-tracking branch 'origin/fix_conflict' into fix_conflict
- c255374 Merge branch 'main' into fix_conflict (RichardoMrMu)
- 73daf4d test:disable_log_stats=False
- 0623cd7 test:format
- 57dbf9f test:no model name
- cdb9c48 add tracing ut for v1 (ChrisYangAI)
- 2afc5bd Merge pull request #7 from RichardoMrMu/chris_traceut_fix (RichardoMrMu)
- daa13c8 reformat (ChrisYangAI)
- bdb8847 reformat (ChrisYangAI)
- 8cf7c88 fix precommit error (ChrisYangAI)
- a2b5346 fix precommit error (ChrisYangAI)
- 57c0df6 Merge pull request #8 from RichardoMrMu/chris_traceut_fix (RichardoMrMu)
- 27d6c69 [CI][Fix] deterministic seed for flaky CI runs on structured outputs … (aarnphm)
- 1d100d0 [CI/Build] Disable flaky test_structured_output tests (#24404) (22quinn)
- a5c7f83 Merge pull request #9 from RichardoMrMu/fix_guidedcodeing_ut_failure (RichardoMrMu)
- 6370955 Merge branch 'main' into fix_conflict (ChrisYangAI)
- bce28cc fix trace test pipeline config (ChrisYangAI)
- 204a6b7 Merge pull request #10 from RichardoMrMu/fix_trace_test_config (RichardoMrMu)
- e03076c Merge branch 'main' into fix_conflict (ChrisYangAI)
- 23e74d3 Merge branch 'main' into fix_conflict (ChrisYangAI)
New file added by this PR (@@ -0,0 +1,137 @@):
# SPDX-License-Identifier: Apache-2.0
# SPDX-FileCopyrightText: Copyright contributors to the vLLM project
# ruff: noqa
# type: ignore
from __future__ import annotations

import threading
from collections.abc import Iterable
from concurrent import futures
from typing import Callable, Generator, Literal

import grpc
import pytest
from opentelemetry.proto.collector.trace.v1.trace_service_pb2 import (
    ExportTraceServiceResponse)
from opentelemetry.proto.collector.trace.v1.trace_service_pb2_grpc import (
    TraceServiceServicer, add_TraceServiceServicer_to_server)
from opentelemetry.proto.common.v1.common_pb2 import AnyValue, KeyValue
from opentelemetry.sdk.environment_variables import (
    OTEL_EXPORTER_OTLP_TRACES_INSECURE)

from vllm import LLM, SamplingParams
from vllm.tracing import SpanAttributes

FAKE_TRACE_SERVER_ADDRESS = "localhost:4317"

FieldName = Literal['bool_value', 'string_value', 'int_value', 'double_value',
                    'array_value']


def decode_value(value: AnyValue):
    field_decoders: dict[FieldName, Callable] = {
        "bool_value": (lambda v: v.bool_value),
        "string_value": (lambda v: v.string_value),
        "int_value": (lambda v: v.int_value),
        "double_value": (lambda v: v.double_value),
        "array_value":
        (lambda v: [decode_value(item) for item in v.array_value.values]),
    }
    for field, decoder in field_decoders.items():
        if value.HasField(field):
            return decoder(value)
    raise ValueError(f"Couldn't decode value: {value}")


def decode_attributes(attributes: Iterable[KeyValue]):
    return {kv.key: decode_value(kv.value) for kv in attributes}


class FakeTraceService(TraceServiceServicer):

    def __init__(self):
        self.request = None
        self.evt = threading.Event()

    def Export(self, request, context):
        self.request = request
        self.evt.set()
        return ExportTraceServiceResponse()


@pytest.fixture
def trace_service() -> Generator[FakeTraceService, None, None]:
    """Fixture to set up a fake gRPC trace service"""
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=1))
    service = FakeTraceService()
    add_TraceServiceServicer_to_server(service, server)
    server.add_insecure_port(FAKE_TRACE_SERVER_ADDRESS)
    server.start()

    yield service

    server.stop(None)


def test_traces(
    monkeypatch: pytest.MonkeyPatch,
    trace_service: FakeTraceService,
):
    with monkeypatch.context() as m:
        m.setenv(OTEL_EXPORTER_OTLP_TRACES_INSECURE, "true")
        m.setenv("VLLM_USE_V1", "1")
        sampling_params = SamplingParams(
            temperature=0.01,
            top_p=0.1,
            max_tokens=256,
        )
        model = "facebook/opt-125m"
        llm = LLM(model=model,
                  otlp_traces_endpoint=FAKE_TRACE_SERVER_ADDRESS,
                  gpu_memory_utilization=0.3,
                  disable_log_stats=False)
        prompts = ["This is a short prompt"]
        outputs = llm.generate(prompts, sampling_params=sampling_params)
        print(f"test_traces outputs is : {outputs}")

        timeout = 10
        if not trace_service.evt.wait(timeout):
            raise TimeoutError(
                f"The fake trace service didn't receive a trace within "
                f"the {timeout} seconds timeout")

        request = trace_service.request
        assert len(request.resource_spans) == 1, (
            f"Expected 1 resource span, "
            f"but got {len(request.resource_spans)}")
        assert len(request.resource_spans[0].scope_spans) == 1, (
            f"Expected 1 scope span, "
            f"but got {len(request.resource_spans[0].scope_spans)}")
        assert len(request.resource_spans[0].scope_spans[0].spans) == 1, (
            f"Expected 1 span, "
            f"but got {len(request.resource_spans[0].scope_spans[0].spans)}")

        attributes = decode_attributes(
            request.resource_spans[0].scope_spans[0].spans[0].attributes)
        # assert attributes.get(SpanAttributes.GEN_AI_RESPONSE_MODEL) == model
        assert attributes.get(
            SpanAttributes.GEN_AI_REQUEST_ID) == outputs[0].request_id
        assert attributes.get(SpanAttributes.GEN_AI_REQUEST_TEMPERATURE
                              ) == sampling_params.temperature
        assert attributes.get(
            SpanAttributes.GEN_AI_REQUEST_TOP_P) == sampling_params.top_p
        assert attributes.get(SpanAttributes.GEN_AI_REQUEST_MAX_TOKENS
                              ) == sampling_params.max_tokens
        assert attributes.get(
            SpanAttributes.GEN_AI_REQUEST_N) == sampling_params.n
        assert attributes.get(
            SpanAttributes.GEN_AI_USAGE_PROMPT_TOKENS) == len(
                outputs[0].prompt_token_ids)
        completion_tokens = sum(len(o.token_ids) for o in outputs[0].outputs)
        assert attributes.get(
            SpanAttributes.GEN_AI_USAGE_COMPLETION_TOKENS) == completion_tokens

        assert attributes.get(SpanAttributes.GEN_AI_LATENCY_TIME_IN_QUEUE) > 0
        assert attributes.get(
            SpanAttributes.GEN_AI_LATENCY_TIME_TO_FIRST_TOKEN) > 0
        assert attributes.get(SpanAttributes.GEN_AI_LATENCY_E2E) > 0
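The `decode_value` helper in the test walks the protobuf oneof fields of an OTLP `AnyValue` via `HasField` and applies the first matching decoder. A minimal stdlib-only sketch of that same first-match dispatch, where `StubAnyValue` is a hypothetical stand-in for the real opentelemetry-proto message (only the dispatch logic mirrors the test):

```python
# StubAnyValue imitates the HasField/attribute interface of an OTLP AnyValue.
# It is a hypothetical illustration class, not part of opentelemetry-proto.
class StubAnyValue:

    def __init__(self, **fields):
        self._fields = fields  # e.g. {"int_value": 7}

    def HasField(self, name):
        # Mirrors protobuf's oneof presence check.
        return name in self._fields

    def __getattr__(self, name):
        try:
            return self._fields[name]
        except KeyError:
            raise AttributeError(name)


def decode_value(value):
    # Same pattern as the test helper: try each oneof field in order and
    # return the first decoder whose field is actually set.
    field_decoders = {
        "bool_value": lambda v: v.bool_value,
        "string_value": lambda v: v.string_value,
        "int_value": lambda v: v.int_value,
        "double_value": lambda v: v.double_value,
    }
    for field, decoder in field_decoders.items():
        if value.HasField(field):
            return decoder(value)
    raise ValueError(f"Couldn't decode value: {value}")


print(decode_value(StubAnyValue(int_value=256)))       # 256
print(decode_value(StubAnyValue(string_value="opt")))  # opt
```

The same dispatch then lets `decode_attributes` flatten a list of key/value pairs into an ordinary dict for the assertions.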
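The synchronization in the test (blocking on `trace_service.evt.wait(timeout)` while gRPC delivers `Export` on a worker thread) reduces to a plain `threading.Event` handshake. A stdlib-only sketch of that handshake, with `FakeCollector` as a hypothetical stand-in for the test's gRPC servicer:

```python
import threading


class FakeCollector:
    """Records the first exported payload and signals the waiting test."""

    def __init__(self):
        self.request = None
        self.evt = threading.Event()

    def export(self, request):
        # In the real test this runs on the gRPC server's worker thread.
        self.request = request
        self.evt.set()


collector = FakeCollector()
worker = threading.Thread(target=collector.export, args=({"spans": 1},))
worker.start()

# The test side: block until the export arrives or the timeout expires.
if not collector.evt.wait(timeout=10):
    raise TimeoutError("no trace received within 10 seconds")
worker.join()
print(collector.request)  # {'spans': 1}
```

Using an `Event` rather than polling keeps the test deterministic: it wakes as soon as the span batch lands, and the 10-second ceiling only matters on failure.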