Commit 14929a4

Log how much time loading a compiled artifact takes
When people say they don't like the "torch.compile startup time", they mean two things:

1) the cold start time
2) the warm start time (when the vLLM disk cache has already been populated)

We had logging for (1) but not for (2); this PR adds it.

Test Plan: I ran

`VLLM_USE_V1=1 python benchmark_latency.py --model meta-llama/Meta-Llama-3-8B --batch-size 1 -O '{"level": 3, "compile_sizes": {1, 2}}'`

and observed the following logs:

```
INFO 04-18 08:26:11 [backends.py:431] Dynamo bytecode transform time: 5.03 s
INFO 04-18 08:26:15 [backends.py:120] Directly load the compiled graph(s) for shape None from the cache, took 4.190 s
INFO 04-18 08:26:18 [kv_cache_utils.py:634] GPU KV cache size: 532,032 tokens
```

Side note: it's probably not good that loading from the cache takes 4 seconds?

Signed-off-by: rzou <[email protected]>
1 parent 205d84a commit 14929a4
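
The change boils down to wall-clock bracketing around the warm-start path: take a timestamp when loading for a shape begins, then log the delta only after the last piecewise graph has been loaded. A minimal, runnable sketch of that pattern, assuming a hypothetical per-graph loader `load_from_cache` (not vLLM's actual API; in the real diff, `compilation_start_time` is set elsewhere in the backend):

```python
import logging
import time

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

def load_from_cache(runtime_shape, graph_index):
    """Hypothetical stand-in for a per-graph disk-cache load."""
    time.sleep(0.05)  # simulate deserialization / disk I/O
    return object()   # placeholder for a compiled graph

def load_all_graphs(runtime_shape, num_graphs):
    start = time.time()
    graphs = []
    for graph_index in range(num_graphs):
        graphs.append(load_from_cache(runtime_shape, graph_index))
        if graph_index == num_graphs - 1:
            # Log once, after the *last* piecewise graph has loaded,
            # so the elapsed time covers the whole warm start for this shape.
            elapsed = time.time() - start
            logger.info(
                "Directly load the compiled graph(s) for shape %s "
                "from the cache, took %.3f s", str(runtime_shape), elapsed)
    return graphs

graphs = load_all_graphs(runtime_shape=1, num_graphs=4)
```

Logging only once per shape keeps the output readable even when piecewise compilation produces many subgraphs.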

File tree: 1 file changed, +8 -4 lines changed


vllm/compilation/backends.py

Lines changed: 8 additions & 4 deletions
```diff
@@ -110,10 +110,14 @@ def compile(self,
         compiled_graph = self.load(graph, example_inputs, graph_index,
                                    runtime_shape)
         if compiled_graph is not None:
-            if graph_index == 0:
-                # adds some info logging for the first graph
-                logger.info("Directly load the compiled graph for shape %s "
-                            "from the cache", str(runtime_shape))  # noqa
+            if graph_index == num_graphs - 1:
+                # after loading the last graph for this shape, record the time.
+                # there can be multiple graphs due to piecewise compilation.
+                now = time.time()
+                elapsed = now - compilation_start_time
+                logger.info(
+                    "Directly load the compiled graph(s) for shape %s "
+                    "from the cache, took %.3f s", str(runtime_shape), elapsed)
             return compiled_graph

         # no compiler cached the graph, or the cache is disabled,
```
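
One design note the diff comments hint at: the log anchor moved from the first graph to the last because piecewise compilation can produce several compiled subgraphs per runtime shape, and only a last-graph anchor sees the full warm-start cost. A toy comparison with simulated load times (not vLLM code; the old log also carried no elapsed time at all):

```python
import time

def fake_load():
    time.sleep(0.1)  # pretend each piecewise subgraph takes ~100 ms to load

num_graphs = 4
start = time.time()
for graph_index in range(num_graphs):
    fake_load()
    if graph_index == 0:
        # Old anchor: fires after just one subgraph (~0.1 s here).
        print(f"after first graph: {time.time() - start:.2f} s")
    if graph_index == num_graphs - 1:
        # New anchor: fires after all subgraphs (~0.4 s here).
        print(f"after last graph:  {time.time() - start:.2f} s")
```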
