
Commit db35f17

Fix vLLM x torch.compile config caching
Fixes #16150

Based on the ModelConfig, we decide whether we can reuse an existing torch.compile'd artifact or whether we need to recompile. Unfortunately, we were not checking enough fields on the config. The problem in #16150 was specifically that a change to override_generation_config should trigger a recompile. I went through ModelConfig and added more fields to the check that decides whether a model needs to recompile.

Disclaimer: I do not know what a lot of these fields do, but it is better to include too many than too few (we risk silent incorrectness if the caching is wrong). We can remove fields later if we end up recompiling too often.

This is also one of the reasons the PyTorch team recommends that vLLM eventually use torch.compile's built-in caching (once it is improved): torch.compile programmatically decides what needs to be cached, and that logic is tested thoroughly.

Test Plan:
- tested locally

Signed-off-by: rzou <[email protected]>
1 parent 51baa9c commit db35f17
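
To illustrate the mechanism the commit message describes, here is a minimal, self-contained sketch (not vLLM's actual code) of the cache-key pattern: the config fields that can affect compilation are serialized into a stable hash, and a compiled artifact is reused only when the hash matches. SimpleModelConfig and its fields are hypothetical stand-ins for vLLM's ModelConfig.

import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class SimpleModelConfig:
    model: str
    dtype: str = "auto"
    max_model_len: int = 4096
    override_generation_config: Optional[dict] = None

    def compute_hash(self) -> str:
        # Every field appended here becomes part of the cache key.
        # Omitting a field that changes model behavior risks silently
        # reusing a stale torch.compile artifact.
        factors = [
            self.model,
            self.dtype,
            self.max_model_len,
            self.override_generation_config,
        ]
        return hashlib.sha256(str(factors).encode()).hexdigest()


if __name__ == "__main__":
    a = SimpleModelConfig(model="facebook/opt-125m")
    b = SimpleModelConfig(model="facebook/opt-125m",
                          override_generation_config={"temperature": 0.0})
    # Different override_generation_config -> different hash -> recompile,
    # which is the behavior the fix restores.
    assert a.compute_hash() != b.compute_hash()
    print(a.compute_hash()[:16], b.compute_hash()[:16])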

1 file changed: 9 additions, 3 deletions

vllm/config.py

@@ -294,12 +294,18 @@ def compute_hash(self) -> str:
         factors.append(self.quantization)
         factors.append(self.revision)
         factors.append(self.code_revision)
+        factors.append(self.max_model_len)
+        factors.append(self.max_logprobs)
+        factors.append(self.disable_sliding_window)
         factors.append(self.trust_remote_code)
+        factors.append(self.mm_processor_kwargs)
+        factors.append(self.generation_config)
+        factors.append(self.model_impl)
+        factors.append(self.override_generation_config)
         factors.append(self.rope_scaling)
         factors.append(self.rope_theta)
-        # rope cos/sin cache depends on the max_position_embeddings
-        factors.append(
-            getattr(self.hf_config, "max_position_embeddings", "None"))
+        # hf_config can control how the model looks!
+        factors.append(self.hf_config.to_json_string())
         return hashlib.sha256(str(factors).encode()).hexdigest()

     def __init__(
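
A small illustration of why hashing hf_config.to_json_string() is broader than hashing only max_position_embeddings: any change to the Hugging Face config now alters the hash and forces a recompile. This sketch is not part of the patch; it requires the transformers package, uses LlamaConfig purely as a convenient example, and relies on that class's default field values.

import hashlib
from transformers import LlamaConfig


def old_digest(cfg) -> str:
    # Pre-patch behavior: only max_position_embeddings was folded into the hash.
    return hashlib.sha256(
        str(getattr(cfg, "max_position_embeddings", "None")).encode()).hexdigest()


def new_digest(cfg) -> str:
    # Post-patch behavior: the full HF config JSON is folded into the hash.
    return hashlib.sha256(cfg.to_json_string().encode()).hexdigest()


cfg_a = LlamaConfig()
cfg_b = LlamaConfig(hidden_size=2048)  # differs only in a field the old hash ignored

print(old_digest(cfg_a) == old_digest(cfg_b))  # True:  old scheme would reuse the artifact
print(new_digest(cfg_a) == new_digest(cfg_b))  # False: new scheme forces a recompile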
