Commit db35f17
Fix vLLM x torch.compile config caching
Fixes #16150
Based on the ModelConfig, we decide whether we can reuse an existing
torch.compile'd artifact or need to recompile. Unfortunately, we
were not checking enough flags on the config.
The problem in #16150 was specifically that a change to the
override_generation_config flag requires a recompile, but that flag was not checked.
I went through ModelConfig and added more fields to the set that is checked
to decide whether a model needs to recompile. Disclaimer: I do not know what a lot
of these fields do, but I figure it is better to check too much
than too little (we risk silent incorrectness if the caching is wrong).
We can remove fields later if we find we are recompiling too often.
This is also one of the reasons the PyTorch team recommends that vLLM use
torch.compile's built-in caching (once we improve it): torch.compile
programmatically decides what needs to be part of the cache key, and we test that really
well.
Test Plan:
- tested locally
Signed-off-by: rzou <[email protected]>
1 parent 51baa9c
1 file changed, +9 -3 lines changed