Commit 3c18115
Optimize vLLM weight reloading using collective_rpc
Use vLLM's collective_rpc API to reload weights without recreating the
entire engine. This provides significant performance improvements:
- Weight reload: ~0.7-0.9s (vs ~7-10s for full engine recreation)
- Preserves KV cache, kernels, and memory allocations
- Reduces memory fragmentation
Changes:
- Update VLLMRolloutEngine.update_weights() to use
  collective_rpc("reload_weights") instead of recreating the engine
The reload mechanism saves updated weights to disk, then calls
reload_weights() on all workers via RPC, maintaining bitwise
determinism while avoiding expensive engine recreation.
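
A minimal sketch of the save-then-reload flow described above, offered as an illustration rather than the code in this commit. The diff body is not shown here, so `StandInWorker`, `StandInEngine`, the free function `update_weights`, and the JSON checkpoint file are all hypothetical stand-ins; the real change goes through vLLM's `collective_rpc` and a real checkpoint format.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


class StandInWorker:
    """Hypothetical worker that re-reads its weights from a checkpoint file."""

    def __init__(self, checkpoint: Path) -> None:
        self.checkpoint = checkpoint
        self.weights: dict = {}
        # Stands in for state (KV cache, allocations) that should survive a reload.
        self.kv_cache = object()

    def reload_weights(self) -> None:
        # Only the weights are replaced; everything else on the worker is untouched.
        self.weights = json.loads(self.checkpoint.read_text())


class StandInEngine:
    """Hypothetical engine exposing a collective_rpc-style broadcast call."""

    def __init__(self, checkpoint: Path, num_workers: int = 2) -> None:
        self.workers = [StandInWorker(checkpoint) for _ in range(num_workers)]

    def collective_rpc(
        self, method: str, args: tuple = (), kwargs: Optional[dict] = None
    ) -> list:
        # Invoke the named method on every worker and collect the results.
        return [getattr(w, method)(*args, **(kwargs or {})) for w in self.workers]


def update_weights(engine: StandInEngine, checkpoint: Path, new_weights: dict) -> None:
    # Step 1: persist the updated weights to disk.
    checkpoint.write_text(json.dumps(new_weights))
    # Step 2: ask every worker to reload in place; the engine is not recreated.
    engine.collective_rpc("reload_weights")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        ckpt = Path(tmp) / "weights.json"
        engine = StandInEngine(ckpt)
        update_weights(engine, ckpt, {"layer.0.weight": [0.1, 0.2]})
        print(engine.workers[0].weights)
```

The point of the sketch is that the engine object and its per-worker state outlive the weight update, which is where the saving over full engine recreation would come from.
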
Note: Requires VLLM_ALLOW_INSECURE_SERIALIZATION=1 environment
variable for collective_rpc with custom functions.
1 parent 62b20b6 commit 3c18115
1 file changed
Lines changed: 4 additions & 14 deletions
[Diff body not preserved. One hunk spanning original lines 145-165 (new lines 145-155): 1 line added at new line 148, original lines 149-162 removed, 3 lines added at new lines 150-152.]