Commit b229455
Update placement utils and weights to handle meta device (pytorch#162842)
Summary:
This diff fixes two things which come up when testing a tgif-published pt2 model remote net:
1) Updates isSameDevice to handle meta device to avoid this error:
```
what(): Unsupported device typemeta and meta
Exception raised from isSameDevice at fbcode/caffe2/torch/nativert/executor/PlacementUtils.cpp:20
```
2. Updates xl weight v2 loading logic in Weights.cpp to handle non-TBE xl-weights. Today, we enforce the device is the same for an old weight and new weight when replacing with ModelRunnerAdapter.setAttr(). However, the way we replace non-TBE xl weights is to find any weights on "meta" device and then replace them with their correct weight with real device from xl_weights folder. Therefore, the new weight and old weight will always have different devices and the device check is invalid. I don't think we've run into this so far bc non-TBE xl weights have not been thoroughly tested until now.
Test Plan:
Run MRS you model merge net, which uses non-TBE xl weights. Confirm that before change #1 we get error:
```
Unsupported device typemeta and meta
```
Then after change #1 and before change #2 we get:
```
what(): Mismatched device for merge.user_tower.linear.weight: meta vs cpu
Exception raised from validateValue at fbcode/caffe2/torch/nativert/executor/Weights.cpp:374
```
After change run is successful
Command:
```
MODEL_ENTITY_ID=921242082
SNAPSHOT_ID=1269
module_name=merge
SAMPLE_INPUT_DIR=/data/users/georgiaphillips/models/921242082/${SNAPSHOT_ID}/${module_name}_archive/package/data/sample_inputs
buck2 run mode/dev-nosan -c fbcode.nvcc_arch=h100,a100 -c fbcode.enable_gpu_sections=true caffe2/torch/fb/model_transform/fx2trt/packaging:load_net_predictor -- --loadMode=Benchmark --inputNetFile=/data/users/$USER/models/${MODEL_ENTITY_ID}/${SNAPSHOT_ID}/${MODEL_ENTITY_ID}_${SNAPSHOT_ID}.predictor.${module_name} --moduleName=${module_name} --submodToDevice="merge|cuda0" --benchmarkEnableProfiling=false --disableStaticRuntime=true --doNotRandomizeSampleInputs=true --benchmarkDontRebatchSamples=true --pytorch_predictor_sigmoid_static_dispatch_enable=false --pytorch_predictor_sigmoid_graph_passes_enable=false --sampleInputFilePath=${SAMPLE_INPUT_DIR}/${module_name}.pt
```
Rollback Plan:
Differential Revision: D80713052
Pull Request resolved: pytorch#162842
Approved by: https://github.com/henryoier1 parent a541974 commit b229455
File tree
3 files changed
+39
-12
lines changed- torch/nativert/executor
3 files changed
+39
-12
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
17 | 17 | | |
18 | 18 | | |
19 | 19 | | |
| 20 | + | |
| 21 | + | |
| 22 | + | |
20 | 23 | | |
21 | 24 | | |
22 | 25 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
337 | 337 | | |
338 | 338 | | |
339 | 339 | | |
| 340 | + | |
| 341 | + | |
| 342 | + | |
| 343 | + | |
| 344 | + | |
| 345 | + | |
| 346 | + | |
340 | 347 | | |
341 | 348 | | |
342 | 349 | | |
| |||
360 | 367 | | |
361 | 368 | | |
362 | 369 | | |
363 | | - | |
364 | | - | |
365 | | - | |
| 370 | + | |
| 371 | + | |
| 372 | + | |
| 373 | + | |
| 374 | + | |
| 375 | + | |
| 376 | + | |
| 377 | + | |
| 378 | + | |
| 379 | + | |
| 380 | + | |
| 381 | + | |
| 382 | + | |
366 | 383 | | |
367 | | - | |
368 | | - | |
369 | | - | |
370 | | - | |
371 | | - | |
372 | | - | |
373 | | - | |
374 | | - | |
375 | 384 | | |
376 | 385 | | |
377 | 386 | | |
| 387 | + | |
| 388 | + | |
| 389 | + | |
| 390 | + | |
| 391 | + | |
| 392 | + | |
| 393 | + | |
378 | 394 | | |
379 | | - | |
| 395 | + | |
380 | 396 | | |
381 | 397 | | |
382 | 398 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
66 | 66 | | |
67 | 67 | | |
68 | 68 | | |
| 69 | + | |
| 70 | + | |
| 71 | + | |
| 72 | + | |
69 | 73 | | |
70 | 74 | | |
71 | 75 | | |
| |||
77 | 81 | | |
78 | 82 | | |
79 | 83 | | |
| 84 | + | |
| 85 | + | |
| 86 | + | |
| 87 | + | |
80 | 88 | | |
81 | 89 | | |
82 | 90 | | |
| |||
0 commit comments