Commit c5be0ac
Merge VBE output [backend] reland (#5093)
Summary:
Pull Request resolved: #5093
X-link: https://github.com/facebookresearch/FBGEMM/pull/2100
----
# Context on the changes:
Currently, Torchrec merges the outputs of the individual VBE TBE ops so that they are ordered by rank, using [_merge_variable_batch_embeddings](https://www.internalfb.com/code/fbsource/[3bd69d7fa3534144dcb0162ca59803a6c3ff6e70]/fbcode/torchrec/distributed/embedding_lookup.py?lines=593-604). This function appears to cause a ~30% QPS regression versus the baseline (HBM+UVM) for the Jupiter V1 model with VBE enabled.
To eliminate the `_merge_variable_batch_embeddings()` function, we pre-allocate a `vbe_output` tensor that holds the outputs of all VBE ops, and compute `vbe_output_offsets` so that each individual VBE op writes to the correct location in `vbe_output`.
By default, `vbe_output` and `vbe_output_offsets` are `None`, in which case each VBE op returns an individual tensor as it does today. The feature is enabled when `vbe_output` and `vbe_output_offsets` are not `None`.
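The offset-based write described above can be sketched in plain Python (lists stand in for torch tensors; all names here are illustrative, not the actual FBGEMM API):

```python
# Hypothetical sketch: instead of merging per-op outputs with a post-hoc
# concatenation pass, each VBE op writes directly into a pre-allocated
# buffer at its precomputed offset.

def merge_by_concat(per_op_outputs):
    # Baseline behavior: build the merged output by copying every op's rows.
    merged = []
    for out in per_op_outputs:
        merged.extend(out)
    return merged

def write_preallocated(per_op_outputs, vbe_output, vbe_output_offsets):
    # New behavior: each op writes into vbe_output at its own offset,
    # so no separate merge pass is needed.
    for out, start in zip(per_op_outputs, vbe_output_offsets):
        vbe_output[start:start + len(out)] = out
    return vbe_output

per_op_outputs = [[1.0, 2.0], [3.0], [4.0, 5.0, 6.0]]
total = sum(len(o) for o in per_op_outputs)
vbe_output = [0.0] * total
offsets = [0, 2, 3]  # exclusive prefix sums of per-op output lengths

assert write_preallocated(per_op_outputs, vbe_output, offsets) == \
    merge_by_concat(per_op_outputs)
```

The key design point is that the offsets are computed once up front, so the per-op writes are independent and the extra copy performed by the merge pass disappears.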
---
**NOTE**
1. This feature is currently supported only for Sparse TBE.
2. The support is limited to CUDA.
3. For backward compatibility, we append the newly introduced `vbe_output` argument to the existing API. Hence, `vbe_output` must be an `optional` tensor with a default value of `None` (a plain `Tensor` argument cannot have a default value).
4. We *cannot* annotate `vbe_output` because PyTorch op registration does not support annotations on optional tensors. Adding an annotation triggers the error below. This may cause issues for supporting the feature on MTIA, if MTIA relies on tensor annotations.
```
E0903 09:50:32.966235 2850885 ExceptionTracer.cpp:227] exception stack complete
terminate called after throwing an instance of 'std::runtime_error'
what(): expected ident but found '(' here:
split_embedding_codegen_lookup_adagrad_function_pt2( Tensor placeholder_autograd_tensor, Tensor[](a!) weights, Tensor D_offsets, SymInt total_D, SymInt max_D, Tensor hash_size_cumsum, int total_hash_size_bits, Tensor indices, Tensor offsets, int pooling_mode, Tensor? indice_weights, Tensor? feature_requires_grad, int output_dtype, Tensor?[](e!) aux_tensor, int[] aux_int, float[] aux_float, bool[] aux_bool, Tensor[](g!) momentum1, Tensor learning_rate_tensor, float[] optim_float, SymInt max_B=-1, SymInt max_B_feature_rank=-1, SymInt vbe_output_size=-1, Tensor?(t!) vbe_output=None ) -> Tensor
~ <--- HERE
```
See https://docs.google.com/document/d/1h5YyeCjYmmN-CIFB98CrBf1uMksidPbNvM1rl8yZeds/edit?tab=t.0#heading=h.tdfkkc6ujdyl
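Point 3 above can be illustrated at the Python level: appending an optional argument with a `None` default keeps every existing call site working unchanged (the function and its shapes here are hypothetical, not the real TBE signature):

```python
from typing import List, Optional

# Hypothetical sketch of extending an API with an optional output buffer.
def lookup_vbe(indices: List[int],
               weights: List[float],
               vbe_output: Optional[List[float]] = None) -> List[float]:
    result = [weights[i] for i in indices]
    if vbe_output is None:
        # Old behavior: return a freshly built output tensor.
        return result
    # New behavior: write into the caller-provided pre-allocated buffer.
    vbe_output[:len(result)] = result
    return vbe_output

# Existing callers need no change:
out_old = lookup_vbe([0, 2], [10.0, 20.0, 30.0])
# New callers opt in by passing a pre-allocated buffer:
buf = [0.0, 0.0]
out_new = lookup_vbe([0, 2], [10.0, 20.0, 30.0], vbe_output=buf)
```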
----
This diff is a reland of D79704318 in which all issues have been addressed.
## 1) pyper validation test
D79704318 was reverted because it broke the pyper validation test (a frontend/backend package compatibility issue), which blocks pyper releases. That issue is addressed in this diff.
Context: in pyper, Python changes ship in the frontend package (e.g., ads_dper3) and C++ changes in the backend package (e.g., training_platform). If a diff contains both Python and C++, there is a chance that some model will run with mismatched packages, i.e., the frontend package does not include the diff but the backend does, or vice versa.
D83881544 only enables backend support (i.e., no one can actually use the feature yet, so TBE VBE works as usual). However, due to the new Unified API changes, the optional tensor must be piped through from the frontend, which requires a Python change.
Denote
- #0 as no D83881544 included
- #1 as D83881544 included
There are four scenarios:
(1) frontend #0 + backend #0 - no issue
(2) frontend #1 + backend #1 - no issue
(3) frontend #0 + backend #1 - handled; TBE VBE works normally
(4) frontend #1 + backend #0 - no issue; this diff adds a warning that the backend is old
There's another diff, D79869613, later in the stack that will enable frontend support (i.e., allow users to use this feature), which will go into the __frontend package only__.
Scenarios (1)-(4) above remain the same, but new scenarios arise.
Denote
- #2 as D79869613 included
(5) frontend #2 + backend #1 - no issue; same as (2)
(6) frontend #2 (feature not enabled) + backend #0 - same as (4)
(7) frontend #2 (feature enabled) + backend #0 - **assertion error due to missing backend support**, to prevent silent wrong behavior
**To use the feature, this diff stack (D83881544 and D79869613) needs to be included in both the frontend and backend packages.**
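Scenario (7) can be sketched as a frontend-side capability check (all names here are hypothetical; the real check would probe the registered op schema for the new argument):

```python
# Hypothetical sketch: the frontend asserts that the backend supports the
# new vbe_output argument before enabling the feature, rather than letting
# an old backend silently produce wrong results.

def backend_supports_vbe_output(backend_op_args: set) -> bool:
    # Stand-in for inspecting the backend's registered op signature.
    return "vbe_output" in backend_op_args

def maybe_enable_vbe_output(feature_enabled: bool,
                            backend_op_args: set) -> bool:
    if not feature_enabled:
        # Scenarios (4)/(6): feature off, fall back to the old path.
        return False
    # Scenario (7): fail loudly instead of silently misbehaving.
    assert backend_supports_vbe_output(backend_op_args), (
        "vbe_output requested but the backend package predates the reland"
    )
    return True
```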
## 2) SEV
D79704318 caused a SEV due to a TBE v1/v2 interface compatibility issue on lex_ig_o3_package. Unit tests ensuring v1 compatibility were added in D83020965.
D83881544 passes the v1 compatibility test.
Details on the root cause and fix:
https://docs.google.com/document/d/1XcYNfyiAn4aRFvjV0QG5aLiWKuuOWtJdLOMKNszZRpI/edit?tab=t.0#heading=h.psr4a2qn0mdk
------
Reviewed By: q10, renganxu
Differential Revision: D83881544
fbshipit-source-id: 5d63841bbf79a72219903e9d0f77ee3b998bc105
14 files changed (+255, -66 lines)