[TorchComms + titan] Update README for TorchComms repo (#1992)
We have already released TorchComms, so we want to update the README. There are also some issues when the PP degree is larger than 2, which we want to document as well.
torchtitan/experiments/torchcomms/README.md
Lines changed: 10 additions & 2 deletions
@@ -4,14 +4,21 @@
This folder provides a framework for composability testing with TorchComms and distributed training in TorchTitan. It enables flexible experimentation with distributed communication primitives and various parallelism strategies in PyTorch.
-> **TODO:** Additional documentation will be provided once TorchComms is publicly released.
+- **Memory Overhead** - TorchComms requires higher peak memory usage. As a workaround, reduce `local_batch_size` to avoid out-of-memory errors.
+- **Pipeline Parallelism** - Pipeline Parallelism is not yet supported when the PP degree is larger than 2.
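Below is a minimal, hypothetical sketch of how these two workarounds could be expressed in a TorchTitan TOML training config. The section and option names (`[training] local_batch_size`, `[parallelism] pipeline_parallel_degree`) are assumptions based on typical TorchTitan configs and may differ across versions.

```toml
# Hypothetical excerpt from a TorchTitan training config (.toml).
# Section and key names are assumptions and may differ by TorchTitan version.

[training]
# TorchComms has higher peak memory usage, so pick a smaller per-rank
# batch size than you would otherwise use to avoid out-of-memory errors.
local_batch_size = 4

[parallelism]
# Keep the pipeline-parallel degree at 2 or lower; larger PP degrees
# are not yet supported with TorchComms.
pipeline_parallel_degree = 2
```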