Commit 92c4ce5
trigger profiling on abort
Summary:
Record the profile trace if the training process receives SIGABRT, e.g. when the Process Group watchdog aborts the process.
1 parent 6b5517c, commit 92c4ce5
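The changed lines are not visible in this page, so the following is only a minimal sketch of the general idea in the summary, not the commit's actual code. It assumes a POSIX system and a hypothetical callback `on_abort` that would flush the in-flight profiler trace. `install_abort_hook` and its `reraise` parameter are names invented for illustration. One caveat: Python-level signal handlers run only in the main thread once the interpreter regains control, so a native `abort()` raised from another thread may terminate the process before the callback runs.

```python
import os
import signal
from typing import Callable


def install_abort_hook(on_abort: Callable[[], None], reraise: bool = True):
    """Run `on_abort` when this process receives SIGABRT (hypothetical helper).

    Returns the previously installed handler so the caller can restore it.
    Must be called from the main thread, as required by `signal.signal`.
    """

    def _handler(signum, frame):
        try:
            # Assumption: the callback writes the profile trace to disk.
            on_abort()
        finally:
            if reraise:
                # Restore the default action and re-send the signal so the
                # process still terminates with the usual abort status.
                signal.signal(signal.SIGABRT, signal.SIG_DFL)
                os.kill(os.getpid(), signal.SIGABRT)

    return signal.signal(signal.SIGABRT, _handler)


if __name__ == "__main__":
    dumped = []
    # reraise=False keeps the demo process alive after the handler runs.
    install_abort_hook(lambda: dumped.append("trace"), reraise=False)
    os.kill(os.getpid(), signal.SIGABRT)
    print(dumped)
```

In a real training job the callback would typically stop the profiler and export its trace, and `reraise` would stay at its default so the abort still ends the process.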
File tree
3 files changed: +38 -20 lines changed
- torchtitan
- experiments/forge
- tools

| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| 282-293 | 282-294 | +6 -5 | |

| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| 18-24 | 18-23 | +0 -1 | |
| 68-87 | 67-86 | +6 -6 | |

| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| 4-11 | 4-13 | +2 -0 | |
| 33-40 | 35-46 | +4 -0 | |
| 613-625 | 619-632 | +7 -6 | |
| 643-648 | 650-664 | +9 -0 | |
| 666-673 | 682-689 | +2 -2 | |
| 730-739 | 746-757 | +2 -0 | |
