YARN-10393. MR job live lock caused by completed state container leak in heartbeat between node manager and RM. Contributed by zhenzhao wang and Jim Brennan
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java
18 additions, 2 deletions
@@ -645,7 +645,7 @@ public void addCompletedContainer(ContainerId containerId) {
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java
7 additions, 17 deletions
@@ -758,15 +758,11 @@ public NodeHeartbeatResponse nodeHeartbeat(NodeHeartbeatRequest request)
         } else if (heartBeatID == 2 || heartBeatID == 3) {
           List<ContainerStatus> statuses =
               request.getNodeStatus().getContainersStatuses();
-          if (heartBeatID == 2) {
-            // NM should send completed containers again, since the last
-            // heartbeat is lost.
-            Assert.assertEquals(4, statuses.size());
-          } else {
-            // NM should not send completed containers again, since the last
-            // heartbeat is successful.
-            Assert.assertEquals(2, statuses.size());
-          }
+          // NM should send completed containers on heartbeat 2,
+          // since heartbeat 1 was lost. It will send them again on
+          // heartbeat 3, because it does not clear them if the previous
+          // heartbeat was lost in case the RM treated it as a duplicate.
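The revised test comment describes a retention rule on the node manager side: completed containers stay pending, and are re-sent, for one extra successful heartbeat after a lost one. Below is a minimal, hypothetical Java sketch of that rule. `CompletedContainerTracker`, its methods and the `missedHeartbeat` flag are illustrative names, not the actual `NodeStatusUpdaterImpl` code. Container IDs are modeled as plain strings, and the sketch assumes the RM tolerates receiving the same completed container more than once.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical, simplified model of the NM-side behavior described in the
 * test comment. Not the real NodeStatusUpdaterImpl implementation.
 */
public class CompletedContainerTracker {
  // Completed container IDs not yet considered delivered to the RM.
  private final Set<String> pendingCompleted = new LinkedHashSet<>();
  // True if the most recent heartbeat received no response (assumed flag).
  private boolean missedHeartbeat = false;

  public void addCompletedContainer(String containerId) {
    pendingCompleted.add(containerId);
  }

  /** Returns the completed containers to include in the next heartbeat. */
  public List<String> buildHeartbeat() {
    return new ArrayList<>(pendingCompleted);
  }

  /** Called when a heartbeat's response never arrives. */
  public void onHeartbeatLost() {
    missedHeartbeat = true;
  }

  /** Called when a heartbeat response arrives for the given sent IDs. */
  public void onHeartbeatSuccess(List<String> sent) {
    if (missedHeartbeat) {
      // The previous heartbeat was lost, so the RM may have treated this one
      // as a duplicate: keep the containers and send them once more.
      missedHeartbeat = false;
    } else {
      // Drop only what was actually sent; containers that completed after
      // the heartbeat was built stay pending.
      sent.forEach(pendingCompleted::remove);
    }
  }

  public static void main(String[] args) {
    CompletedContainerTracker nm = new CompletedContainerTracker();
    nm.addCompletedContainer("container_1");
    nm.addCompletedContainer("container_2");

    List<String> hb1 = nm.buildHeartbeat();
    nm.onHeartbeatLost();            // heartbeat 1: response lost

    List<String> hb2 = nm.buildHeartbeat();
    nm.onHeartbeatSuccess(hb2);      // heartbeat 2: delivered, but not cleared

    List<String> hb3 = nm.buildHeartbeat();
    nm.onHeartbeatSuccess(hb3);      // heartbeat 3: delivered and cleared

    List<String> hb4 = nm.buildHeartbeat();
    // prints: [container_1, container_2] [container_1, container_2] [container_1, container_2] []
    System.out.println(hb1 + " " + hb2 + " " + hb3 + " " + hb4);
  }
}
```

Running `main` mirrors the sequence in the test comment: the completed containers appear on heartbeats 2 and 3 and are gone by heartbeat 4. Removing only the IDs that were sent, rather than clearing the whole set, is a design choice of this sketch so that a container completing mid-heartbeat is not silently dropped.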