Commit 31799ec
Author: rhc54
Merge pull request #2298 from rhc54/topic/notify
When mpirun operates in --continuous mode, we won't terminate the job…
2 parents: 2076622 + d031946 (merge commit 31799ec)

1 file changed, 7 lines added, 0 removed (+7 -0)

orte/mca/errmgr/default_hnp/errmgr_default_hnp.c

Lines changed: 7 additions & 0 deletions
@@ -428,6 +428,13 @@ static void proc_errors(int fd, short args, void *cbdata)
     if (orte_get_attribute(&jdata->attributes, ORTE_JOB_CONTINUOUS_OP, NULL, OPAL_BOOL)) {
         /* always mark the waitpid as having fired */
         ORTE_ACTIVATE_PROC_STATE(&pptr->name, ORTE_PROC_STATE_WAITPID_FIRED);
+        /* if this is a remote proc, we won't hear anything more about it
+         * as the default behavior would be to terminate the job. So be sure to
+         * mark the IOF as having completed too so we correctly mark this proc
+         * as dead and notify everyone as required */
+        if (!ORTE_FLAG_TEST(pptr, ORTE_PROC_FLAG_LOCAL)) {
+            ORTE_ACTIVATE_PROC_STATE(&pptr->name, ORTE_PROC_STATE_IOF_COMPLETE);
+        }
         goto cleanup;
     }
 