Ensure we fail if remote nodes cannot find executable #4794
jjhursey
commented
Feb 6, 2018
- Ref PR Ensure we fail if remote nodes cannot find executable #4792
Signed-off-by: Ralph Castain <[email protected]> (cherry picked from commit ce901ba)
|
@bwbarrett We discussed this on the webex today and think it should be pulled in before the release. We also need b643852.
|
It didn't apply cleanly, so I'll have to investigate a bit. I'll try to get it done today.
|
@bosilca I could not reproduce the hang with this branch, but I did reproduce a different issue related to failed executions. From the trace below you can see that if a job fails to launch, it still retains its allocation. So eventually the DVM runs out of available slots and starts rejecting submissions. I did not see this when testing the v3.1.x branch.
|
From testing, it doesn't seem that this patch is needed in the v3.0.x series. Closing this PR.