Skip to content

coll-portals4: retry PtlMEUnlink() if PTL_IN_USE #5500

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

tkordenbrock
Copy link
Member

In the cleanup phase, it is possible for PtlMEUnlink() to return
PTL_IN_USE if the NIC is not done with the ME. This should not
be considered an error. This commit adds a retry loop around
PtlMEUnlink().

In some cases, the return value of PtlMEUnlink() and PtlCTFree()
was not checked at all. Check them with the same retry loop as
above.

Signed-off-by: Todd Kordenbrock [email protected]

In the cleanup phase, it is possible for PtlMEUnlink() to return
PTL_IN_USE if the NIC is not done with the ME.  This should not
be considered an error.  This commit adds a retry loop around
PtlMEUnlink().

In some cases, the return value of PtlMEUnlink() and PtlCTFree()
was not checked at all.  Check them with the same retry loop as
above.

Signed-off-by: Todd Kordenbrock <[email protected]>
Copy link
Contributor

@regrant regrant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is correct and looks good. Agreed this should not be an error, but a temporary situation that must eventually return successfully under the progression model (assuming the collective itself is not designed incorrectly).

@tkordenbrock tkordenbrock merged commit e9f378e into open-mpi:master Aug 7, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants