mtl-portals4: in rendezvous, reissue PtlGet() if it fails #3528

Merged

tkordenbrock
Member

This commit fixes a race condition in the rendezvous protocol. The
race occurs because the sender does not wait for the link event on the
send buffer. Although this has not been seen in the wild, the receiver
can issue the PtlGet() before the ME is linked, in which case the get
fails with a NAK at the receiver. This commit resolves the race by
reissuing the PtlGet() when a NAK occurs.
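
To illustrate the reissue-on-NAK idea, here is a minimal, self-contained sketch. It does not use the Portals 4 API or the actual mtl-portals4 code: `mock_sender_t`, `mock_get()`, and `get_with_reissue()` are hypothetical names, the "NAK" is simulated by a counter, and the retry bound is an assumption added for the example rather than something this commit is known to do.

```c
#include <assert.h>

/* Hypothetical outcome of one get attempt; not a Portals 4 type. */
typedef enum { GET_OK, GET_NAK } get_status_t;

/* Hypothetical stand-in for the sender: its ME counts as linked only
 * after `link_delay` get attempts have already arrived. */
typedef struct {
    int link_delay; /* attempts that arrive before the ME is linked */
    int attempts;   /* total get attempts seen so far */
} mock_sender_t;

/* Simulates issuing a get and observing its completion status. */
static get_status_t mock_get(mock_sender_t *s)
{
    s->attempts++;
    return (s->attempts > s->link_delay) ? GET_OK : GET_NAK;
}

/* Reissue the get each time it is NAKed, up to max_attempts.
 * Returns the number of attempts used, or -1 if every attempt failed. */
static int get_with_reissue(mock_sender_t *s, int max_attempts)
{
    assert(max_attempts > 0);
    for (int i = 1; i <= max_attempts; i++) {
        if (mock_get(s) == GET_OK) {
            return i; /* data fetched; rendezvous can complete */
        }
        /* NAK: the ME was not linked yet, so try again. */
    }
    return -1;
}
```

With `link_delay` set to 2, the first two attempts are NAKed and the third succeeds. In an event-driven MTL the reissue would more plausibly happen from the completion-event handler than from a loop; the loop here only keeps the sketch short.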

Closes #173.

@regrant - Please review.

Signed-off-by: Todd Kordenbrock <[email protected]>
@tkordenbrock
Member Author

bot:retest

@regrant
Contributor

regrant commented May 17, 2017

👍 Reviewed and it looks good. I'll merge it into master.

@regrant regrant merged commit b59eb76 into open-mpi:master May 17, 2017
Development

Successfully merging this pull request may close these issues.

Portals 4 get race
2 participants