Commit aa4529b

AboorvaDevarajan, awlauria
authored and committed
ompi/request: Add a read memory barrier to sync the receive buffer soon after wait completes.
We found an issue where, when using multiple threads, the data may not yet be in the receive buffer when MPI_Wait() returns. Without the rmb(), testing the buffer some time after MPI_Wait() returned showed that the data does arrive eventually. We have seen this issue intermittently on Power9 using PAMI, but in theory it could happen with any transport.

Signed-off-by: Austen Lauria <[email protected]>
(cherry picked from commit 12192f1)
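The ordering problem described in the commit message can be sketched outside Open MPI with C11 atomics. This is only an illustration under stated assumptions: `recv_buffer`, `request_complete`, `deliver`, and `wait_and_read` are hypothetical names, and `atomic_thread_fence(memory_order_acquire)` is used as a portable stand-in for a read memory barrier. It is not how `opal_atomic_rmb()` is implemented.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins: a plain receive buffer and a completion flag. */
static int recv_buffer;
static atomic_int request_complete;

/* Plays the role of the thread that delivers data and completes the request. */
static void *deliver(void *arg)
{
    recv_buffer = *(const int *)arg;   /* plain (non-atomic) store to the buffer */
    /* Publish completion only after the buffer has been written. */
    atomic_store_explicit(&request_complete, 1, memory_order_release);
    return NULL;
}

/* Waits for completion, then reads the buffer. Returns -1 if no thread
 * could be created. */
static int wait_and_read(int payload)
{
    pthread_t producer;

    recv_buffer = 0;
    atomic_store_explicit(&request_complete, 0, memory_order_relaxed);
    if (pthread_create(&producer, NULL, deliver, &payload) != 0) {
        return -1;
    }

    /* Relaxed polling: on its own this does NOT order the buffer read below. */
    while (atomic_load_explicit(&request_complete, memory_order_relaxed) == 0) {
        /* spin */
    }

    /* Read-side barrier: reads after this point cannot be satisfied with
     * data from before the completion flag was observed. */
    atomic_thread_fence(memory_order_acquire);

    int value = recv_buffer;
    pthread_join(producer, NULL);
    return value;
}
```

Without the acquire fence, a weakly ordered CPU such as POWER may satisfy the `recv_buffer` read before the flag read, which is the kind of stale-buffer symptom the commit message reports. Calling `wait_and_read(42)` returns the delivered payload once the fence is in place.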
1 parent a39a051 commit aa4529b

1 file changed: +15 -12 lines changed
ompi/request/request.h

Lines changed: 15 additions & 12 deletions
@@ -417,21 +417,24 @@ static inline int ompi_request_free(ompi_request_t** request)

 static inline void ompi_request_wait_completion(ompi_request_t *req)
 {
-    if (opal_using_threads () && !REQUEST_COMPLETE(req)) {
-        void *_tmp_ptr = REQUEST_PENDING;
-        ompi_wait_sync_t sync;
+    if (opal_using_threads ()) {
+        if(!REQUEST_COMPLETE(req)) {
+            void *_tmp_ptr = REQUEST_PENDING;
+            ompi_wait_sync_t sync;

-        WAIT_SYNC_INIT(&sync, 1);
+            WAIT_SYNC_INIT(&sync, 1);

-        if (OPAL_ATOMIC_COMPARE_EXCHANGE_STRONG_PTR(&req->req_complete, &_tmp_ptr, &sync)) {
-            SYNC_WAIT(&sync);
-        } else {
-            /* completed before we had a chance to swap in the sync object */
-            WAIT_SYNC_SIGNALLED(&sync);
-        }
+            if (OPAL_ATOMIC_COMPARE_EXCHANGE_STRONG_PTR(&req->req_complete, &_tmp_ptr, &sync)) {
+                SYNC_WAIT(&sync);
+            } else {
+                /* completed before we had a chance to swap in the sync object */
+                WAIT_SYNC_SIGNALLED(&sync);
+            }

-        assert(REQUEST_COMPLETE(req));
-        WAIT_SYNC_RELEASE(&sync);
+            assert(REQUEST_COMPLETE(req));
+            WAIT_SYNC_RELEASE(&sync);
+        }
+        opal_atomic_rmb();
     } else {
         while(!REQUEST_COMPLETE(req)) {
             opal_progress();
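The control-flow change in this hunk is easy to miss: before the patch, a threaded caller whose request was already complete fell through to the `else` branch and returned without any barrier. After the patch, the threaded path always reaches `opal_atomic_rmb()`. The sketch below mimics only that shape. `sketch_request_t`, `sketch_wait`, `sketch_rmb`, and the two `wait_completion_*` functions are hypothetical stand-ins, the fence is a C11 acquire fence rather than Open MPI's barrier, and a counter is added purely so the difference can be observed.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a request; not Open MPI's ompi_request_t. */
typedef struct {
    atomic_bool complete;
    int barriers_issued;   /* instrumentation: counts fences executed */
} sketch_request_t;

/* Stand-in for blocking on a sync object or spinning in the progress loop.
 * In this sketch the request simply completes on the first attempt. */
static void sketch_wait(sketch_request_t *req)
{
    while (!atomic_load(&req->complete)) {
        atomic_store(&req->complete, true);
    }
}

/* Stand-in for opal_atomic_rmb(): a C11 acquire fence plus a counter. */
static void sketch_rmb(sketch_request_t *req)
{
    atomic_thread_fence(memory_order_acquire);
    req->barriers_issued++;
}

/* Shape of the function before the patch: no barrier on any path. */
static void wait_completion_before(sketch_request_t *req, bool using_threads)
{
    if (using_threads && !atomic_load(&req->complete)) {
        sketch_wait(req);          /* sync-object path */
    } else {
        sketch_wait(req);          /* progress loop; exits at once if complete */
    }
}

/* Shape of the function after the patch: the threaded path always fences. */
static void wait_completion_after(sketch_request_t *req, bool using_threads)
{
    if (using_threads) {
        if (!atomic_load(&req->complete)) {
            sketch_wait(req);      /* sync-object path */
        }
        sketch_rmb(req);           /* runs whether or not we had to wait */
    } else {
        sketch_wait(req);          /* progress loop, unchanged in this hunk */
    }
}
```

For a threaded caller with an already-complete request, `wait_completion_before` leaves `barriers_issued` at 0 while `wait_completion_after` raises it to 1. As far as the visible hunk shows, the non-threaded branch is left without a barrier.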
