Description
Running Open MPI 4.0.1 with Open UCX 1.5, my application hangs when one process tries to release an exclusive lock while the target tries to acquire a shared lock. The code below reproduces the issue (tested on our IB cluster):
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Win win;
    int elem_per_unit = 1;
    int *baseptr;
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    // one int per rank, displacement unit of 1 byte
    MPI_Win_allocate(
        elem_per_unit*sizeof(int), 1, MPI_INFO_NULL,
        MPI_COMM_WORLD, &baseptr, &win);

    if (size == 2) {
        // get exclusive lock
        if (rank != 0) {
            int val = 0; // initialized so the put does not send an indeterminate value
            printf("[%d] Acquiring exclusive lock\n", rank);
            MPI_Win_lock(MPI_LOCK_EXCLUSIVE, 0, 0, win);
            MPI_Put(&val, 1, MPI_INT, 0, 0, 1, MPI_INT, win);
            MPI_Win_flush(0, win);
        }

        MPI_Barrier(MPI_COMM_WORLD);

        // release exclusive lock
        if (rank != 0) {
            printf("[%d] Releasing exclusive lock\n", rank);
            // Rank 1 hangs here
            MPI_Win_unlock(0, win);
        }
    }

    // Rank 0 hangs here
    printf("[%d] Acquiring shared lock\n", rank);
    MPI_Win_lock_all(0, win);
    MPI_Win_unlock_all(win);

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}
Build with:
$ mpicc mpi_shared_excl_lock.c -o mpi_shared_excl_lock
Run with:
$ mpirun -n 2 -N 1 ./mpi_shared_excl_lock
[1] Acquiring exclusive lock
[1] Releasing exclusive lock
[0] Acquiring shared lock
Interestingly, the example runs successfully if the barrier between acquiring and releasing the lock is left out. It also runs fine when using Open IB (openib) instead of UCX.
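To spell out the behavior I expect, here is a minimal toy model of the lock state of rank 0's window. It is only a sketch of the passive-target semantics as I understand them (an exclusive lock blocks shared locks until it is released), not how Open MPI or UCX implement locking. All names (`lock_model_t`, `model_try_exclusive`, `model_replay_reproducer`, and so on) are hypothetical and exist only for this illustration. The model ignores that `MPI_Win_lock_all` also locks rank 1's window.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model (hypothetical, not Open MPI/UCX code) of the lock state
 * of a single window at its target. */
typedef struct {
    bool exclusive_held;  /* some origin holds an exclusive lock */
    int  shared_holders;  /* number of origins holding a shared lock */
} lock_model_t;

/* Exclusive lock is granted only if nobody holds any lock. */
static bool model_try_exclusive(lock_model_t *m)
{
    if (m->exclusive_held || m->shared_holders > 0)
        return false;
    m->exclusive_held = true;
    return true;
}

/* Shared lock is granted unless an exclusive lock is held. */
static bool model_try_shared(lock_model_t *m)
{
    if (m->exclusive_held)
        return false;
    m->shared_holders++;
    return true;
}

static void model_release_exclusive(lock_model_t *m)
{
    assert(m->exclusive_held);
    m->exclusive_held = false;
}

static void model_release_shared(lock_model_t *m)
{
    assert(m->shared_holders > 0);
    m->shared_holders--;
}

/* Replays the event order of the reproducer on rank 0's window.
 * Returns 1 if the shared lock is granted after the exclusive
 * lock has been released, 0 otherwise. */
static int model_replay_reproducer(void)
{
    lock_model_t win0 = { false, 0 };

    if (!model_try_exclusive(&win0))   /* rank 1: MPI_Win_lock(EXCLUSIVE) */
        return 0;
    if (model_try_shared(&win0))       /* rank 0: lock_all has to wait */
        return 0;
    model_release_exclusive(&win0);    /* rank 1: MPI_Win_unlock */
    if (!model_try_shared(&win0))      /* rank 0: lock_all can proceed */
        return 0;
    model_release_shared(&win0);       /* rank 0: MPI_Win_unlock_all */
    return 1;
}
```

Calling `model_replay_reproducer()` from a `main` returns 1 in this model, meaning both ranks should make progress. In the real run, neither the unlock on rank 1 nor the shared lock on rank 0 completes.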