
Singleton MPI initialization and spawn #10590

Closed
@dalcinl

Description


It looks like singleton MPI initialization combined with MPI_Comm_spawn is broken in the main branch.

Look at this reproducer:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
  MPI_Comm parent, intercomm;

  MPI_Init(NULL, NULL);

  MPI_Comm_get_parent(&parent);
  if (MPI_COMM_NULL != parent)
    MPI_Comm_disconnect(&parent);
  
  if (argc > 1) {
    printf("Spawning '%s' ... ", argv[1]);
    MPI_Comm_spawn(argv[1], MPI_ARGV_NULL,
                   1, MPI_INFO_NULL, 0, MPI_COMM_SELF,
                   &intercomm, MPI_ERRCODES_IGNORE);
    MPI_Comm_disconnect(&intercomm);
    printf("OK\n");
  }

  MPI_Finalize();
}

I ran this code with Open MPI v4.1.2 (the system package from Fedora 36) in the following two ways:

$ mpiexec -n 1 ./a.out ./a.out 
Spawning './a.out' ... OK

$ ./a.out ./a.out 
Spawning './a.out' ... OK

Note that the second way does not use mpiexec; this is what the MPI standard calls singleton MPI initialization.
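When testing both launch modes, it can help for the program itself to report which one it is running in. The MPI standard offers no portable query for this, so the following is only a heuristic sketch. It assumes that a launcher such as Open MPI's mpiexec exports environment variables like OMPI_COMM_WORLD_SIZE, OMPI_COMM_WORLD_RANK, or the PMIx variable PMIX_RANK, and that none of them are present when the binary is started directly from a shell. The helper name looks_like_singleton is hypothetical, and the variable list is an assumption, not a documented or exhaustive contract. It is meant to be called before MPI_Init, so that it sees the environment as the launcher (or shell) left it.

```c
#define _POSIX_C_SOURCE 200112L /* expose setenv()/unsetenv() for quick testing */
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical heuristic, not part of any MPI API: returns true when none of
   the assumed launcher-provided environment variables are set, which
   suggests (but does not prove) that the process was started directly,
   i.e. as an MPI singleton. Intended to be called before MPI_Init. */
bool looks_like_singleton(void)
{
    /* Assumed variables: OMPI_* as exported by Open MPI's launcher,
       PMIX_RANK as exported to PMIx clients. Not an exhaustive list. */
    static const char *const launcher_vars[] = {
        "OMPI_COMM_WORLD_SIZE",
        "OMPI_COMM_WORLD_RANK",
        "PMIX_RANK",
    };
    const size_t n = sizeof launcher_vars / sizeof launcher_vars[0];

    for (size_t i = 0; i < n; ++i) {
        if (getenv(launcher_vars[i]) != NULL)
            return false; /* a launcher appears to have started us */
    }
    return true; /* no launcher variables found */
}
```

For example, calling looks_like_singleton() at the top of main in the reproducer would be expected to return true for ./a.out ./a.out and false for mpiexec -n 1 ./a.out ./a.out, provided the assumed variables behave as described.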

Next, I ran the code with ompi/main, configured with:

./configure \
    --without-ofi \
    --without-ucx \
    --with-pmix=internal \
    --with-prrte=internal \
    --with-libevent=internal \
    --with-hwloc=internal \
    --enable-debug \
    --enable-mem-debug \
    --disable-man-pages \
    --disable-sphinx

The first way (using mpiexec) seems to work just fine. The second way (singleton MPI initialization) fails:

$ mpiexec -n 1 ./a.out ./a.out 
Spawning './a.out' ... OK

$ ./a.out ./a.out 
[kw61149:1105609] OPAL ERROR: Error in file ../../ompi/dpm/dpm.c at line 2122
[kw61149:00000] *** An error occurred in MPI_Comm_spawn
[kw61149:00000] *** reported by process [440139776,0]
[kw61149:00000] *** on communicator MPI_COMM_SELF
[kw61149:00000] *** MPI_ERR_UNKNOWN: unknown error
[kw61149:00000] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[kw61149:00000] ***    and MPI will try to terminate your MPI job as well)

PS: The lack of working singleton MPI initialization complicates things for Python users who want to dynamically spawn MPI processes as needed via mpi4py, without requiring the parent process to be launched through mpiexec.
