
Shared memory not working with spawned process #6823

@rodarima

After launching a new process with MPI_Comm_spawn, the call to allocate a shared memory region fails:

[mio:03625] *** An error occurred in MPI_Win_allocate_shared
[mio:03625] *** reported by process [3230072833,0]
[mio:03625] *** on communicator 
[mio:03625] *** MPI_ERR_RMA_SHARED: Memory cannot be shared
[mio:03625] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[mio:03625] ***    and potentially your MPI job)
[mio:03620] 1 more process has sent help message help-mpi-errors.txt / mpi_errors_are_fatal
[mio:03620] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

The following reproducer works with Intel MPI, but fails with Open MPI.

Parent code:

#define _GNU_SOURCE
#include <sched.h>

#include <stdio.h>
#include <mpi.h>
#include <stdlib.h>
#include <unistd.h>
#include <assert.h>

int
main(int argc, char *argv[])
{
	int provided, size, k;
	MPI_Comm intercomm, universe;
	MPI_Win win;
	int *buf;
	MPI_Aint bufsize;

	MPI_Init_thread(NULL, NULL, MPI_THREAD_MULTIPLE, &provided);
	assert(provided == MPI_THREAD_MULTIPLE);

	MPI_Comm_size(MPI_COMM_WORLD, &size);

	MPI_Comm_spawn("./worker", MPI_ARGV_NULL, 1,
		MPI_INFO_NULL, 0, MPI_COMM_WORLD,
		&intercomm, MPI_ERRCODES_IGNORE);

	MPI_Intercomm_merge(intercomm, 0, &universe);
	MPI_Comm_size(universe, &k);
	assert(k == 2);

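	bufsize = sizeof(int);

	/* Collective over the merged communicator; the parent contributes the
	 * only non-empty segment (one int) */
	MPI_Win_allocate_shared(bufsize, 1, MPI_INFO_NULL, universe, &buf, &win);

	buf[0] = 666;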

	MPI_Barrier(universe);

	/* Worker runs now */

	MPI_Barrier(universe);

	assert(buf[0] == 555);
	MPI_Finalize();

	return 0;
}

Worker code:

#include <mpi.h>
#include <stddef.h>
#include <assert.h>


int
main(int argc, char *argv[])
{
	MPI_Comm parent, universe;
	int *buf;
	int disp;
	MPI_Win win;
	MPI_Aint asize;

	MPI_Init(&argc, &argv);

	MPI_Comm_get_parent(&parent);

	assert(parent != MPI_COMM_NULL);

	MPI_Intercomm_merge(parent, 0, &universe);


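	/* Contribute a zero-size segment, then look up the parent's segment:
	 * with MPI_PROC_NULL the query returns the segment of the lowest rank
	 * that specified size > 0 */
	MPI_Win_allocate_shared(0, 1, MPI_INFO_NULL, universe, &buf, &win);
	MPI_Win_shared_query(win, MPI_PROC_NULL, &asize, &disp, &buf);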

	MPI_Barrier(universe);

	assert(buf[0] == 666);

	buf[0] = 555;

	MPI_Barrier(universe);

	MPI_Finalize();
	return 0;
}

Makefile:

CC=mpicc

CFLAGS=-g -O0

BIN=main worker

all: $(BIN)

clean:
	rm -f $(BIN) *.o

To run, I use mpirun --oversubscribe -n 1 ./main.

Background information

What version of Open MPI are you using? (e.g., v1.10.3, v2.1.0, git branch name and hash, etc.)

Release v4.0.1

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

Arch Linux package install, build date Fri 29 Mar 2019.

Please describe the system on which you are running

  • Operating system/version: 5.1.16-arch1-1-ARCH
