Description
Background information
What version of Open MPI are you using? (e.g., v3.0.5, v4.0.2, git branch name and hash, etc.)
git branch v5.0.x
Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)
mkdir BUILD
cd BUILD
../configure --prefix=/home/devel/mpi/openmpi/5.0.0 --without-ofi --without-ucx --with-pmix=internal --enable-debug --enable-mem-debug --disable-man-pages --disable-sphinx && make -j 32 install
NOTE: I am configuring without OFI and UCX. PMIx is the internal copy; hwloc is the system package (Fedora 36, version 2.5.0).
If you are building/installing from a git clone, please copy-n-paste the output from git submodule status.
$ git submodule status
8c39d8e6a95d6fa78e765b5e86c324bd8a4ecd56 3rd-party/openpmix (v4.1.2-58-g8c39d8e6)
f75647a0518b5a476011f543200fca1cf8600cb8 3rd-party/prrte (v2.0.2-99-gf75647a051)
Please describe the system on which you are running
- Operating system/version: Linux 5.17.6 (Fedora 35)
- Computer hardware: AMD Ryzen Threadripper PRO 3995WX 64-Cores
- Network type: isolated
Details of the problem
I could not find any Open MPI-specific test for MPI-4 partitioned communication. Therefore, I wrote my own trivial reproducer:
#include <mpi.h>
int main(int argc, char *argv[])
{
    int buf[1] = {0};
    MPI_Request sreq, rreq;
    MPI_Init(&argc, &argv);
    MPI_Psend_init(buf, 1, 1, MPI_INT, 0, 0, MPI_COMM_SELF, MPI_INFO_NULL, &sreq); /* 1 partition of 1 int, dest 0, tag 0 */
    MPI_Precv_init(buf, 1, 1, MPI_INT, 0, 0, MPI_COMM_SELF, MPI_INFO_NULL, &rreq); /* matching receive from rank 0 */
    MPI_Request_free(&sreq);
    MPI_Request_free(&rreq);
    MPI_Finalize();
    return 0;
}
$ mpicc --version
gcc (GCC) 11.3.1 20220421 (Red Hat 11.3.1-2)
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
$ mpicc -g3 tmp.c
$ ./a.out
[localhost:2807591] *** Process received signal ***
[localhost:2807591] Signal: Segmentation fault (11)
[localhost:2807591] Signal code: Address not mapped (1)
[localhost:2807591] Failing at address: (nil)
[localhost:2807591] [ 0] /lib64/libc.so.6(+0x55e30)[0x7f67508efe30]
[localhost:2807591] *** End of error message ***
Segmentation fault (core dumped)
Valgrind is not helpful:
$ valgrind -q ./a.out
==2807606== Jump to the invalid address stated on the next line
==2807606== at 0x0: ???
==2807606== by 0x4011C8: main (tmp.c:10)
==2807606== Address 0x0 is not stack'd, malloc'd or (recently) free'd
==2807606==
[localhost:2807606] *** Process received signal ***
[localhost:2807606] Signal: Segmentation fault (11)
[localhost:2807606] Signal code: Invalid permissions (2)
[localhost:2807606] Failing at address: (nil)
[localhost:2807606] [ 0] /lib64/libc.so.6(+0x55e30)[0x4d71e30]
[localhost:2807606] *** End of error message ***
Segmentation fault (core dumped)
I do not understand why the jump address is 0x0. The symbol is definitely present in the library:
$ nm /home/devel/mpi/openmpi/5.0.0/lib/libmpi.so | grep Psend_init
0000000000133ac5 W MPI_Psend_init
0000000000133ac5 T PMPI_Psend_init
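The nm output shows that the symbol exists in the library file, but not which address the dynamic linker hands to the running process. A minimal sketch of a runtime check follows, assuming a dynamically linked executable. The helper name resolve_symbol is hypothetical and not part of Open MPI; dlopen, dlsym, and dlclose are the standard POSIX calls from <dlfcn.h> (older glibc versions may need -ldl at link time).

```c
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical helper (not part of Open MPI): ask the dynamic linker which
 * address a symbol name resolves to inside the running process.
 * Returns NULL if the name cannot be resolved. */
static void *resolve_symbol(const char *name)
{
    /* dlopen(NULL, ...) returns a handle for the main program; lookups
     * through it also search the shared libraries loaded with it. */
    void *self = dlopen(NULL, RTLD_NOW);
    if (self == NULL) {
        return NULL;
    }
    void *addr = dlsym(self, name);
    dlclose(self);
    return addr;
}
```

In the reproducer, the result of resolve_symbol("MPI_Psend_init") could be printed with printf("%p\n", ...) right after MPI_Init. A non-NULL address would suggest that symbol resolution itself is fine and that the jump to 0x0 happens elsewhere.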
Perhaps a compiler bug?
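Another possibility, which this report does not establish, is a call through a function pointer that was never set. A jump to address 0x0 is the usual symptom of that, even when the exported symbol itself resolves correctly. The sketch below illustrates the general pattern and a guarded call. All names (dispatch_table, psend_init, demo_psend_init, call_psend_init) are hypothetical and are not taken from the Open MPI sources.

```c
#include <stddef.h>

/* Hypothetical dispatch table; the names are illustrative only. */
typedef int (*init_fn)(int partitions);

struct dispatch_table {
    init_fn psend_init;   /* stays NULL until some component fills it in */
};

/* Stand-in implementation used only to populate the table in this sketch. */
static int demo_psend_init(int partitions)
{
    return partitions > 0 ? 0 : -1;
}

/* Guarded call: report an error instead of jumping to address 0. */
static int call_psend_init(const struct dispatch_table *table, int partitions)
{
    if (table == NULL || table->psend_init == NULL) {
        return -2;  /* entry was never set */
    }
    return table->psend_init(partitions);
}
```

Without the NULL check, calling table->psend_init(...) on an empty table would transfer control to address 0, which a tool such as Valgrind reports as a jump to an invalid address.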