[smpici@c712f6n06 test]$ mpirun -mca pml ob1 -host c712f6n06:1,c712f6n07:1 -np 2 -x LD_LIBRARY_PATH --prefix /nfs_smpi_ci/abd/os/ompi-install/ ./test2
[c712f6n06:14673:0:14673] Caught signal 11 (Segmentation fault: address not mapped to object at address 0xe86300a87d2903a6)
==== backtrace ====
0 /nfs_smpi_ci/abd/os/ucx-install/lib/libucs.so.0(+0x25650) [0x100003915650]
1 /nfs_smpi_ci/abd/os/ucx-install/lib/libucs.so.0(+0x259a4) [0x1000039159a4]
2 [0x100000050478]
3 /nfs_smpi_ci/abd/os/ompi-install/lib/libmpi.so.0(ompi_request_finalize+0x54) [0x1000000e5f24]
4 /nfs_smpi_ci/abd/os/ompi-install/lib/libmpi.so.0(ompi_mpi_finalize+0x990) [0x1000000e8e30]
5 /nfs_smpi_ci/abd/os/ompi-install/lib/libmpi.so.0(PMPI_Finalize+0x44) [0x1000001155c4]
6 ./test2() [0x10000ff8]
7 /lib64/libc.so.6(+0x25200) [0x100000255200]
8 /lib64/libc.so.6(__libc_start_main+0xc4) [0x1000002553f4]
===================
==== backtrace ====
/lib64/libc.so.6(+0x25200)[0x100000255200]
[c712f6n06:14673] [ 6] /lib64/libc.so.6(__libc_start_main+0xc4)[0x1000002553f4]
[c712f6n06:14673] *** End of error message ***
0 /nfs_smpi_ci/abd/os/ucx-install/lib/libucs.so.0(+0x25650) [0x100003915650]
1 /nfs_smpi_ci/abd/os/ucx-install/lib/libucs.so.0(+0x259a4) [0x1000039159a4]
2 [0x100000050478]
3 /nfs_smpi_ci/abd/os/ompi-install/lib/libmpi.so.0(ompi_request_finalize+0x54) [0x1000000e5f24]
4 /nfs_smpi_ci/abd/os/ompi-install/lib/libmpi.so.0(ompi_mpi_finalize+0x990) [0x1000000e8e30]
5 /nfs_smpi_ci/abd/os/ompi-install/lib/libmpi.so.0(PMPI_Finalize+0x44) [0x1000001155c4]
6 ./test2() [0x10000ff8]
7 /lib64/libc.so.6(+0x25200) [0x100000255200]
8 /lib64/libc.so.6(__libc_start_main+0xc4) [0x1000002553f4]
===================
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 0 on node c712f6n06 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
AboorvaDevarajan changed the title from "coll/nbc: non blocking collective failswith ompi master" to "coll/nbc: non blocking collective fails with ompi master" on Aug 7, 2019
Background Information
A few of the non-blocking collective test cases in the IBM test suite fail with a recent ompi master branch.
What version of Open MPI are you using? (e.g., v1.10.3, v2.1.0, git branch name and hash, etc.)
OMPI version: master
Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)
git clone
Please describe the system on which you are running
Details of the problem
Here is the list of non-blocking collective test cases that fail:
Some additional info

When I revert this patch the issue is not seen: 0fe756d

At least one issue I guess I'm seeing in the patch is that nbc_req_cons will never be called, since the ompi_coll_base_nbc_request_t object is not dynamically allocated, so it seems like req->data.objs.objs is propagating some garbage values.
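
To make the suspected failure mode concrete, here is a minimal standalone C sketch of the pattern described above; it is not the actual Open MPI code. The names nbc_request_t, nbc_req_cons and data.objs are simplified stand-ins borrowed from the report (the real type is ompi_coll_base_nbc_request_t, and its constructor/free-list handling lives in OMPI's object system), and the field layout here is an assumption for illustration only. The point it shows is that an initializer which only runs on the dynamic-allocation path never touches an object embedded in a larger allocation, so its release array holds indeterminate values when completion code later walks it.

/* Standalone sketch of the reported problem: a "class constructor" only
 * runs on the dynamic allocation path, so a request embedded in a larger
 * allocation never gets its fields initialized.  Identifiers are
 * simplified stand-ins for the ones named in the report, not the real
 * ompi_coll_base_nbc_request_t definition. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    void *objs[2];              /* objects to release on completion     */
} nbc_request_data_t;

typedef struct {
    nbc_request_data_t data;    /* stands in for req->data.objs         */
} nbc_request_t;

/* Plays the role of nbc_req_cons: zeroes the release array. */
static void nbc_req_cons(nbc_request_t *req)
{
    memset(req->data.objs, 0, sizeof(req->data.objs));
}

/* Dynamic path (think OBJ_NEW / free-list growth): constructor runs. */
static nbc_request_t *nbc_request_new(void)
{
    nbc_request_t *req = malloc(sizeof(*req));
    if (req != NULL) {
        nbc_req_cons(req);
    }
    return req;
}

/* Completion path: releases whatever pointers the request carries.
 * With garbage in objs[] this is where a segfault would surface. */
static void nbc_request_complete(nbc_request_t *req)
{
    for (int i = 0; i < 2; ++i) {
        if (req->data.objs[i] != NULL) {
            printf("releasing %p\n", req->data.objs[i]);
            /* free(req->data.objs[i]); -- crashes if objs[i] is garbage */
        }
    }
}

int main(void)
{
    /* Safe: constructor ran, objs[] is all NULL. */
    nbc_request_t *dynamic_req = nbc_request_new();
    if (dynamic_req == NULL) return 1;
    nbc_request_complete(dynamic_req);
    free(dynamic_req);

    /* Unsafe: the request is embedded in a larger, plain-malloc'd object,
     * so nbc_req_cons is never called and objs[] holds indeterminate data
     * -- the "propagating some garbage values" situation in the report. */
    struct { char other_state[64]; nbc_request_t req; } *container =
        malloc(sizeof(*container));
    if (container == NULL) return 1;
    /* nbc_request_complete(&container->req);  <-- undefined behaviour    */
    nbc_req_cons(&container->req);   /* explicit init is the needed fix   */
    nbc_request_complete(&container->req);
    free(container);
    return 0;
}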