Description
When openmpi-2.0.1 (or the nightly from last Wednesday) is built with --enable-btl-openib-failover,
every MPI application segfaults on its first MPI communication when using openib:
[jason0:15469] *** Process received signal ***
[jason0:15469] Signal: Segmentation fault (11)
[jason0:15469] Signal code: Address not mapped (1)
[jason0:15469] Failing at address: (nil)
0 pings 1
[jason0:15469] [ 0] /lib64/libpthread.so.0(+0x10d70)[0x7f40652b2d70]
[jason0:15469] [ 1] /usr/lib64/openmpi/mca_btl_openib.so(mca_btl_openib_sendi+0x66b)[0x7f405a46cceb]
[jason0:15469] [ 2] /usr/lib64/openmpi/mca_pml_ob1.so(+0xae18)[0x7f4059e2ae18]
[jason0:15469] [ 3] /usr/lib64/openmpi/mca_pml_ob1.so(mca_pml_ob1_isend+0x365)[0x7f4059e2b945]
[jason0:15469] [ 4] /usr/lib/libmpi.so.20(ompi_coll_base_barrier_intra_two_procs+0xb5)[0x7f406555b0f5]
[jason0:15469] [ 5] /usr/lib/libmpi.so.20(MPI_Barrier+0xb6)[0x7f4065518576]
[jason0:15469] [ 6] ./pingtest[0x4013a3]
[jason0:15469] [ 7] /lib64/libc.so.6(__libc_start_main+0xf0)[0x7f4064f26620]
[jason0:15469] [ 8] ./pingtest[0x400de9]
[jason0:15469] *** End of error message ***
[jason1:20431] *** Process received signal ***
[jason1:20431] Signal: Segmentation fault (11)
[jason1:20431] Signal code: Address not mapped (1)
[jason1:20431] Failing at address: (nil)
[jason1:20431] [ 0] /lib64/libpthread.so.0(+0x10d70)[0x7f6e12d28d70]
[jason1:20431] [ 1] /usr/lib64/openmpi/mca_btl_openib.so(mca_btl_openib_sendi+0x66b)[0x7f6e07dcbceb]
[jason1:20431] [ 2] /usr/lib64/openmpi/mca_pml_ob1.so(+0xae18)[0x7f6e0c16fe18]
[jason1:20431] [ 3] /usr/lib64/openmpi/mca_pml_ob1.so(mca_pml_ob1_isend+0x365)[0x7f6e0c170945]
[jason1:20431] [ 4] /usr/lib/libmpi.so.20(ompi_coll_base_barrier_intra_two_procs+0xb5)[0x7f6e12fd10f5]
[jason1:20431] [ 5] /usr/lib/libmpi.so.20(MPI_Barrier+0xb6)[0x7f6e12f8e576]
[jason1:20431] [ 6] ./pingtest[0x4013a3]
[jason1:20431] [ 7] /lib64/libc.so.6(__libc_start_main+0xf0)[0x7f6e1299c620]
[jason1:20431] [ 8] ./pingtest[0x400de9]
[jason1:20431] *** End of error message ***
This is a regression from openmpi-1.10.2, where this worked without incident.
When we build without --enable-btl-openib-failover, our setup seems to work again.
In case it matters, we have an mlx4 InfiniBand interconnect.
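Since the backtrace above reaches mca_btl_openib_sendi directly from MPI_Barrier, a program that does nothing but a barrier may be enough to exercise the same path. The following is only a minimal sketch under that assumption; it is not the source of our pingtest, and the file name barrier_test.c is hypothetical.

```c
/* barrier_test.c: hypothetical minimal reproducer, NOT the original pingtest.
 * Assumption: the first MPI communication (a barrier across two ranks)
 * is sufficient to reach the failing openib send path. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank = 0, size = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    printf("rank %d of %d entering barrier\n", rank, size);
    fflush(stdout); /* make sure the line is out before a possible crash */

    /* First MPI communication of the program; the backtrace points here. */
    MPI_Barrier(MPI_COMM_WORLD);

    printf("rank %d passed barrier\n", rank);

    MPI_Finalize();
    return 0;
}
```

This would be built with mpicc barrier_test.c -o barrier_test and started on the two nodes with something like mpirun -np 2 --host jason0,jason1 --mca btl openib,self ./barrier_test, so that traffic between the ranks is forced over openib.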
If you need any additional information, please tell me.
Yours,
David