Closed
Description
I have configured on a Linux/ppc64 system with xlc-13.1 as follows:
[path-to]/configure --prefix=[...] --enable-debug CC=xlc CXX=xlC FC=xlf \
CFLAGS=-q64 --with-wrapper-cflags=-q64 \
CXXFLAGS=-q64 --with-wrapper-cxxflags=-q64 \
FCFLAGS=-q64 --with-wrapper-fcflags=-q64 --disable-oshmem-fortran
This build was previously failing due to #3811, but a patch from @ggouaillardet gets me past that and on to the next problem:
$ mpirun -mca btl sm,self -np 2 examples/ring_c
[login2:71837] Component file data does not match filename: /gpfs-biou/phh1/OMPI/openmpi-3.0.0rc1-linux-ppc64-xlc-13.1/INST/lib/pmix/mca_ptl_tcp (ptl / tcp) != ptl -- ignored
[login2:71842] Component file data does not match filename: /gpfs-biou/phh1/OMPI/openmpi-3.0.0rc1-linux-ppc64-xlc-13.1/INST/lib/pmix/mca_ptl_tcp (ptl / tcp) != ptl -- ignored
[login2:71843] Component file data does not match filename: /gpfs-biou/phh1/OMPI/openmpi-3.0.0rc1-linux-ppc64-xlc-13.1/INST/lib/pmix/mca_ptl_tcp (ptl / tcp) != ptl -- ignored
[login2:71842] *** Process received signal ***
[login2:71842] Signal: Segmentation fault (11)
[login2:71842] Signal code: Address not mapped (1)
[login2:71842] Failing at address: 0xdeafbeeddeafbf35
[login2:71842] [ 0] [0xfffad2e0448]
[login2:71842] [ 1] [0x0]
[login2:71842] [ 2] /gpfs-biou/phh1/OMPI/openmpi-3.0.0rc1-linux-ppc64-xlc-13.1/INST/lib/libopen-pal.so.40(+0x108fd8)[0xfffac858fd8]
[login2:71842] [ 3] [login2:71843] *** Process received signal ***
[login2:71843] Signal: Segmentation fault (11)
[login2:71843] Signal code: Address not mapped (1)
[login2:71843] Failing at address: 0xdeafbeeddeafbf35
[login2:71843] [ 0] [0xfffaad60448]
[login2:71843] [ 1] [0x0]
Note that the failing address, 0xdeafbeeddeafbf35, is suspiciously similar to:
./opal/class/opal_object.h:#define OPAL_OBJ_MAGIC_ID ((0xdeafbeedULL << 32) + 0xdeafbeedULL)
From gdb on a core (different run than output above, may not match exactly):
Core was generated by `examples/ring_c '.
Program terminated with signal 11, Segmentation fault.
#0 0x00000fffa46fc0b4 in pmix_ptl_base_recv_handler (sd=14, flags=2, cbdata=0xfffa4798c78)
at /gpfs-biou/phh1/OMPI/openmpi-3.0.0rc1-linux-ppc64-xlc-13.1/openmpi-3.0.0rc1/opal/mca/pmix/pmix2x/pmix/src/mca/ptl/base/ptl_base_sendrecv.c:401
401 (NULL == peer) ? "NULL" : peer->info->nptr->nspace,
(gdb) where
#0 0x00000fffa46fc0b4 in pmix_ptl_base_recv_handler (sd=14, flags=2, cbdata=0xfffa4798c78)
at /gpfs-biou/phh1/OMPI/openmpi-3.0.0rc1-linux-ppc64-xlc-13.1/openmpi-3.0.0rc1/opal/mca/pmix/pmix2x/pmix/src/mca/ptl/base/ptl_base_sendrecv.c:401
#1 0x00000fffa53b8fd8 in .event_persist_closure ()
from /gpfs-biou/phh1/OMPI/openmpi-3.0.0rc1-linux-ppc64-xlc-13.1/INST/lib/libopen-pal.so.40
#2 0x00000fffa53b9354 in .event_process_active_single_queue ()
from /gpfs-biou/phh1/OMPI/openmpi-3.0.0rc1-linux-ppc64-xlc-13.1/INST/lib/libopen-pal.so.40
#3 0x00000fffa53b9738 in .event_process_active ()
from /gpfs-biou/phh1/OMPI/openmpi-3.0.0rc1-linux-ppc64-xlc-13.1/INST/lib/libopen-pal.so.40
#4 0x00000fffa53ba5e0 in .opal_libevent2022_event_base_loop ()
from /gpfs-biou/phh1/OMPI/openmpi-3.0.0rc1-linux-ppc64-xlc-13.1/INST/lib/libopen-pal.so.40
#5 0x00000fffa46d1d4c in progress_engine (obj=0x100314d1ef8)
at /gpfs-biou/phh1/OMPI/openmpi-3.0.0rc1-linux-ppc64-xlc-13.1/openmpi-3.0.0rc1/opal/mca/pmix/pmix2x/pmix/src/runtime/pmix_progress_threads.c:109
#6 0x000000800cdac5dc in .start_thread () from /lib64/libpthread.so.0
#7 0x000000800ccda9bc in .__clone () from /lib64/libc.so.6