Closed
Absoft's MTT has been failing on the dlopen_test (during make check) in opal_init() on master recently (e.g., see https://mtt.open-mpi.org/index.php?do_redir=2438).
Here's a backtrace provided by @cagoelz (note the assertion failure in opal_pointer_array_add()):
Reading symbols from /home/ompitest/scratches/2017-05-05/mpi-install/8HsK/src/openmpi-master-201705050239-eb03679/ompi/debuggers/.libs/lt-dlopen_test...done.
(gdb) run
Starting program: /home/ompitest/scratches/2017-05-05/mpi-install/8HsK/src/openmpi-master-201705050239-eb03679/ompi/debuggers/.libs/lt-dlopen_test
[Thread debugging using libthread_db enabled]
lt-dlopen_test: class/opal_pointer_array.c:241: opal_pointer_array_add: Assertion `0 == (table->free_bits[__b_idx] & (1UL << __b_pos))' failed.
Program received signal SIGABRT, Aborted.
0x00110402 in __kernel_vsyscall ()
(gdb) bt
#0 0x00110402 in __kernel_vsyscall ()
#1 0x00733b10 in raise () from /lib/libc.so.6
#2 0x00735421 in abort () from /lib/libc.so.6
#3 0x0072cf6b in __assert_fail () from /lib/libc.so.6
#4 0x00399ace in opal_pointer_array_add (table=0x482940, ptr=0x80637f0)
at class/opal_pointer_array.c:241
#5 0x003d3e13 in register_variable (project_name=0x45b094 "opal",
framework_name=0x45b090 "dss", component_name=0x0,
variable_name=0x45b1a9 "buffer_threshold_size", description=0x0,
type=MCA_BASE_VAR_TYPE_INT, enumerator=0x0, bind=0,
flags=MCA_BASE_VAR_FLAG_SETTABLE, info_lvl=OPAL_INFO_LVL_8,
scope=MCA_BASE_VAR_SCOPE_ALL_EQ, synonym_for=-1, storage=0x47f928)
at mca_base_var.c:1385
#6 0x003d4649 in mca_base_var_register (project_name=0x45b094 "opal",
framework_name=0x45b090 "dss", component_name=0x0,
variable_name=0x45b1a9 "buffer_threshold_size", description=0x0,
type=MCA_BASE_VAR_TYPE_INT, enumerator=0x0, bind=0,
flags=MCA_BASE_VAR_FLAG_SETTABLE, info_lvl=OPAL_INFO_LVL_8,
scope=MCA_BASE_VAR_SCOPE_ALL_EQ, storage=0x47f928) at mca_base_var.c:1495
#7 0x003b582f in opal_dss_register_vars () at dss/dss_open_close.c:286
#8 0x0039fe25 in opal_register_params () at runtime/opal_params.c:357
#9 0x0039ec92 in opal_init_util (pargc=0xbfffead0, pargv=0xbfffead4)
at runtime/opal_init.c:422
#10 0x0039eeb1 in opal_init (pargc=0xbfffead0, pargv=0xbfffead4)
at runtime/opal_init.c:513
#11 0x08048cee in main (argc=Cannot access memory at address 0x77fc) at dlopen_test.c:133
(gdb)
This is a 32-bit build, and we explicitly invoke opal_init() in this test. Are we invoking it incorrectly? Or is something legitimately busted in opal_init() in 32-bit builds?
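One thing that may be worth checking, given the `1UL << __b_pos` in the failed assertion: on a typical 32-bit build `unsigned long` is 32 bits wide, so if the free-bits words are 64 bits (an assumption, the backtrace does not show their type), any bit position of 32 or more makes that shift undefined. The sketch below is not Open MPI code; `ulong_shift_is_defined`, `bit_mask64`, `bit_is_set`, and `bit_set` are hypothetical helpers that only illustrate a width-independent way to build the mask.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only, not taken from Open MPI.
 * Assumption: each word of the bitmap is 64 bits wide. */
enum { WORD_BITS = 64 };

/* Width of unsigned long in bits: 32 on typical 32-bit builds,
 * 64 on typical 64-bit Linux builds. */
static unsigned ulong_bits(void)
{
    return (unsigned)(sizeof(unsigned long) * CHAR_BIT);
}

/* Nonzero if (1UL << pos) is well defined on this platform. */
static int ulong_shift_is_defined(unsigned pos)
{
    return pos < ulong_bits();
}

/* Build the mask from a 64-bit one, so the result does not depend on
 * how wide unsigned long happens to be. */
static uint64_t bit_mask64(unsigned pos)
{
    assert(pos < WORD_BITS);
    return (uint64_t)1 << pos;
}

/* Test whether bit idx is set in a bitmap made of 64-bit words. */
static int bit_is_set(const uint64_t *bits, unsigned idx)
{
    return (bits[idx / WORD_BITS] & bit_mask64(idx % WORD_BITS)) != 0;
}

/* Mark bit idx as used. */
static void bit_set(uint64_t *bits, unsigned idx)
{
    bits[idx / WORD_BITS] |= bit_mask64(idx % WORD_BITS);
}
```

If the real macros do pair 64-bit words with a `1UL` mask, a check like `ulong_shift_is_defined(__b_pos)` would be false for the upper half of every word on this build, which would be consistent with a failure that only shows up in 32-bit runs. That remains a hypothesis until the types in class/opal_pointer_array.c are confirmed.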