PPC64 crash with current master branch #1363

@nysal

I started noticing occasional crashes on PPC64 with the master branch. The crash occurs in the child process soon after mpirun forks it (before it execs the application). The stack trace was nearly the same most of the time:

#0  0x00003fffb791031c in _int_malloc () from /lib64/power8/libc.so.6
#1  0x00003fffb79134dc in malloc () from /lib64/power8/libc.so.6
#2  0x00003fffb79030ac in vasprintf@@GLIBC_2.17 () from /lib64/power8/libc.so.6
#3  0x00003fffb78e06e4 in asprintf@@GLIBC_2.17 () from /lib64/power8/libc.so.6
#4  0x00003fffb7ef058c in orte_iof_base_setup_child (opts=0x3fffffffd578, env=0x3fffffffd560)
    at /u/src/ompi/orte/mca/iof/base/iof_base_setup.c:186
#5  0x00003fffb5c42cd0 in do_child (context=0x10288060, child=0x10287f00, environ_copy=0x10042600, jobdat=0x10047b00, 
    write_fd=39, opts=...) at /u/src/orte/mca/odls/default/odls_default_module.c:419
#6  0x00003fffb5c43b64 in odls_default_fork_local_proc (context=0x10288060, child=0x10287f00, environ_copy=0x10042600, 
    jobdat=0x10047b00) at /u/src/orte/mca/odls/default/odls_default_module.c:726
#7  0x00003fffb7ef8fcc in orte_odls_base_default_launch_local (fd=-1, sd=4, cbdata=0x10287cb0)
    at /u/src/orte/mca/odls/base/odls_base_default_fns.c:1038
#8  0x00003fffb7da15ec in event_process_active_single_queue (base=0x1008d5d0, activeq=0x1008db50)
    at /u/src/opal/mca/event/libevent2022/libevent/event.c:1370
#9  0x00003fffb7da197c in event_process_active (base=0x1008d5d0)
    at /u/src/opal/mca/event/libevent2022/libevent/event.c:1440
#10 0x00003fffb7da23cc in opal_libevent2022_event_base_loop (base=0x1008d5d0, flags=1)
    at /u/src/opal/mca/event/libevent2022/libevent/event.c:1644
#11 0x0000000010005eb8 in orterun (argc=7, argv=0x3ffffffff068)
    at /u/src/orte/tools/orterun/orterun.c:1077
#12 0x0000000010003680 in main (argc=7, argv=0x3ffffffff068)
    at /u/src/orte/tools/orterun/main.c:13

At times the stack trace was slightly different (a crash in strncmp instead of malloc). The issue was that the copy of the environment variables stored in the application context (orte_app_context_t) was pointing to invalid memory. Looking at /proc/pid/smaps for the child, the addresses indeed do not correspond to any of its VMAs. In the parent's proc entries, however, the same addresses fall within a VMA that is marked MADV_DONTFORK. The UD oob component calls ibv_fork_init(), which causes memory registered with ibv_reg_mr to be marked MADV_DONTFORK. The specific allocation that was causing the problem, along with a proposed fix, is given below:
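Before the patch itself, here is a minimal standalone sketch of the mechanism described above. It is not Open MPI code: it applies madvise(MADV_DONTFORK) directly to an anonymous mapping (standing in for what ibv_reg_mr does after ibv_fork_init()), forks, and has the child touch the page. The function name child_faults_on_dontfork_page is hypothetical, and the sketch assumes Linux.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper, for illustration only.
 * Returns 1 if a forked child is killed by SIGSEGV when it reads a page
 * that the parent marked MADV_DONTFORK, 0 if it is not, -1 on setup error. */
static int child_faults_on_dontfork_page(void)
{
    size_t page_size = (size_t) sysconf(_SC_PAGESIZE);
    char *page = mmap(NULL, page_size, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (MAP_FAILED == page) {
        return -1;
    }

    /* Stand-in for data that unrelated code happened to place on the page */
    strcpy(page, "PATH=/usr/bin");

    /* The mapping will not be copied into children created by fork() */
    if (0 != madvise(page, page_size, MADV_DONTFORK)) {
        munmap(page, page_size);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        munmap(page, page_size);
        return -1;
    }
    if (0 == pid) {
        /* Child: the address is not backed by any VMA here, so this read faults */
        volatile char *p = page;
        char c = *p;
        (void) c;
        _exit(0);
    }

    int status = 0;
    pid_t waited = waitpid(pid, &status, 0);
    munmap(page, page_size);
    if (waited < 0) {
        return -1;
    }
    return (WIFSIGNALED(status) && SIGSEGV == WTERMSIG(status)) ? 1 : 0;
}
```

The parent can still read the string, while the child has no mapping at that address. That matches the symptom seen in the child's smaps.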

--- a/orte/mca/oob/ud/oob_ud_component.c
+++ b/orte/mca/oob/ud/oob_ud_component.c
@@ -678,15 +678,18 @@ static inline int mca_oob_ud_port_recv_start (mca_oob_ud_port_t *port)
 static inline int mca_oob_ud_alloc_reg_mem (struct ibv_pd *pd, mca_oob_ud_reg_mem_t *reg_mem,
                                             const int buffer_len)
 {
+    size_t buffer_len_aligned, page_size;
     reg_mem->len = buffer_len;
     reg_mem->ptr = NULL;
     reg_mem->mr  = NULL;
-
+    page_size = sysconf(_SC_PAGESIZE);
+    // Buffer size should be a multiple of page size
+    buffer_len_aligned = (buffer_len + page_size - 1) & ~(page_size - 1);
     opal_output_verbose(5, orte_oob_base_framework.framework_output,
                           "%s oob:ud:alloc_reg_mem allocing and registering %d bytes of memory with pd %p",
                           ORTE_NAME_PRINT(ORTE_PROC_MY_NAME), buffer_len, (void *) pd);

-    posix_memalign ((void **)&reg_mem->ptr, sysconf(_SC_PAGESIZE), buffer_len);
+    posix_memalign ((void **)&reg_mem->ptr, page_size, buffer_len_aligned);
     if (NULL == reg_mem->ptr) {
         ORTE_ERROR_LOG(ORTE_ERR_OUT_OF_RESOURCE);
         return ORTE_ERR_OUT_OF_RESOURCE;

The original code allocated a buffer of arbitrary size, but ibv_reg_mr marks entire pages MADV_DONTFORK. The heap allocation routines would then hand out the remaining portion of the buffer's last page to other callers, and apparently that portion was used to store a copy of the environment variables passed to the child process. Let me know if the fix looks OK and I'll push it to master. I need to check whether this has to go into v2.x as well.
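For reference, a minimal standalone sketch of the rounding idea used in the patch, outside of the Open MPI sources. The names round_up_to_page and alloc_whole_pages are hypothetical. Unlike the patch, this sketch checks the return value of posix_memalign rather than testing the pointer for NULL, and it assumes the page size is a power of two.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <unistd.h>

/* Round len up to the next multiple of page_size.
 * The mask trick only works when page_size is a power of two. */
static size_t round_up_to_page(size_t len, size_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return (len + page_size - 1) & ~(page_size - 1);
}

/* Hypothetical helper: allocate a page-aligned buffer that occupies whole
 * pages, so no other heap allocation can share a page with it.
 * On success returns the buffer and stores the padded size in *alloc_len;
 * returns NULL on failure or if len is 0. */
static void *alloc_whole_pages(size_t len, size_t *alloc_len)
{
    long ps = sysconf(_SC_PAGESIZE);
    void *ptr = NULL;

    if (ps <= 0 || 0 == len) {
        return NULL;
    }
    *alloc_len = round_up_to_page(len, (size_t) ps);
    if (0 != posix_memalign(&ptr, (size_t) ps, *alloc_len)) {
        return NULL;
    }
    return ptr;
}
```

A caller would register the returned buffer and release it with free() afterwards. Because the buffer starts on a page boundary and spans a whole number of pages, marking those pages MADV_DONTFORK cannot affect memory owned by anyone else.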
