
Random SIGBUS error with xpmem on openmpi4.1.4 #11463


Open
arunedarath opened this issue Mar 3, 2023 · 20 comments

@arunedarath

arunedarath commented Mar 3, 2023

Hi Folks,

I am running the MPI program below:

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <unistd.h>
#include <fcntl.h>
#include <errno.h>
#include <string.h>

int main(int argc, char** argv)
{
    char *data;
    int size, sender_rank, receiver_rank_start, receiver_rank_end, world_rank, world_size;
    int iterations, i, rank, participating_ranks[1024], participating_ranks_size;
    long page_size;

    MPI_Group world_group, participating_ranks_group;
    MPI_Comm participating_ranks_comm;

    MPI_Init(NULL, NULL);
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);
    MPI_Comm_group(MPI_COMM_WORLD, &world_group);

    size = strtol(argv[1], NULL, 0);
    sender_rank = strtol(argv[2], NULL, 0);
    receiver_rank_start = strtol(argv[3], NULL, 0);
    receiver_rank_end = strtol(argv[4], NULL, 0);
    iterations = strtol(argv[5], NULL, 0);

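    /* The participating group: the sender plus the contiguous receiver range. */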
    participating_ranks[0] = sender_rank;
    for (i = 1, rank = receiver_rank_start; rank <= receiver_rank_end; rank++, i++) {
        participating_ranks[i] = rank;
    }

    participating_ranks_size = (receiver_rank_end - receiver_rank_start) + 2;

    MPI_Group_incl(world_group, participating_ranks_size, participating_ranks, &participating_ranks_group);
    MPI_Comm_create_group(MPI_COMM_WORLD, participating_ranks_group, 0, &participating_ranks_comm);

    page_size = sysconf(_SC_PAGESIZE);
    posix_memalign((void **)&data, page_size, size);

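    /* Only the sender and the ranks in [receiver_rank_start, receiver_rank_end] take part below. */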
    if (world_rank == sender_rank || (world_rank >= receiver_rank_start && world_rank <= receiver_rank_end)) {
        for (i = 0; i < iterations; i++) {
            if (world_rank == sender_rank) {
                memset(data, i, size);
            }

            if (world_rank == sender_rank) {
                for (rank = receiver_rank_start; rank <= receiver_rank_end; rank++) {
                    MPI_Send(data, size, MPI_CHAR, rank, 0x1234, MPI_COMM_WORLD);
                }
            } else if (world_rank >= receiver_rank_start && world_rank <= receiver_rank_end) {
                MPI_Recv(data, size, MPI_CHAR, sender_rank, 0x1234, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            }

            MPI_Barrier(participating_ranks_comm);
        }
    }

    free(data);
    MPI_Group_free(&world_group);
    MPI_Group_free(&participating_ranks_group);
    if (world_rank == sender_rank || (world_rank >= receiver_rank_start && world_rank <= receiver_rank_end)) {
        MPI_Comm_free(&participating_ranks_comm);
    }
    MPI_Finalize();
}

It fails randomly.

[arunchan@Milan039 ob1_xpmem]$ mpirun -np 128 --map-by core --bind-to core ./send_recv_group 8 16 48 55  200000
[Milan039:4034347] PMIX ERROR: NO-PERMISSIONS in file dstore_base.c at line 237
[arunchan@Milan039 ob1_xpmem]$ mpirun -np 128 --map-by core --bind-to core ./send_recv_group 8 16 48 55  200000
[Milan039:4034741] PMIX ERROR: NO-PERMISSIONS in file dstore_base.c at line 237
[arunchan@Milan039 ob1_xpmem]$ mpirun -np 128 --map-by core --bind-to core ./send_recv_group 8 16 48 55  200000
[Milan039:4035138] PMIX ERROR: NO-PERMISSIONS in file dstore_base.c at line 237
[arunchan@Milan039 ob1_xpmem]$ mpirun -np 128 --map-by core --bind-to core ./send_recv_group 8 16 48 55  200000
[Milan039:4035531] PMIX ERROR: NO-PERMISSIONS in file dstore_base.c at line 237
[arunchan@Milan039 ob1_xpmem]$ mpirun -np 128 --map-by core --bind-to core ./send_recv_group 8 16 48 55  200000
[Milan039:4035923] PMIX ERROR: NO-PERMISSIONS in file dstore_base.c at line 237
[arunchan@Milan039 ob1_xpmem]$ mpirun -np 128 --map-by core --bind-to core ./send_recv_group 8 16 48 55  200000
[Milan039:4036315] PMIX ERROR: NO-PERMISSIONS in file dstore_base.c at line 237
[arunchan@Milan039 ob1_xpmem]$ mpirun -np 128 --map-by core --bind-to core ./send_recv_group 8 16 48 55  200000
[Milan039:4036708] PMIX ERROR: NO-PERMISSIONS in file dstore_base.c at line 237
[arunchan@Milan039 ob1_xpmem]$ mpirun -np 128 --map-by core --bind-to core ./send_recv_group 8 16 48 55  200000
[Milan039:4037115] *** Process received signal ***
[Milan039:4037115] Signal: Bus error (7)
[Milan039:4037115] Signal code: Non-existant physical address (2)
[Milan039:4037115] Failing at address: 0x15550a3267c4
[Milan039:4037115] [ 0] /lib64/libpthread.so.0(+0x12ce0)[0x155554dcece0]
[Milan039:4037115] [ 1] /home/arunchan/openmpi_work/install/ompi_4_1_4_xpmem/lib/openmpi/mca_btl_vader.so(+0x5a58)[0x155548b5ba58]
[Milan039:4037115] [ 2] /home/arunchan/openmpi_work/install/ompi_4_1_4_xpmem/lib/libopen-pal.so.40(opal_progress+0x33)[0x1555544a62c3]
[Milan039:4037115] [ 3] /home/arunchan/openmpi_work/install/ompi_4_1_4_xpmem/lib/libmpi.so.40(ompi_mpi_finalize+0x1a5)[0x1555550307c5]
[Milan039:4037115] [ 4] ./send_recv_group[0x400ecd]
[Milan039:4037115] [ 5] /lib64/libc.so.6(__libc_start_main+0xf3)[0x155554a31cf3]
[Milan039:4037115] [ 6] ./send_recv_group[0x400b1e]
[Milan039:4037115] *** End of error message ***
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 16 with PID 0 on node Milan039 exited on signal 7 (Bus error).
--------------------------------------------------------------------------
[Milan039:4037100] PMIX ERROR: NO-PERMISSIONS in file dstore_base.c at line 237
[Milan039:4037100] PMIX ERROR: NO-PERMISSIONS in file dstore_base.c at line 246
[arunchan@Milan039 ob1_xpmem]$ 

The same program runs perfectly fine if I compile Open MPI without xpmem.

How can I solve this problem? [I want the xpmem support to test the performance of ob1]
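For isolating the trigger, the single-copy mechanism can also be switched at runtime instead of at configure time. A sketch, assuming the 4.1.x btl_vader_single_copy_mechanism MCA parameter and that CMA is available on the node (untested here):

# run with CMA instead of xpmem, no rebuild needed
mpirun -np 128 --map-by core --bind-to core --mca btl_vader_single_copy_mechanism cma ./send_recv_group 8 16 48 55 200000
# force xpmem explicitly to confirm it is the trigger
mpirun -np 128 --map-by core --bind-to core --mca btl_vader_single_copy_mechanism xpmem ./send_recv_group 8 16 48 55 200000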

The ompi_info output and the topology are attached.

--Arun
topology_ompi_info.txt

@arunedarath
Author

Hi,

Please let me know if I missed any info; I can collect it.

--Arun

@jsquyres
Member

jsquyres commented Mar 6, 2023

Can you please supply all the information that was asked for in the GitHub issue template for bug reporting? Thanks.

@jsquyres jsquyres added this to the v4.1.6 milestone Mar 6, 2023
@jsquyres
Member

jsquyres commented Mar 6, 2023

Also, I kinda doubt that it will change anything, but we did just release Open MPI v4.1.5. Can you test with that version just to be complete?

@devreal
Contributor

devreal commented Mar 6, 2023

@arunedarath I built Open MPI 4.1.4

../configure '--without-ucx' '--without-hcoll' '--without-verbs' '--enable-mca-no-build=btl-uct' '--enable-mpi1-compatibility' --disable-man-pages --with-xpmem=$HOME/opt-hawk/xpmem --disable-debug

and ran 100 iterations like this:

mpirun -n 128 --mca pml ob1 --mca btl ^openib ./test_send_recv_group 8 16 48 55  200000

I didn't see the error you are seeing.

From your ompi_info output it looks like your XPMEM version is 2.3.0 (/home/software/xpmem/2.3.0). It looks like version 2.6.3 was released in 2015. Could you try deploying the latest release of xpmem? (https://github.com/hpc/xpmem/tags)
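If it helps, a rough sketch of the usual out-of-tree build and install (the prefix and module-load steps are assumptions, not verified here; loading the module requires root):

git clone https://github.com/hpc/xpmem.git && cd xpmem
./autogen.sh && ./configure --prefix=$HOME/opt/xpmem
make -j                       # builds libxpmem and kernel/xpmem.ko
sudo insmod kernel/xpmem.ko   # load the freshly built module
sudo chmod 666 /dev/xpmem     # normally handled by the bundled udev rule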

@devreal
Contributor

devreal commented Mar 6, 2023

Correction: if you can, try the latest commit in the main xpmem repo (which is what I am using).

@arunedarath
Author

Thanks for the valuable comments,

I will try with the latest xpmem. As this requires "root" permission (installing xpmem.ko), I must ask my system admin.

Please give me 1-2 days; I will be back with all the required info.

--Arun

@arunedarath
Author

I checked my kernel logs; they say 2.6.5. The admin might have mistakenly named the directory "/home/software/xpmem/2.3.0".

[arunchan@Milan004 ~]$ dmesg | grep -i xpmem
[ 2.478120] XPMEM kernel module v2.6.5 loaded

@gkatev
Contributor

gkatev commented Mar 7, 2023

The crash seems to be in MPI_Finalize. This reminds me of #9868, which was fixed in main and 5.0.x. I'm not sure if v4 was/is also susceptible to the same problem.

@arunedarath
Author

arunedarath commented Mar 7, 2023

I tried with Open MPI 4.1.5 and get the same error. Below is the information per the "bug report template".

Background information

I want to run Open MPI (version 4.1.4) with xpmem using pml/ob1 and btl/vader.

What version of Open MPI are you using? (e.g., v3.0.5, v4.0.2, git branch name and hash, etc.)

4.1.5

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

$ wget https://download.open-mpi.org/release/open-mpi/v4.1/openmpi-4.1.5.tar.gz
$ tar -xf openmpi-4.1.5.tar.gz
$ cd openmpi-4.1.5
$ ./configure --prefix=/home/arunchan/openmpi_work/install/ompi_4_1_5_xpmem --enable-mpi1-compatibility --enable-mca-no-build=btl-uct --with-hwloc=/home/software/hwloc/2.3.0 --with-pmix --with-slurm --with-pmi=/cm/shared/apps/slurm/20.02.6 CC=gcc CXX=g++ FC=gfortran --without-verbs --without-ucx --without-hcoll --with-xpmem=/home/software/xpmem/2.3.0
$ make -j install

If you are building/installing from a git clone, please copy-n-paste the output from git submodule status.

NA

Please describe the system on which you are running

  • Operating system/version: CentOS Linux release 8.3.2011
  • Computer hardware:
    Architecture: x86_64
    CPU op-mode(s): 32-bit, 64-bit
    Byte Order: Little Endian
    CPU(s): 128
    On-line CPU(s) list: 0-127
    Thread(s) per core: 1
    Core(s) per socket: 64
    Socket(s): 2
    NUMA node(s): 8
    Vendor ID: AuthenticAMD
    CPU family: 25
    Model: 1
    Model name: AMD EPYC 7763 64-Core Processor
    Stepping: 1
    CPU MHz: 2450.000
    CPU max MHz: 3529.0520
    CPU min MHz: 1500.0000
    BogoMIPS: 4900.01
    Virtualization: AMD-V
    L1d cache: 32K
    L1i cache: 32K
    L2 cache: 512K
    L3 cache: 32768K
    NUMA node0 CPU(s): 0-15
    NUMA node1 CPU(s): 16-31
    NUMA node2 CPU(s): 32-47
    NUMA node3 CPU(s): 48-63
    NUMA node4 CPU(s): 64-79
    NUMA node5 CPU(s): 80-95
    NUMA node6 CPU(s): 96-111
    NUMA node7 CPU(s): 112-127
    Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 invpcid_single hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 invpcid cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr wbnoinvd amd_ppin arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold v_vmsave_vmload vgif v_spec_ctrl umip pku ospke vaes vpclmulqdq rdpid overflow_recov succor smca sme sev sev_es
  • Network type:
    NA

Details of the problem

I get the same error with 4.1.5 as well.

$ echo $PATH
/home/arunchan/openmpi_work/install/ompi_4_1_5_xpmem/bin:/home/software/gcc/12.1.0/bin:/home/arunchan/.local/bin:/home/arunchan/bin:/cm/shared/apps/slurm/current/sbin:/cm/shared/apps/slurm/current/bin:/cm/local/apps/environment-modules/4.5.3//bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/usr/sbin:/cm/local/apps/environment-modules/4.5.3/bin
[test_log.txt](https://github.com/open-mpi/ompi/files/10908477/test_log.txt)
                                                
[arunchan@Milan001 ob1_xpmem]$ mpirun -np 128 --map-by core --bind-to core ./send_recv_group 8 16 48 55  200000
[Milan001:1852206] PMIX ERROR: NO-PERMISSIONS in file dstore_base.c at line 237               
[arunchan@Milan001 ob1_xpmem]$ mpirun -np 128 --map-by core --bind-to core ./send_recv_group 8 16 48 55  200000
[Milan001:1852601] PMIX ERROR: NO-PERMISSIONS in file dstore_base.c at line 237               
[arunchan@Milan001 ob1_xpmem]$ mpirun -np 128 --map-by core --bind-to core ./send_recv_group 8 16 48 55  200000
[Milan001:1853004] PMIX ERROR: NO-PERMISSIONS in file dstore_base.c at line 237               
[arunchan@Milan001 ob1_xpmem]$ mpirun -np 128 --map-by core --bind-to core ./send_recv_group 8 16 48 55  200000
[Milan001:1853423] *** Process received signal ***                                            
[Milan001:1853423] Signal: Bus error (7)                                                      
[Milan001:1853423] Signal code: Non-existant physical address (2)                             
[Milan001:1853423] Failing at address: 0x15550a3277c4                                         
[Milan001:1853423] [ 0] /lib64/libpthread.so.0(+0x12ce0)[0x155554dcece0]                      
[Milan001:1853423] [ 1] /home/arunchan/openmpi_work/install/ompi_4_1_5_xpmem/lib/openmpi/mca_btl_vader.so(+0x5a58)[0x155548b5ba58]
[Milan001:1853423] [ 2] /home/arunchan/openmpi_work/install/ompi_4_1_5_xpmem/lib/libopen-pal.so.40(opal_progress+0x33)[0x1555544a52c3]
[Milan001:1853423] [ 3] /home/arunchan/openmpi_work/install/ompi_4_1_5_xpmem/lib/libmpi.so.40(ompi_mpi_finalize+0x1a5)[0x1555550307c5]
[Milan001:1853423] [ 4] ./send_recv_group[0x400ecd]                                     
[Milan001:1853423] [ 5] /lib64/libc.so.6(__libc_start_main+0xf3)[0x155554a31cf3]              
[Milan001:1853423] [ 6] ./send_recv_group[0x400b1e]                                           
[Milan001:1853423] *** End of error message ***                                               
--------------------------------------------------------------------------                    
Primary job  terminated normally, but 1 process returned                                      
a non-zero exit code. Per user-direction, the job has been aborted.                           
--------------------------------------------------------------------------                    
--------------------------------------------------------------------------                    
mpirun noticed that process rank 16 with PID 0 on node Milan001 exited on signal 7 (Bus error).
--------------------------------------------------------------------------                    
[Milan001:1853399] PMIX ERROR: NO-PERMISSIONS in file dstore_base.c at line 237               
[Milan001:1853399] PMIX ERROR: NO-PERMISSIONS in file dstore_base.c at line 246

--Arun

@arunedarath
Author

arunedarath commented Mar 9, 2023

Hi,

I tried openmpi-5.0.0rc10 and did not see the issue (ran 2000 iterations without a SIGBUS error).

for i in $(seq 2000); do mpirun -np 128 --map-by core --bind-to core ./send_recv_group 8 16 48 55 200000; done

Version:
commit dffd17f73d1609841533e7a6e4dd6d28fddbe78c (HEAD -> my_v5.0.0rc10, tag: v5.0.0rc10)
Merge: 248549a008 32aab11922
Author: Austen Lauria <[email protected]>
Date:   Thu Feb 2 16:22:26 2023 -0500

    Merge pull request #11377 from awlauria/fix_tarball_builderv50x
    
    v5.0.x:  contrib/dist/make_dist_tarball: Change permissions back to 755.

There is a slight difference in the way I configured ompi5. I used the command below:

./configure --prefix=/home/arunchan/openmpi_work/install/ompi_5_rc10_xpmem --enable-mpi1-compatibility --enable-mca-no-build=btl-uct  --with-pmix --with-slurm  CC=gcc CXX=g++ FC=gfortran --without-verbs --without-ucx --without-hcoll --with-xpmem=/home/software/xpmem/2.3.0

For ompi 4.1.x I used the configure line below:

./configure --prefix=/home/arunchan/openmpi_work/install/ompi_4_1_4_xpmem --enable-mpi1-compatibility --enable-mca-no-build=btl-uct --with-hwloc=/home/software/hwloc/2.3.0 --with-pmix --with-slurm --with-pmi=/cm/shared/apps/slurm/20.02.6 CC=gcc CXX=g++ FC=gfortran --without-verbs --without-ucx --without-hcoll --with-xpmem=/home/software/xpmem/2.3.0

That means my xpmem is perfectly fine and the problem is with the openmpi 4.1.x version, doesn't it?

--Arun

@arunedarath
Author

arunedarath commented Mar 10, 2023

I did a bisect between v5.0.0rc1 (good) and v4.1.5 (bad), and got the result below.

$ git bisect log
git bisect start
# good: [032f745c4dae77f32f5a1904d774e8a4e9fa4176] Merge pull request #9431 from hkuno/pr/hkuno_license_v5.0.x
git bisect good 032f745c4dae77f32f5a1904d774e8a4e9fa4176
# bad: [42b829b3b3190dd1987d113fd8c2810eb8584007] Merge pull request #11426 from bwbarrett/v4.1.x-release
git bisect bad 42b829b3b3190dd1987d113fd8c2810eb8584007
# good: [b6b9552ca999d68637ce522748fa0ab8380e6954] Merge pull request #5444 from gbossu/fix-file-delete
git bisect good b6b9552ca999d68637ce522748fa0ab8380e6954
# skip: [1fb0d23caa76b2f391d02431596c05304eda6b0d] Merge pull request #7696 from gpaulsen/topic/v4.0.x/v4.0.4_VERSION
git bisect skip 1fb0d23caa76b2f391d02431596c05304eda6b0d
# good: [dceea5ad87d9a1d131ab295b68d9d400d30e5a7e] coll/tuned: Revert RSB and RS default algorithms
git bisect good dceea5ad87d9a1d131ab295b68d9d400d30e5a7e
# bad: [0aaf11418e4e6d6d0b4c07013f35f79759fa46e3] osc/pt2pt: Accumulation lock, not module lock when appending
git bisect bad 0aaf11418e4e6d6d0b4c07013f35f79759fa46e3
# bad: [2974b60412aa31e238b7f577091cc0e38fbb266c] Merge pull request #8423 from rhc54/cmr41/slm
git bisect bad 2974b60412aa31e238b7f577091cc0e38fbb266c
# bad: [9f228c9dab7f92e1cdc20baafbf30d0040c1b1f8] coll/base: Fix collective module selection preference treatment
git bisect bad 9f228c9dab7f92e1cdc20baafbf30d0040c1b1f8
# good: [087a67245d7c639f9cab939914536eac4659a60d] Merge pull request #7945 from bosilca/4.1/han
git bisect good 087a67245d7c639f9cab939914536eac4659a60d
# bad: [bed064f198222570a02ccb33fd5ee60c85264ad7] Merge pull request #8181 from ggouaillardet/topic/v4.1.x/avx512_pgi
git bisect bad bed064f198222570a02ccb33fd5ee60c85264ad7
# good: [8c1d8305b429edb82083d9203e1731966ed43115] Merge pull request #8148 from jsquyres/pr/v4.1.x/reproducible-build
git bisect good 8c1d8305b429edb82083d9203e1731966ed43115
# bad: [6bb3ef4d1cb72942dee963656b4c269615ef577b] Merge pull request #8172 from jsquyres/pr/v4.1.0/NEWS-updates
git bisect bad 6bb3ef4d1cb72942dee963656b4c269615ef577b
# bad: [40e104d087099c5d9e8f49d1ed7b9e26ca4e6341] Merge pull request #8123 from jjhursey/v4.1-pmix-v3.2
git bisect bad 40e104d087099c5d9e8f49d1ed7b9e26ca4e6341
# bad: [60ee1332557ef2da254b17834839b9cd550efc62] Disable man pages for internal OpenPMIx
git bisect bad 60ee1332557ef2da254b17834839b9cd550efc62
# bad: [be86f87b9279b16850df83c47e72475dbe880a58] Update Internal PMIx to OpenPMIx v3.2.1rc1
git bisect bad be86f87b9279b16850df83c47e72475dbe880a58
# first bad commit: [be86f87b9279b16850df83c47e72475dbe880a58] Update Internal PMIx to OpenPMIx v3.2.1rc1

It says this is the commit causing the issue. I don't know how to decode this info. Can someone help me understand it?
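For a first look at what the flagged commit touches, plain git is enough:

git show --stat be86f87b9279b16850df83c47e72475dbe880a58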

--Arun

@gkatev
Contributor

gkatev commented Mar 13, 2023

Hi @arunedarath, since you are doing tests, could you try testing 5.0.0rc10, but with commit 7d3f868 reverted? You can apply the patch below, which reverts this commit and resolves the resulting conflict. It's untested though -- please let me know if it doesn't work.

Patch
diff --git a/opal/mca/btl/sm/btl_sm_component.c b/opal/mca/btl/sm/btl_sm_component.c
index 9d73e1e39f..f6fccf2a61 100644
--- a/opal/mca/btl/sm/btl_sm_component.c
+++ b/opal/mca/btl/sm/btl_sm_component.c
@@ -253,6 +253,11 @@ static int mca_btl_sm_component_close(void)
     OBJ_DESTRUCT(&mca_btl_sm_component.pending_endpoints);
     OBJ_DESTRUCT(&mca_btl_sm_component.pending_fragments);
 
+    if (mca_smsc_base_has_feature(MCA_SMSC_FEATURE_CAN_MAP)
+        && NULL != mca_btl_sm_component.my_segment) {
+        munmap(mca_btl_sm_component.my_segment, mca_btl_sm_component.segment_size);
+    }
+
     mca_btl_sm_component.my_segment = NULL;
 
     if (mca_btl_sm_component.mpool) {
@@ -270,9 +275,14 @@ static int mca_btl_base_sm_modex_send(void)
 
     modex_size = sizeof(modex) - sizeof(modex.seg_ds);
 
+    if (!mca_smsc_base_has_feature(MCA_SMSC_FEATURE_CAN_MAP)) {
         modex.seg_ds_size = opal_shmem_sizeof_shmem_ds(&mca_btl_sm_component.seg_ds);
         memmove(&modex.seg_ds, &mca_btl_sm_component.seg_ds, modex.seg_ds_size);
         modex_size += modex.seg_ds_size;
+    } else {
+        modex.segment_base = (uintptr_t) mca_btl_sm_component.my_segment;
+        modex.seg_ds_size = 0;
+    }
 
     int rc;
     OPAL_MODEX_SEND(rc, PMIX_LOCAL, &mca_btl_sm_component.super.btl_version, &modex, modex_size);
@@ -365,31 +375,43 @@ mca_btl_sm_component_init(int *num_btls, bool enable_progress_threads, bool enab
         mca_btl_sm.super.btl_put = NULL;
     }
 
-    char *sm_file;
-
-    // Note: Use the node_rank not the local_rank for the backing file.
-    // This makes the file unique even when recovering from failures.
-    rc = opal_asprintf(&sm_file, "%s" OPAL_PATH_SEP "sm_segment.%s.%u.%x.%d",
-                       mca_btl_sm_component.backing_directory, opal_process_info.nodename,
-                       geteuid(), OPAL_PROC_MY_NAME.jobid, opal_process_info.my_node_rank);
-    if (0 > rc) {
-        free(btls);
-        return NULL;
-    }
-    opal_pmix_register_cleanup(sm_file, false, false, false);
-
-    rc = opal_shmem_segment_create(&component->seg_ds, sm_file, component->segment_size);
-    free(sm_file);
-    if (OPAL_SUCCESS != rc) {
-        BTL_VERBOSE(("Could not create shared memory segment"));
-        free(btls);
-        return NULL;
-    }
+    if (!mca_smsc_base_has_feature(MCA_SMSC_FEATURE_CAN_MAP)) {
+        char *sm_file;
+
+        // Note: Use the node_rank not the local_rank for the backing file.
+        // This makes the file unique even when recovering from failures.
+        rc = opal_asprintf(&sm_file, "%s" OPAL_PATH_SEP "sm_segment.%s.%u.%x.%d",
+                           mca_btl_sm_component.backing_directory, opal_process_info.nodename,
+                           geteuid(), OPAL_PROC_MY_NAME.jobid, opal_process_info.my_node_rank);
+        if (0 > rc) {
+            free(btls);
+            return NULL;
+        }
+        opal_pmix_register_cleanup(sm_file, false, false, false);
+
+        rc = opal_shmem_segment_create(&component->seg_ds, sm_file, component->segment_size);
+        free(sm_file);
+        if (OPAL_SUCCESS != rc) {
+            BTL_VERBOSE(("Could not create shared memory segment"));
+            free(btls);
+            return NULL;
+        }
 
-    component->my_segment = opal_shmem_segment_attach(&component->seg_ds);
-    if (NULL == component->my_segment) {
-        BTL_VERBOSE(("Could not attach to just created shared memory segment"));
-        goto failed;
+        component->my_segment = opal_shmem_segment_attach(&component->seg_ds);
+        if (NULL == component->my_segment) {
+            BTL_VERBOSE(("Could not attach to just created shared memory segment"));
+            goto failed;
+        }
+    } else {
+        /* if the shared-memory single-copy component can map memory (XPMEM) an anonymous segment
+         * can be used instead */
+        component->my_segment = mmap(NULL, component->segment_size, PROT_READ | PROT_WRITE,
+                                     MAP_ANONYMOUS | MAP_SHARED, -1, 0);
+        if ((void *) -1 == component->my_segment) {
+            BTL_VERBOSE(("Could not create anonymous memory segment"));
+            free(btls);
+            return NULL;
+        }
     }
 
     /* initialize my fifo */
@@ -411,7 +433,11 @@ mca_btl_sm_component_init(int *num_btls, bool enable_progress_threads, bool enab
 
     return btls;
 failed:
-    opal_shmem_unlink(&component->seg_ds);
+    if (mca_smsc_base_has_feature(MCA_SMSC_FEATURE_CAN_MAP)) {
+        munmap(component->my_segment, component->segment_size);
+    } else {
+        opal_shmem_unlink(&component->seg_ds);
+    }
 
     if (btls) {
         free(btls);
diff --git a/opal/mca/btl/sm/btl_sm_module.c b/opal/mca/btl/sm/btl_sm_module.c
index 7835742e4f..2ac02884f7 100644
--- a/opal/mca/btl/sm/btl_sm_module.c
+++ b/opal/mca/btl/sm/btl_sm_module.c
@@ -184,6 +184,12 @@ static int init_sm_endpoint(struct mca_btl_base_endpoint_t **ep_out, struct opal
             mca_btl_sm.super.btl_put = NULL;
             mca_btl_sm.super.btl_flags &= ~MCA_BTL_FLAGS_RDMA;
         }
+        if (mca_smsc_base_has_feature(MCA_SMSC_FEATURE_CAN_MAP)) {
+            ep->smsc_map_context = MCA_SMSC_CALL(map_peer_region, ep->smsc_endpoint, /*flag=*/0,
+                                                 (void *) (uintptr_t) modex->segment_base,
+                                                 mca_btl_sm_component.segment_size,
+                                                 (void **) &ep->segment_base);
+        } else {
             /* store a copy of the segment information for detach */
             ep->seg_ds = malloc(modex->seg_ds_size);
             if (NULL == ep->seg_ds) {
@@ -196,6 +202,7 @@ static int init_sm_endpoint(struct mca_btl_base_endpoint_t **ep_out, struct opal
             if (NULL == ep->segment_base) {
                 return OPAL_ERROR;
             }
+        }
 
         OBJ_CONSTRUCT(&ep->lock, opal_mutex_t);
 
@@ -345,8 +352,10 @@ static int sm_finalize(struct mca_btl_base_module_t *btl)
     free(component->fbox_in_endpoints);
     component->fbox_in_endpoints = NULL;
 
-    opal_shmem_unlink(&mca_btl_sm_component.seg_ds);
-    opal_shmem_segment_detach(&mca_btl_sm_component.seg_ds);
+    if (!mca_smsc_base_has_feature(MCA_SMSC_FEATURE_CAN_MAP)) {
+        opal_shmem_unlink(&mca_btl_sm_component.seg_ds);
+        opal_shmem_segment_detach(&mca_btl_sm_component.seg_ds);
+    }
 
     return OPAL_SUCCESS;
 }
@@ -511,18 +520,22 @@ static void mca_btl_sm_endpoint_destructor(mca_btl_sm_endpoint_t *ep)
     OBJ_DESTRUCT(&ep->pending_frags);
     OBJ_DESTRUCT(&ep->pending_frags_lock);
 
-    if (ep->seg_ds) {
-        opal_shmem_ds_t seg_ds;
-
-        /* opal_shmem_segment_detach expects a opal_shmem_ds_t and will
-         * stomp past the end of the seg_ds if it is too small (which
-         * ep->seg_ds probably is) */
-        memcpy(&seg_ds, ep->seg_ds, opal_shmem_sizeof_shmem_ds(ep->seg_ds));
-        free(ep->seg_ds);
-        ep->seg_ds = NULL;
-
-        /* disconnect from the peer's segment */
-        opal_shmem_segment_detach(&seg_ds);
+    if (!mca_smsc_base_has_feature(MCA_SMSC_FEATURE_CAN_MAP)) {
+        if (ep->seg_ds) {
+            opal_shmem_ds_t seg_ds;
+
+            /* opal_shmem_segment_detach expects a opal_shmem_ds_t and will
+             * stomp past the end of the seg_ds if it is too small (which
+             * ep->seg_ds probably is) */
+            memcpy(&seg_ds, ep->seg_ds, opal_shmem_sizeof_shmem_ds(ep->seg_ds));
+            free(ep->seg_ds);
+            ep->seg_ds = NULL;
+
+            /* disconnect from the peer's segment */
+            opal_shmem_segment_detach(&seg_ds);
+        }
+    } else if (NULL != ep->smsc_map_context) {
+        MCA_SMSC_CALL(unmap_peer_region, ep->smsc_map_context);
     }
 
     if (ep->fbox_out.fbox) {

@arunedarath
Author

Hi George,

I checked out v5.0.0rc10, applied the patch, ran the application 2000 times, and did not see the issue.

The suggested commit (7d3f868) came in on Feb 3, 2022, but for me v5.0.0rc1 [Sep 2021] works perfectly fine. That means these two issues are unrelated, right? The bisect log above shows this.

I have a fundamental question about compiling openmpi5 from the git repo. I followed the steps below to compile v5.0.0rc10.

a) git checkout v5.0.0rc10
b) ./autogen.pl
c) ./configure ...
d) make -j install

Is checking out the tag v5.0.0rc10 in the main repo enough to select the corresponding commits from the submodules as well?

[arunchan@Milan014 ompi]$ git branch
* (HEAD detached at v5.0.0rc10)
  main
[arunchan@Milan014 ompi]$ git submodule status
 852d284788be7fd9d543af90ae98693b2b1c4393 3rd-party/openpmix (v4.2.3rc2-6-g852d2847)
 facdb3555c4b0ac124ef3a40feadb78946932bb6 3rd-party/prrte (v3.0.1rc1-25-gfacdb3555c)
 237ceff1a8ed996d855d69f372be9aaea44919ea config/oac (237ceff)
[arunchan@Milan014 ompi]$ git log --oneline 3rd-party/openpmix | head -1
e26da2c v5.0: Update PMIx/PRRTE pointers to latest rc
[arunchan@Milan014 ompi]$ git log --oneline 3rd-party/prrte | head -1
e26da2c v5.0: Update PMIx/PRRTE pointers to latest rc
[arunchan@Milan014 ompi]$ git log --oneline config/oac | head -1
987100d Port opal_setup_sphinx.m4 to oac_setup_sphinx.m4

Are these steps right, or do I need to manually set the HEAD commits in the submodules as well?
--Arun

@gkatev
Contributor

gkatev commented Mar 13, 2023

I see, thanks. So it initially doesn't seem to be the same issue as the one I linked earlier.

The first bad commit does seem to be unrelated to btl/vader. But it's possible that one contributing factor is the exit/finalize pattern of the processes (which was also a factor in 7d3f868, if I understood the description correctly). For this reason we can't assume for sure that this is not the same issue as the one linked, since rc10 might not contain the extra contributing factor (the finalize pattern).

But let's take it from the top. Can you do a debug build with --enable-debug (with 4.1.5 or 4.1.4) and post the backtrace from gdb? Use --mca opal_signal "" to have a core dump created.
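Concretely, something like the sketch below (configure arguments abbreviated; core-file naming varies by system):

./configure --enable-debug --with-xpmem=/home/software/xpmem/2.3.0 ...   # plus your other flags
make -j install
ulimit -c unlimited                                                      # allow core dumps
mpirun -np 128 --mca opal_signal "" ./send_recv_group 8 16 48 55 200000
gdb ./send_recv_group core.<pid>                                         # then: bt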

AFAIK you also need to do git submodule update --recursive, but these hashes seem correct for 5.0.0rc10.
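That is, the full sequence for a tag would look something like:

git checkout v5.0.0rc10
git submodule update --init --recursive   # pin the submodules to the commits the tag records
./autogen.pl
./configure ...
make -j install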

@arunedarath
Author

Running it with Open MPI 4.1.5 configured with --enable-debug does not seem to reproduce the issue. I am leaving it on a longer run; let's see.

--Arun

@arunedarath
Author

No luck. It ran for 4 hours without any issues with Open MPI 4.1.5 configured with --enable-debug.

@arunedarath
Author

Hi,

ompi 4.1.5 configured with "--enable-debug" is not showing this failure. How do you think we should proceed further?

Are there other compile options or debug flags that could help find this issue's root cause? For example, would a build that keeps optimization but adds symbols (see the sketch below) still reproduce it?
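A possible middle ground, on the assumption that --enable-debug also lowers the optimization level and that this is what hides the race:

./configure CFLAGS="-g -O2" CXXFLAGS="-g -O2" ...   # keep -O2, add debug symbols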

The decoded core from the build without "--enable-debug" is below.

[arunchan@Milan015 ob1_xpmem]$ gdb send_recv_group core.send_recv_group.1350.0363f02815784d5bbb22a0a51db0e3f3.575823.1679237339000000
GNU gdb (GDB) Red Hat Enterprise Linux 8.2-16.el8
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
http://www.gnu.org/software/gdb/bugs/.
Find the GDB manual and other documentation resources online at:
http://www.gnu.org/software/gdb/documentation/.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from send_recv_group...(no debugging symbols found)...done.

warning: Can't open file (null) during file-backed mapping note processing

warning: Can't open file (null) during file-backed mapping note processing

warning: Can't open file (null) during file-backed mapping note processing

warning: Can't open file (null) during file-backed mapping note processing

warning: Can't open file (null) during file-backed mapping note processing

warning: Can't open file (null) during file-backed mapping note processing

warning: Can't open file (null) during file-backed mapping note processing
[New LWP 575823]
[New LWP 575874]
[New LWP 575915]

warning: Section `.reg-xstate/575823' in core file too small.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `./send_recv_group 8 16 48 55 200000'.
Program terminated with signal SIGBUS, Bus error.

warning: Section `.reg-xstate/575823' in core file too small.
#0 0x0000155549170a28 in mca_btl_vader_component_progress ()
from /home/arunchan/openmpi_work/install/ompi_xpmem_sigbus/lib/openmpi/mca_btl_vader.so
[Current thread is 1 (Thread 0x155555536740 (LWP 575823))]
Missing separate debuginfos, use: yum debuginfo-install glibc-2.28-164.el8_5.3.x86_64 libnl3-3.5.0-1.el8.x86_64 libpciaccess-0.14-1.el8.x86_64 zlib-1.2.11-16.2.el8_3.x86_64
(gdb) bt
#0 0x0000155549170a28 in mca_btl_vader_component_progress ()
from /home/arunchan/openmpi_work/install/ompi_xpmem_sigbus/lib/openmpi/mca_btl_vader.so
#1 0x0000155554467ac3 in opal_progress () from /home/arunchan/openmpi_work/install/ompi_xpmem_sigbus/lib/libopen-pal.so.40
#2 0x0000155555031815 in ompi_mpi_finalize () from /home/arunchan/openmpi_work/install/ompi_xpmem_sigbus//lib/libmpi.so.40
#3 0x0000000000400ebd in main ()

--Arun

@arunedarath
Author

arunedarath commented Mar 22, 2023

Correction: if you can, try the latest commit in the main xpmem repo (which is what I am using).

@devreal Finally, the xpmem in the lab has been updated to the latest from https://github.com/hpc/xpmem. But there is no difference; the same SIGBUS error (crash) is seen with that as well.

--Arun

@arunedarath
Author

Hi All,

Ping.

Do you have any ideas to debug this further? (Really annoying to see the SIGBUS error coming randomly while using xpmem :( )

--Arun

@gkatev
Contributor

gkatev commented Mar 31, 2023

FYI @hppritcha, might this be the same underlying issue as in #9868?

@bwbarrett bwbarrett modified the milestones: v4.1.6, v4.1.7 Sep 30, 2023
@jsquyres jsquyres modified the milestones: v4.1.7, v4.1.8 Jan 23, 2025
@bwbarrett bwbarrett modified the milestones: v4.1.8, v4.1.9 Feb 5, 2025
@bwbarrett bwbarrett modified the milestones: v4.1.9, v4.1.10 Feb 24, 2025