v2.1.5: OpenMPI error during PETSc test #5932

Closed
sagitter opened this issue Oct 17, 2018 · 5 comments
sagitter commented Oct 17, 2018

Background information

System: Fedora 30 (devel branch)
Architectures: the failure always occurs on i686 and armv7hl

What version of Open MPI are you using? (e.g., v1.10.3, v2.1.0, git branch name and hash, etc.)

v2.1.5

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

Installed by system distribution package: https://src.fedoraproject.org/rpms/openmpi
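
Since the failure is tied to this specific packaged build, one quick sanity check (a minimal sketch, not something from the original report) is to print the library version string a test binary actually links against and confirm it matches the Fedora openmpi 2.1.5 package:

/* Hypothetical sanity check, not part of the original report: prints the
 * Open MPI library string this binary is linked against, which should
 * match the packaged v2.1.5 build under test. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    char version[MPI_MAX_LIBRARY_VERSION_STRING];
    int len = 0;

    MPI_Init(&argc, &argv);
    MPI_Get_library_version(version, &len);   /* standard MPI-3 call */
    printf("%s\n", version);
    MPI_Finalize();
    return 0;
}

Built with the packaged mpicc wrapper, this should report the 2.1.5 libraries installed under /usr/lib/openmpi.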

Details of the problem

The PETSc-3.10.2 test suite is failing on the 32-bit architectures (i686, armv7hl). The error looks attributable to Open MPI:

+ export LD_LIBRARY_PATH=/builddir/build/BUILD/petsc-3.10.2/buildopenmpi_dir/i386/lib
+ LD_LIBRARY_PATH=/builddir/build/BUILD/petsc-3.10.2/buildopenmpi_dir/i386/lib
+ export PETSC_DIR=/builddir/build/BUILD/petsc-3.10.2/buildopenmpi_dir
+ PETSC_DIR=/builddir/build/BUILD/petsc-3.10.2/buildopenmpi_dir
+ export PETSC_ARCH=i386
+ PETSC_ARCH=i386
+ export MPI_INTERFACE_HOSTNAME=localhost
+ MPI_INTERFACE_HOSTNAME=localhost
+ export 'PETSCVALGRIND_OPTIONS= --tool=memcheck --leak-check=yes --track-origins=yes'
+ PETSCVALGRIND_OPTIONS=' --tool=memcheck --leak-check=yes --track-origins=yes'
+ export 'CFLAGS=-O0 -g -Wl,-z,now -fPIC'
+ CFLAGS='-O0 -g -Wl,-z,now -fPIC'
+ export 'CXXFLAGS=-O0 -g -Wl,-z,now -fPIC'
+ CXXFLAGS='-O0 -g -Wl,-z,now -fPIC'
+ export 'FFLAGS=-O0 -g -Wl,-z,now -fPIC -I/usr/lib/gfortran/modules'
+ FFLAGS='-O0 -g -Wl,-z,now -fPIC -I/usr/lib/gfortran/modules'
+ make -C buildopenmpi_dir test 'MPIEXEC=/builddir/build/BUILD/petsc-3.10.2/buildopenmpi_dir/lib/petsc/bin/petscmpiexec -valgrind'
make: Entering directory '/builddir/build/BUILD/petsc-3.10.2/buildopenmpi_dir'
Running test examples to verify correct installation
Using PETSC_DIR=/builddir/build/BUILD/petsc-3.10.2/buildopenmpi_dir and PETSC_ARCH=i386
Possible error running C/C++ src/snes/examples/tutorials/ex19 with 1 MPI process
See http://www.mcs.anl.gov/petsc/documentation/faq.html
==25868== Conditional jump or move depends on uninitialised value(s)
==25868==    at 0x8E2CCA3: opal_value_unload (in /usr/lib/openmpi/lib/libopen-pal.so.20.10.4)
==25868==    by 0x6088607: ompi_proc_complete_init (in /usr/lib/openmpi/lib/libmpi.so.20.10.3)
==25868==    by 0x608C845: ompi_mpi_init (in /usr/lib/openmpi/lib/libmpi.so.20.10.3)
==25868==    by 0x60B2A97: PMPI_Init_thread (in /usr/lib/openmpi/lib/libmpi.so.20.10.3)
==25868==    by 0x49E0B0F: PetscInitialize (pinit.c:875)
==25868==    by 0x8049643: main (ex19.c:106)
==25868==  Uninitialised value was created by a stack allocation
==25868==    at 0x6088593: ompi_proc_complete_init (in /usr/lib/openmpi/lib/libmpi.so.20.10.3)
==25868==
lid velocity = 0.0016, prandtl # = 1., grashof # = 1.
Number of SNES iterations = 2
==25868== 10 bytes in 1 blocks are definitely lost in loss record 11 of 189
==25868==    at 0x40356A4: malloc (vg_replace_malloc.c:299)
==25868==    by 0xA890F6F: ??? (in /usr/lib/openmpi/lib/openmpi/mca_pmix_pmix112.so)
==25868==    by 0xA88FEFD: pmix_bfrop_unpack_buffer (in /usr/lib/openmpi/lib/openmpi/mca_pmix_pmix112.so)
==25868==    by 0xA8901D0: ??? (in /usr/lib/openmpi/lib/openmpi/mca_pmix_pmix112.so)
==25868==    by 0xA89BED4: ??? (in /usr/lib/openmpi/lib/openmpi/mca_pmix_pmix112.so)
==25868==    by 0xA899C7C: ??? (in /usr/lib/openmpi/lib/openmpi/mca_pmix_pmix112.so)
==25868==    by 0x8E63AFD: opal_libevent2022_event_base_loop (in /usr/lib/openmpi/lib/libopen-pal.so.20.10.4)
==25868==    by 0xA897C32: ??? (in /usr/lib/openmpi/lib/openmpi/mca_pmix_pmix112.so)
==25868==    by 0x61635DD: start_thread (in /usr/lib/libpthread-2.28.9000.so)
==25868==    by 0x626A699: clone (in /usr/lib/libc-2.28.9000.so)
==25868==
==25868== 10 bytes in 1 blocks are definitely lost in loss record 12 of 189
==25868==    at 0x40356A4: malloc (vg_replace_malloc.c:299)
==25868==    by 0x6201519: strdup (in /usr/lib/libc-2.28.9000.so)
==25868==    by 0x8E4B0FE: mca_base_var_enum_create_flag (in /usr/lib/openmpi/lib/libopen-pal.so.20.10.4)
==25868==    by 0x8E5DDA5: ??? (in /usr/lib/openmpi/lib/libopen-pal.so.20.10.4)
==25868==    by 0x8E4CC34: mca_base_framework_register (in /usr/lib/openmpi/lib/libopen-pal.so.20.10.4)
==25868==    by 0x8E4CCE0: mca_base_framework_open (in /usr/lib/openmpi/lib/libopen-pal.so.20.10.4)
==25868==    by 0x60DD028: ??? (in /usr/lib/openmpi/lib/libmpi.so.20.10.3)
==25868==    by 0x8E4CD68: mca_base_framework_open (in /usr/lib/openmpi/lib/libopen-pal.so.20.10.4)
==25868==    by 0x608C410: ompi_mpi_init (in /usr/lib/openmpi/lib/libmpi.so.20.10.3)
==25868==    by 0x60B2A97: PMPI_Init_thread (in /usr/lib/openmpi/lib/libmpi.so.20.10.3)
==25868==    by 0x49E0B0F: PetscInitialize (pinit.c:875)
==25868==    by 0x8049643: main (ex19.c:106)
==25868==
==25868== 17 bytes in 1 blocks are definitely lost in loss record 79 of 189
==25868==    at 0x40356A4: malloc (vg_replace_malloc.c:299)
==25868==    by 0x6201519: strdup (in /usr/lib/libc-2.28.9000.so)
==25868==    by 0x8E4B0FE: mca_base_var_enum_create_flag (in /usr/lib/openmpi/lib/libopen-pal.so.20.10.4)
==25868==    by 0x8E5DDC0: ??? (in /usr/lib/openmpi/lib/libopen-pal.so.20.10.4)
==25868==    by 0x8E4CC34: mca_base_framework_register (in /usr/lib/openmpi/lib/libopen-pal.so.20.10.4)
==25868==    by 0x8E4CCE0: mca_base_framework_open (in /usr/lib/openmpi/lib/libopen-pal.so.20.10.4)
==25868==    by 0x60DD028: ??? (in /usr/lib/openmpi/lib/libmpi.so.20.10.3)
==25868==    by 0x8E4CD68: mca_base_framework_open (in /usr/lib/openmpi/lib/libopen-pal.so.20.10.4)
==25868==    by 0x608C410: ompi_mpi_init (in /usr/lib/openmpi/lib/libmpi.so.20.10.3)
==25868==    by 0x60B2A97: PMPI_Init_thread (in /usr/lib/openmpi/lib/libmpi.so.20.10.3)
==25868==    by 0x49E0B0F: PetscInitialize (pinit.c:875)
==25868==    by 0x8049643: main (ex19.c:106)
...
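
All of the valgrind records above end inside ompi_mpi_init, reached from PetscInitialize via PMPI_Init_thread, so they should be reproducible without PETSc at all. A minimal sketch of such a reproducer (an assumption, not something from the original report) that exercises the same initialization path:

/* Assumed minimal reproducer, not part of the original report: it only
 * initializes and finalizes MPI, which is the same ompi_mpi_init path the
 * valgrind traces show under PetscInitialize (pinit.c:875). */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided = 0;

    /* The exact thread level requested should not matter here; any level
     * goes through PMPI_Init_thread -> ompi_mpi_init. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    printf("provided thread level = %d\n", provided);
    MPI_Finalize();
    return 0;
}

Run under the same options as PETSCVALGRIND_OPTIONS above (--tool=memcheck --leak-check=yes --track-origins=yes), this would show whether the opal_value_unload warning and the pmix/mca leaks come from the Open MPI 2.1.5 libraries themselves rather than from anything PETSc does.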
opoplawski (Contributor) commented

I'm not sure the report makes it clear, but the test was stuck there. We've seen a number of hangs with openmpi 2.1.5 on 32-bit, but this one at least appears to be resolved with 2.1.6rc1.

jsquyres (Member) commented

Ok, good. There were definitely some (more) atomics fixes in the upcoming 2.1.6. @hjelmn is checking to see if we missed any (per #2526 (comment)).

jsquyres changed the title from "OpenMPI error during PETSc test" to "v2.1.5: OpenMPI error during PETSc test" on Nov 30, 2018
jsquyres added this to the v2.1.6 milestone on Nov 30, 2018
jsquyres added the bug label on Nov 30, 2018
jsquyres modified the milestones: v2.1.6, v2.1.7 on Jan 14, 2019
jsquyres (Member) commented

We just released v4.0.2. Has this issue been resolved? Open MPI is now a few generations beyond the release this bug was filed against.

opoplawski (Contributor) commented

Yes, PETSc tests are running fine now.

rhc54 (Contributor) commented Oct 19, 2019

Thanks!

rhc54 closed this as completed Oct 19, 2019