Description
Background information
I have found what appears to be a memory leak within OpenMPI. It was first seen in a CFD code used in our research group, which uses the Boost MPI library. As part of its serialization of std::vector, Boost makes use of MPI_Alloc_mem (balanced by MPI_Free_mem), and this appears to be the cause of a memory leak that eventually causes some of our simulations to crash due to lack of memory.
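For context, here is a rough sketch of the kind of Boost.MPI usage involved. This is illustrative only, not code from our CFD solver: sending a std::vector goes through Boost's serialization layer, which, as far as I can tell, is where the MPI_Alloc_mem/MPI_Free_mem calls come from. (Build against Boost.MPI, e.g. linking -lboost_mpi -lboost_serialization.)

#include <boost/mpi.hpp>
#include <vector>

int main(int argc, char** argv)
{
    boost::mpi::environment env(argc, argv);
    boost::mpi::communicator world;

    // Repeated exchanges of serialized std::vector payloads, similar in
    // spirit to what the CFD code does every timestep.
    std::vector<double> data(1000, 1.0);

    if (world.size() >= 2)
    {
        if (world.rank() == 0)
            world.send(1, 0, data);   // goes through Boost.MPI serialization
        else if (world.rank() == 1)
            world.recv(0, 0, data);
    }

    return 0;   // env's destructor finalizes MPI
}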
OpenMPI versions affected
v2.1.2 and v3.0.0 have the problem described below. v1.10.7 does not.
System and OpenMPI installation
Both the above versions were compiled from source
Platform: Ubuntu 16.04
Hardware: Intel 64-bit
Compiler: gcc-5.4.0 (default as on Ubuntu 16.04)
Configure flags:
./configure --prefix=/local/data/public/pmblakely/openmpi-3.0.0-install
Minimal example
The short program below exhibits the problem:
#include <mpi.h>
#include <stddef.h>

int main(int argc, char** argv)
{
    MPI_Init(&argc, &argv);

    /* Repeatedly allocate and immediately free a small buffer; the
       memory appears not to be fully released until MPI_Finalize(). */
    for (size_t i = 0; i < 10000; i++)
    {
        char* result;
        MPI_Alloc_mem(100, MPI_INFO_NULL, &result);
        MPI_Free_mem(result);
    }

    MPI_Finalize();
    return 0;
}
Compilation
/local/data/public/pmblakely/openmpi-3.0.0-install/bin/mpicc ./memory_leak_check_minimal.C -o ./memory_leak_check_minimal-3.0.0 -g -O0
Testing
valgrind --tool=massif --threshold=0.1 --detailed-freq=1 ./memory_leak_check_minimal-3.0.0
Then, ms_print --threshold=0.1 ./massif.out shows:
->62.52% (1,615,768B) 0x59F6D75: opal_free_list_grow_st (in /local/data/public/pmblakely/openmpi-3.0.0-install/lib/libopen-pal.so.40.0.0)
near the beginning of the run, and
->80.07% (4,312,304B) 0x59F6D75: opal_free_list_grow_st (in /local/data/public/pmblakely/openmpi-3.0.0-install/lib/libopen-pal.so.40.0.0)
near the end of the test run. (The final mentions of opal_free_list_grow_st show that the memory is eventually freed, but presumably only by MPI_Finalize().)
I have tested the same program against MPICH 3.2.1, which does not exhibit the memory leak.
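As a rough cross-check on the massif numbers that does not require valgrind, the sketch below wraps the same alloc/free loop with getrusage() and reports the peak resident set size before and after the loop. This is an assumption-laden variant of the reproducer, not part of the original test: it assumes a Linux system (where ru_maxrss is reported in kilobytes) and can be built with the mpicxx wrapper.

#include <mpi.h>
#include <sys/resource.h>
#include <cstddef>
#include <cstdio>

// Peak resident set size of this process; on Linux, ru_maxrss is in kilobytes.
static long peak_rss_kb()
{
    struct rusage usage;
    getrusage(RUSAGE_SELF, &usage);
    return usage.ru_maxrss;
}

int main(int argc, char** argv)
{
    MPI_Init(&argc, &argv);

    const long before = peak_rss_kb();

    // Same loop as the minimal example above.
    for (std::size_t i = 0; i < 10000; i++)
    {
        char* result;
        MPI_Alloc_mem(100, MPI_INFO_NULL, &result);
        MPI_Free_mem(result);
    }

    const long after = peak_rss_kb();
    std::printf("peak RSS: %ld kB before loop, %ld kB after loop\n", before, after);

    MPI_Finalize();
    return 0;
}

If the allocations were genuinely returned by MPI_Free_mem, the two figures should stay close; a large jump after the loop would be consistent with what massif reports for opal_free_list_grow_st.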