Memory leak with MPI_Alloc_mem #4567

Closed

pmblakely opened this issue Dec 5, 2017 · 7 comments

@pmblakely

Background information

I have found what appears to be a memory leak within OpenMPI. It showed up in a CFD code used in our research group, which relies on the Boost MPI library. As part of serializing std::vectors, Boost.MPI calls MPI_Alloc_mem (balanced by MPI_Free_mem), and this path appears to leak memory, eventually causing some of our simulations to crash due to lack of memory.
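
For context, here is a minimal sketch (mine, not taken from the CFD code) of the kind of Boost.MPI call that, according to the report above, exercises the MPI_Alloc_mem/MPI_Free_mem pair during serialization of a std::vector. The vector contents, tag, and ranks are arbitrary placeholders, and the program must be linked against boost_mpi and boost_serialization.

// Illustrative sketch only (not the actual CFD code): exchanging a
// std::vector through Boost.MPI. Per the report, Boost.MPI's serialization
// layer allocates scratch buffers with MPI_Alloc_mem and releases them with
// MPI_Free_mem, which is the path that appears to leak.
#include <boost/mpi.hpp>
#include <vector>

int main(int argc, char* argv[])
{
    boost::mpi::environment env(argc, argv);
    boost::mpi::communicator world;

    std::vector<double> data(1000, 1.0);
    if (world.rank() == 0) {
        world.send(1, /*tag=*/0, data);   // serialization happens here
    } else if (world.rank() == 1) {
        world.recv(0, /*tag=*/0, data);
    }
    return 0;
}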

OpenMPI versions affected

v2.1.2 and v3.0.0 have the problem described below. v1.10.7 does not.

System and OpenMPI installation

Both the above versions were compiled from source
Platform: Ubuntu 16.04
Hardware: Intel 64-bit
Compiler: gcc-5.4.0 (default as on Ubuntu 16.04)
Configure flags:
./configure --prefix=/local/data/public/pmblakely/openmpi-3.0.0-install

Minimal example

The short program below exhibits the problem:

#include <mpi.h>
#include <stddef.h> /* size_t */

int main(void)
{
  int argc = 0;
  MPI_Init(&argc, NULL);

  /* Each MPI_Alloc_mem is immediately balanced by an MPI_Free_mem,
     so no net memory growth should be observed. */
  char* result;
  for (size_t n = 0; n < 10000; n++)
  {
    MPI_Alloc_mem(100, MPI_INFO_NULL, &result);
    MPI_Free_mem(result);
  }
  MPI_Finalize();
  return 0;
}

Compilation

/local/data/public/pmblakely/openmpi-3.0.0-install/bin/mpicc ./memory_leak_check_minimal.C -o ./memory_leak_check_minimal-3.0.0 -g -O0

Testing

valgrind --tool=massif --threshold=0.1 --detailed-freq=1 ./memory_leak_check_minimal-3.0.0

Then, ms_print --threshold=0.1 ./massif.out shows:

->62.52% (1,615,768B) 0x59F6D75: opal_free_list_grow_st (in /local/data/public/pmblakely/openmpi-3.0.0-install/lib/libopen-pal.so.40.0.0)

near the beginning, and

->80.07% (4,312,304B) 0x59F6D75: opal_free_list_grow_st (in /local/data/public/pmblakely/openmpi-3.0.0-install/lib/libopen-pal.so.40.0.0)

near the end of the test run (the final mentions of opal_free_list_grow_st show that the memory is eventually freed, but probably by MPI_Finalize()).

I have tested the same program against MPICH 3.2.1 and it doesn't have the memory leak.
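
As a rough runtime cross-check that does not need valgrind (my own sketch, not part of the original report; it assumes a Linux/POSIX system where getrusage reports ru_maxrss in kilobytes), the reproducer can print its peak resident set size as it runs; if allocations are not actually released, the figure keeps climbing across iterations:

// Sketch of a runtime check: same loop as the reproducer, plus a periodic
// report of the peak resident set size via POSIX getrusage.
#include <mpi.h>
#include <sys/resource.h>
#include <cstdio>
#include <cstddef>

int main(int argc, char* argv[])
{
  MPI_Init(&argc, &argv);

  char* result;
  for (std::size_t n = 0; n < 10000; n++)
  {
    MPI_Alloc_mem(100, MPI_INFO_NULL, &result);
    MPI_Free_mem(result);

    // Report the peak RSS every 1000 iterations; if memory is leaking,
    // this number grows steadily instead of levelling off.
    if (n % 1000 == 0) {
      struct rusage usage;
      getrusage(RUSAGE_SELF, &usage);
      std::printf("iteration %zu: max RSS %ld kB\n", n, (long) usage.ru_maxrss);
    }
  }
  MPI_Finalize();
  return 0;
}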

@pmblakely
Author

@hjelmn: as discussed on the OpenMPI users' mailing list.

@hjelmn
Member

hjelmn commented Dec 5, 2017

Does this machine have a high-performance interconnect? If so, what kind? I can't reproduce the issue in a Linux VM with Ubuntu 16.04 or on macOS 10.12.

@ggouaillardet
Contributor

@hjelmn I can reproduce this on my CentOS 7 VM (no high-speed interconnect) with master.

Here is what I found:

Threads,Function
1,main (leak.c:11)
1,  PMPI_Alloc_mem (palloc_mem.c:85)
1,    mca_mpool_base_alloc (mpool_base_alloc.c:87)
1,      mca_mpool_base_tree_insert (mpool_base_tree.c:110)

We end up inserting the mpool_tree_item into the mca_mpool_base_tree here:

    rc = opal_rb_tree_insert(&mca_mpool_base_tree, item->key, item);

but on the free side:

Threads,Function
1,main (leak.c:12)
1,  PMPI_Free_mem (pfree_mem.c:53)
1,    mca_mpool_base_free (mpool_base_alloc.c:110)
1,      mca_mpool_base_tree_find (mpool_base_tree.c:147)

The item cannot be retrieved from the mca_mpool_base_tree, hence the leak: the mpool_tree_item is never returned to the mca_mpool_base_tree_item_free_list.
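
In other words (a simplified sketch with made-up names; alloc_mem, free_mem, TreeItem, and the std::map are stand-ins for the OPAL structures, not the actual code): the tree is keyed on the allocation's base address, so if the key field is never filled in when the item is inserted, the later lookup by the pointer handed to MPI_Free_mem cannot find it, and the tracking item is leaked.

// Simplified illustration with hypothetical names, not the OPAL internals:
// allocations are tracked in a registry keyed by their base address so the
// entry can be found again when that address comes back on free.
#include <cstdlib>
#include <map>

struct TreeItem {
    void* key = nullptr;            // base address of the allocation (lookup key)
};

static std::map<void*, TreeItem*> registry;   // stands in for the red-black tree

void* alloc_mem(std::size_t size)
{
    void* mem = std::malloc(size);
    TreeItem* item = new TreeItem;
    item->key = mem;                // the line the patch adds: without it the
                                    // entry is filed under a meaningless key
    registry[item->key] = item;
    return mem;
}

void free_mem(void* mem)
{
    auto it = registry.find(mem);   // lookup by the pointer passed to free
    if (it != registry.end()) {     // if the key was never set, this fails and
        delete it->second;          // the TreeItem is never reclaimed
        registry.erase(it);
    }
    std::free(mem);
}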

I made this patch, which fixes one issue:

diff --git a/opal/mca/mpool/base/mpool_base_alloc.c b/opal/mca/mpool/base/mpool_base_alloc.c
index 77165a3..1c4f87a 100644
--- a/opal/mca/mpool/base/mpool_base_alloc.c
+++ b/opal/mca/mpool/base/mpool_base_alloc.c
@@ -14,6 +14,8 @@
  * Copyright (c) 2010-2017 IBM Corporation. All rights reserved.
  * Copyright (c) 2015      Los Alamos National Security, LLC.  All rights
  *                         reserved.
+ * Copyright (c) 2017      Research Organization for Information Science
+ *                         and Technology (RIST). All rights reserved.
  * $COPYRIGHT$
  *
  * Additional copyrights may follow
@@ -84,6 +86,7 @@ void *mca_mpool_base_alloc(size_t size, opal_info_t *info, const char *hints)
         mca_mpool_base_tree_item_put (mpool_tree_item);
     } else {
         mpool_tree_item->mpool = mpool;
+        mpool_tree_item->key = mem;
         mca_mpool_base_tree_insert (mpool_tree_item);
     }
 

Can you please review it before I commit and open a PR?

Meanwhile, I will make sure there are no more memory leaks.

@pmblakely
Author

Thank you, Gilles and Nathan. I haven't tested the fix in our full CFD code yet, but with the fix applied to both 2.1.2 and 3.0.0, the minimal example now shows only a single memory allocation from MPI_Alloc_mem().

@hjelmn
Member

hjelmn commented Dec 6, 2017

Thanks @ggouaillardet. That fix looks familiar; has this come up before and not actually been patched?

@ggouaillardet
Contributor

@hjelmn I remember some issues involving MPI_Alloc_mem() being raised before (such as #3182), but this particular fix does not ring a bell.

ggouaillardet added a commit to ggouaillardet/ompi that referenced this issue Dec 7, 2017
set the key of all mpool_tree_item objects, so they can be retrieved
in mpool_base_free and then returned back to the
mca_mpool_base_tree_item_free_list free list.

Refs. open-mpi#4567

Thanks Philip Blakely for the bug report.

Signed-off-by: Gilles Gouaillardet <[email protected]>

The same commit (open-mpi/ompi@11e5f86) was subsequently cherry-picked to the release branches in further commits referencing this issue: by ggouaillardet on Dec 7 and Dec 9, 2017, by bwbarrett on Dec 14, 2017, and by davideberius (to davideberius/ompi) on Dec 14, 2017.

@ggouaillardet
Contributor

The issue has been fixed on all active branches.
