Closed
Description
We have observed this error only once in our MTT runs, where we run v2.x with SLURM/PMIx. The error may be related to this configuration, though I doubt it.
Here is the error message:
export OMPI_MCA_btl_openib_if_include=mlx4_0:1
OMPI_MCA_btl_openib_if_include=mlx4_0:1
OMPI_MCA_mpi_add_procs_cutoff=0
OMPI_MCA_pmix_base_async_modex=1
OMPI_MCA_pmix_base_collect_data=0
/tmp/mtt_116453_slurm/bin/srun -N 8 -n 64 --mpi=pmix_v1 -p pmellanox <mtt-base>/installs/T8JL/tests/mpich_tests/mpich-mellanox.git/test/mpi/coll/allgather2
[boo13:10605] Attempt to free memory that is still in use by an ongoing MPI communication (buffer 0xa89000, size 9302016). MPI job will now abort.
srun: error: boo13: task 9: Exited with exit code 1
srun: Terminating job step 1592.0