
[Fortran] MPI_Ialltoallw fails when using committed MPI datatypes #8763

Closed
@mathrack

Description

Thank you for taking the time to submit an issue!

Background information

What version of Open MPI are you using? (e.g., v3.0.5, v4.0.2, git branch name and hash, etc.)

v4.1.0

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

Compiled from source (https://www.open-mpi.org/software/ompi/v4.1/). debug build.

If you are building/installing from a git clone, please copy-n-paste the output from git submodule status.

N/A

Please describe the system on which you are running

  • Operating system/version: Ubuntu 20.04
  • Computer hardware: i7-10810U
  • Network type: N/A

Details of the problem

I am trying to implement non-blocking communications in a large code, but they tend to fail. I have reduced the error to the reproducer below. When run on one process, the code works when switch is set to .false. but fails when switch is set to .true. See also https://stackoverflow.com/questions/66932156/mpi-alltoallw-working-and-mpi-ialltoallw-failing

program main
      
  use mpi

  implicit none

  logical :: switch
  integer, parameter :: maxSize=128
  integer scounts(maxSize), sdispls(maxSize)
  integer rcounts(maxSize), rdispls(maxSize)
  integer :: types(maxSize)
  double precision sbuf(maxSize), rbuf(maxSize)
  integer comm, size, rank, req
  integer ierr
  integer ii

  call MPI_INIT(ierr)
  comm = MPI_COMM_WORLD
  call MPI_Comm_size(comm, size, ierr)
  call MPI_Comm_rank(comm, rank, ierr)

  switch = .true.

  ! Init
  sbuf(:) = rank
  scounts(:) = 0
  rcounts(:) = 0
  sdispls(:) = 0
  rdispls(:) = 0
  types(:) = MPI_INTEGER ! placeholder; entries with zero counts are never used
  if (switch) then
    ! Send one time N double precision
    scounts(1)  = 1
    rcounts(1)  = 1
    sdispls(1)  = 0
    rdispls(1)  = 0
    call MPI_Type_create_subarray(1, (/maxSize/), &
                                     (/maxSize/), &
                                     (/0/), &
                                     MPI_ORDER_FORTRAN,MPI_DOUBLE_PRECISION, &
                                     types(1),ierr)
    call MPI_Type_commit(types(1),ierr)
  else
    ! Send N times one double precision
    do ii = 1, maxSize
      scounts(ii)  = 1
      rcounts(ii)  = 1
      sdispls(ii)  = ii-1
      rdispls(ii)  = ii-1
      types(ii)    = MPI_DOUBLE_PRECISION
    enddo
  endif

  call MPI_Ibarrier(comm, req, ierr)
  call MPI_Wait(req, MPI_STATUS_IGNORE, ierr)

  if (switch) then
    call MPI_Ialltoallw(sbuf, scounts, sdispls, types, &
                        rbuf, rcounts, rdispls, types, &
                        comm, req, ierr)
    call MPI_Wait(req, MPI_STATUS_IGNORE, ierr)
    call MPI_TYPE_FREE(types(1), ierr)
  else
    call MPI_alltoallw(sbuf, scounts, sdispls, types, &
                       rbuf, rcounts, rdispls, types, &
                       comm, ierr)
  endif

  call MPI_Finalize( ierr )

end program main

Running the program on one process as follows

$ mpirun -np 1 valgrind --vgdb=yes --vgdb-error=0 ./a.out

Valgrind produces the following error

==249074== Invalid read of size 8
==249074==    at 0x4EB0A6D: release_vecs_callback (coll_base_util.c:222)
==249074==    by 0x4EB100A: complete_vecs_callback (coll_base_util.c:245)
==249074==    by 0x74AD1CC: ompi_request_complete (request.h:441)
==249074==    by 0x74AE86D: ompi_coll_libnbc_progress (coll_libnbc_component.c:466)
==249074==    by 0x4FC0C39: opal_progress (opal_progress.c:231)
==249074==    by 0x4E04795: ompi_request_wait_completion (request.h:415)
==249074==    by 0x4E047EB: ompi_request_default_wait (req_wait.c:42)
==249074==    by 0x4E80AF7: PMPI_Wait (pwait.c:74)
==249074==    by 0x48A30D2: mpi_wait (pwait_f.c:76)
==249074==    by 0x10961A: MAIN__ (tmp.f90:61)
==249074==    by 0x1096C6: main (tmp.f90:7)
==249074==  Address 0x7758830 is 0 bytes inside a block of size 8 free'd
==249074==    at 0x483CA3F: free (vg_replace_malloc.c:540)
==249074==    by 0x4899CCC: PMPI_IALLTOALLW (pialltoallw_f.c:125)
==249074==    by 0x1095FC: MAIN__ (tmp.f90:61)
==249074==    by 0x1096C6: main (tmp.f90:7)
==249074==  Block was alloc'd at
==249074==    at 0x483B7F3: malloc (vg_replace_malloc.c:309)
==249074==    by 0x4899B4A: PMPI_IALLTOALLW (pialltoallw_f.c:90)
==249074==    by 0x1095FC: MAIN__ (tmp.f90:61)
==249074==    by 0x1096C6: main (tmp.f90:7)

gdb produces the following backtrace

Thread 1 received signal SIGTRAP, Trace/breakpoint trap.
0x0000000004eb0a6d in release_vecs_callback (request=0x7758af8) at ../../../../openmpi-4.1.0/ompi/mca/coll/base/coll_base_util.c:222
222             if (NULL != request->data.vecs.stypes[i]) {
(gdb) bt
#0  0x0000000004eb0a6d in release_vecs_callback (request=0x7758af8) at ../../../../openmpi-4.1.0/ompi/mca/coll/base/coll_base_util.c:222
#1  0x0000000004eb100b in complete_vecs_callback (req=0x7758af8) at ../../../../openmpi-4.1.0/ompi/mca/coll/base/coll_base_util.c:245
#2  0x00000000074ad1cd in ompi_request_complete (request=0x7758af8, with_signal=true) at ../../../../../openmpi-4.1.0/ompi/request/request.h:441
#3  0x00000000074ae86e in ompi_coll_libnbc_progress () at ../../../../../openmpi-4.1.0/ompi/mca/coll/libnbc/coll_libnbc_component.c:466
#4  0x0000000004fc0c3a in opal_progress () at ../../openmpi-4.1.0/opal/runtime/opal_progress.c:231
#5  0x0000000004e04796 in ompi_request_wait_completion (req=0x7758af8) at ../../openmpi-4.1.0/ompi/request/request.h:415
#6  0x0000000004e047ec in ompi_request_default_wait (req_ptr=0x1ffeffdbb8, status=0x1ffeffdbc0) at ../../openmpi-4.1.0/ompi/request/req_wait.c:42
#7  0x0000000004e80af8 in PMPI_Wait (request=0x1ffeffdbb8, status=0x1ffeffdbc0) at pwait.c:74
#8  0x00000000048a30d3 in ompi_wait_f (request=0x1ffeffe6cc, status=0x10c0a0 <mpi_fortran_status_ignore_>, ierr=0x1ffeffeee0) at pwait_f.c:76
#9  0x000000000010961b in MAIN__ () at tmp.f90:61
