
Commit a633d2b

Workaround to bypass issue observed at very large scale with Fujitsu MPI (#2874)
We have observed MPI issues at very large scale when WarpX is compiled with Fujitsu MPI (i.e., with the Fujitsu compiler). These issues appear to be related to calling MPI Gatherv with a derived MPI_Datatype. This PR implements a possible workaround, initially proposed by @WeiqunZhang: when WarpX is compiled with the Fujitsu compiler (detected via the __FUJITSU macro), the routine where the issue was observed, TagBoxArray::collate, exchanges simpler integer arrays instead of relying on an MPI_Datatype.
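The workaround relies on an IntVect being a contiguous block of AMREX_SPACEDIM ints, so a buffer of n tags can equally be described as n * AMREX_SPACEDIM ints, provided every per-rank count and displacement is scaled by the same factor. The single-process sketch that follows illustrates that equivalence without calling MPI. It assumes AMREX_SPACEDIM == 3, and Vec3, kSpaceDim, gatherv_emulated, flatten, scale, and gathers_match are hypothetical stand-ins, not AMReX APIs.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for amrex::IntVect, assuming AMREX_SPACEDIM == 3.
constexpr int kSpaceDim = 3;
using Vec3 = std::array<int, kSpaceDim>;

// Single-process emulation of what MPI_Gatherv produces at the root:
// the block from rank r is copied to recv[displs[r], displs[r] + counts[r]).
template <typename T>
std::vector<T> gatherv_emulated(const std::vector<std::vector<T>>& send,
                                const std::vector<int>& counts,
                                const std::vector<int>& displs) {
    assert(send.size() == counts.size() && counts.size() == displs.size());
    std::size_t total = 0;
    for (std::size_t r = 0; r < counts.size(); ++r) {
        total = std::max(total, static_cast<std::size_t>(displs[r] + counts[r]));
    }
    std::vector<T> recv(total);
    for (std::size_t r = 0; r < send.size(); ++r) {
        for (int i = 0; i < counts[r]; ++i) {
            recv[displs[r] + i] = send[r][i];
        }
    }
    return recv;
}

// Lay out each Vec3 as kSpaceDim consecutive ints, so element i starts at
// int index i * kSpaceDim.
std::vector<int> flatten(const std::vector<Vec3>& v) {
    std::vector<int> flat;
    flat.reserve(v.size() * kSpaceDim);
    for (const Vec3& e : v) {
        flat.insert(flat.end(), e.begin(), e.end());
    }
    return flat;
}

// Convert counts/displacements measured in Vec3 elements into int units.
std::vector<int> scale(const std::vector<int>& v) {
    std::vector<int> out(v.size());
    std::transform(v.begin(), v.end(), out.begin(),
                   [](int el) { return el * kSpaceDim; });
    return out;
}

// Gather the same data both ways and check that the int view matches.
bool gathers_match() {
    const std::vector<Vec3> rank0 = {Vec3{1, 2, 3}, Vec3{4, 5, 6}};  // 2 tags
    const std::vector<Vec3> rank1 = {Vec3{7, 8, 9}};                 // 1 tag
    const std::vector<int> counts = {2, 1};
    const std::vector<int> displs = {0, 2};

    // Element-wise gather: the role played by the IntVect MPI_Datatype.
    const std::vector<Vec3> by_element =
        gatherv_emulated(std::vector<std::vector<Vec3>>{rank0, rank1}, counts, displs);

    // Int-wise gather: counts and displacements multiplied by kSpaceDim.
    const std::vector<int> by_int = gatherv_emulated(
        std::vector<std::vector<int>>{flatten(rank0), flatten(rank1)},
        scale(counts), scale(displs));

    return flatten(by_element) == by_int;
}
```

Here gathers_match() returns true: gathering whole Vec3 elements and gathering plain ints with scaled counts and displacements yield the same sequence of ints at the root, which is the property the Fujitsu-specific branch depends on.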
1 parent 7660c88

1 file changed: +17, -0 lines

Src/AmrCore/AMReX_TagBox.cpp

@@ -649,7 +649,24 @@ TagBoxArray::collate (Gpu::PinnedVector<IntVect>& TheGlobalCollateSpace) const
     //
     const IntVect* psend = (count > 0) ? TheLocalCollateSpace.data() : nullptr;
     IntVect* precv = TheGlobalCollateSpace.data();
+
+    //Issues have been observed with the following call at very large scale when using
+    //FujitsuMPI. The issue seems to be related to the use of MPI_Datatype. We can
+    //bypass the issue by exchanging simpler integer arrays.
+#ifndef __FUJITSU
     ParallelDescriptor::Gatherv(psend, count, precv, countvec, offset, IOProcNumber);
+#else
+    const int* psend_int = psend->begin();
+    int* precv_int = precv->begin();
+    Long count_int = count * AMREX_SPACEDIM;
+    auto countvec_int = std::vector<int>(countvec.size());
+    auto offset_int = std::vector<int>(offset.size());
+    const auto mul_funct = [](const auto el){return el*AMREX_SPACEDIM;};
+    std::transform(countvec.begin(), countvec.end(), countvec_int.begin(), mul_funct);
+    std::transform(offset.begin(), offset.end(), offset_int.begin(), mul_funct);
+    ParallelDescriptor::Gatherv(
+        psend_int, count_int, precv_int, countvec_int, offset_int, IOProcNumber);
+#endif
 
 #else
     TheGlobalCollateSpace = std::move(TheLocalCollateSpace);
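Two details of the #else branch may be worth hardening if this pattern is reused; both are observations on the diff, not part of the commit. First, psend is nullptr whenever count == 0, yet psend->begin() is called unconditionally, and calling a member function through a null pointer is undefined behavior (precv may likewise be null if TheGlobalCollateSpace is empty on a rank; MPI only reads the receive buffer at the root). Second, multiplying counts and offsets by AMREX_SPACEDIM can exceed the int range used for MPI counts and displacements, which is most likely at exactly the very large scale this commit targets. A minimal sketch, assuming AMREX_SPACEDIM == 3; Vec, kSpaceDim, as_ints, and to_int_units are hypothetical names, not AMReX APIs.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <stdexcept>
#include <vector>

// Assumption: AMREX_SPACEDIM == 3 (kSpaceDim is a hypothetical stand-in).
constexpr int kSpaceDim = 3;

// Hypothetical stand-in for amrex::IntVect: kSpaceDim contiguous ints with
// begin() accessors, mirroring how the diff calls psend->begin().
struct Vec {
    int v[kSpaceDim];
    const int* begin() const { return v; }
    int* begin() { return v; }
};

// Null-safe versions of psend->begin() / precv->begin(): an empty buffer
// stays nullptr instead of calling a member function through a null pointer.
const int* as_ints(const Vec* p) { return (p != nullptr) ? p->begin() : nullptr; }
int* as_ints(Vec* p) { return (p != nullptr) ? p->begin() : nullptr; }

// Scale per-rank counts or displacements from Vec units to int units, like the
// std::transform calls in the diff, but multiply in long long and throw if the
// result does not fit in the int that MPI counts and displacements use.
std::vector<int> to_int_units(const std::vector<int>& per_rank) {
    std::vector<int> out(per_rank.size());
    std::transform(per_rank.begin(), per_rank.end(), out.begin(), [](int el) {
        assert(el >= 0);  // counts and displacements are never negative
        const long long scaled = static_cast<long long>(el) * kSpaceDim;
        if (scaled > INT_MAX) {
            throw std::overflow_error("scaled count exceeds INT_MAX");
        }
        return static_cast<int>(scaled);
    });
    return out;
}
```

Throwing is only one possible policy; aborting through the application's own error handler, or falling back to the MPI_Datatype path when the scaled sizes would overflow, are equally reasonable choices.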
