Closed
Description
The MPI_Accumulate call takes a local (origin) count/type and a target count/type, and the range check on the target window was being performed with remote_count * target_extent.
The result is
[ibmgpu06:6313] *** An error occurred in MPI_Accumulate
[ibmgpu06:6313] *** reported by process [140737475969025,140381006069760]
[ibmgpu06:6313] *** on win rdma window 3
[ibmgpu06:6313] *** MPI_ERR_RMA_RANGE: invalid RMA address range
[ibmgpu06:6313] *** MPI_ERRORS_ARE_FATAL (processes in this win will now abort,
[ibmgpu06:6313] *** and potentially your MPI job)
when running a testcase like this
https://gist.github.com/markalle/43e955343ea8ed7dd8bf757a73dae23d
I was using MXM and -mca osc rdma to hit the failure.
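For readers unfamiliar with this kind of check, here is a minimal sketch of a byte-range test against a window. It is not the Open MPI code and does not try to reproduce the exact expression involved in this bug. The helper name rma_range_ok and its parameters are hypothetical. It only shows that the outcome depends on which count is multiplied by which extent.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not an Open MPI function): returns true when an
 * access of `count` elements, each `extent` bytes wide, starting at byte
 * offset `disp`, fits inside a window of `win_size` bytes.
 */
bool rma_range_ok(uint64_t win_size, uint64_t disp,
                  uint64_t count, uint64_t extent)
{
    /* Reject count * extent products that would overflow 64 bits. */
    if (extent != 0 && count > UINT64_MAX / extent) {
        return false;
    }
    uint64_t len = count * extent;

    /* Written as a subtraction so that disp + len cannot overflow. */
    return disp <= win_size && len <= win_size - disp;
}
```

As an assumed example, take a 32-byte window and a target described as 1 element of a type with a 32-byte extent: rma_range_ok(32, 0, 1, 32) is true. If a count of 4 (say, the number of origin elements) were paired with the same 32-byte extent, rma_range_ok(32, 0, 4, 32) would be false, and a valid access would be reported as out of range.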