osc rdma origin/target extent mixup #3569

Closed
markalle opened this issue May 24, 2017 · 1 comment

@markalle
Contributor

MPI_Accumulate takes an origin (local) count/type and a separate target count/type, and a range check was being performed with remote_count * target_extent.

The result is

[ibmgpu06:6313] *** An error occurred in MPI_Accumulate
[ibmgpu06:6313] *** reported by process [140737475969025,140381006069760]
[ibmgpu06:6313] *** on win rdma window 3
[ibmgpu06:6313] *** MPI_ERR_RMA_RANGE: invalid RMA address range
[ibmgpu06:6313] *** MPI_ERRORS_ARE_FATAL (processes in this win will now abort,
[ibmgpu06:6313] ***    and potentially your MPI job)

when running a testcase like this
https://gist.github.com/markalle/43e955343ea8ed7dd8bf757a73dae23d

I was using MXM and -mca osc rdma to hit the failure.
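To illustrate the kind of origin/target mixup this issue describes, here is a minimal C sketch of a window range check. It is not the osc/rdma code: `side_desc`, `target_span`, `mixed_span`, `range_ok`, and `example_extent_mixup` are hypothetical names, and the sketch assumes datatypes with a lower bound of 0 whose extent equals their span (e.g. contiguous types). Only the target count and target extent describe target memory; pairing the origin count with the target extent can reject a legal accumulate, such as 4 ints at the origin accumulated into 1 element of a contiguous 4-int type (extent 16 bytes) in a 16-byte window.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one side of an accumulate. */
typedef struct {
    size_t count;  /* number of datatype elements */
    size_t extent; /* extent of that datatype in bytes */
} side_desc;

/* Bytes touched at the target, assuming lower bound 0 and
 * extent == span. Uses only the TARGET count/extent pair. */
static size_t target_span(side_desc target)
{
    return target.count * target.extent;
}

/* Buggy variant: pairs the ORIGIN count with the TARGET extent. */
static size_t mixed_span(side_desc origin, side_desc target)
{
    return origin.count * target.extent;
}

/* True when [offset, offset + span) fits in a window of win_size bytes;
 * written as a subtraction so offset + span cannot overflow. */
static bool range_ok(size_t offset, size_t span, size_t win_size)
{
    return offset <= win_size && span <= win_size - offset;
}

/* Worked case: origin is 4 x 4-byte ints, target is 1 x contiguous
 * 4-int type (extent 16), window is 16 bytes, offset 0. */
void example_extent_mixup(void)
{
    side_desc origin = { 4, 4 };
    side_desc target = { 1, 16 };
    size_t win_size = 16;

    /* Correct check: 1 * 16 = 16 bytes, fits exactly. */
    assert(target_span(target) == 16);
    assert(range_ok(0, target_span(target), win_size));

    /* Mixed check: 4 * 16 = 64 bytes, a spurious range error. */
    assert(mixed_span(origin, target) == 64);
    assert(!range_ok(0, mixed_span(origin, target), win_size));
}
```

With the correct pairing the access spans 1 × 16 = 16 bytes and fits; with the mixed pairing it appears to span 4 × 16 = 64 bytes, which is the kind of false MPI_ERR_RMA_RANGE failure reported above.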

ggouaillardet added a commit to ggouaillardet/ompi that referenced this issue May 24, 2017
ggouaillardet added a commit to ggouaillardet/ompi that referenced this issue May 25, 2017
…t_accumulate_internal()

origin_datatype and target_datatype might be different and hence have different extents,
so use either origin_extent or target_extent when appropriate.

Refs open-mpi#3569

Signed-off-by: Gilles Gouaillardet <[email protected]>
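The rule in this commit message, pairing each count with the extent of its own datatype, can be sketched in C as follows. This is an illustration under stated assumptions, not the actual patch: `acc_lengths`, `acc_lengths_for`, and `example_strided_target` are hypothetical, and lower bounds are assumed to be 0. The example uses a strided target type (4 ints with stride 2, as MPI_Type_vector(4, 1, 2, MPI_INT) would describe with 4-byte ints, extent (3 * 2 + 1) * 4 = 28 bytes) to show that a mixup can also go the other way: pairing the target count with the origin extent under-estimates the target range, so an out-of-bounds access could pass the check.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical byte lengths for the two sides of an accumulate,
 * assuming lower bound 0 and extent == span for each datatype. */
typedef struct {
    size_t origin_bytes; /* local region read at the origin */
    size_t target_bytes; /* remote region that must fit in the window */
} acc_lengths;

/* Each count is multiplied by the extent of its OWN datatype. */
acc_lengths acc_lengths_for(size_t origin_count, size_t origin_extent,
                            size_t target_count, size_t target_extent)
{
    acc_lengths len;
    len.origin_bytes = origin_count * origin_extent;
    len.target_bytes = target_count * target_extent;
    return len;
}

void example_strided_target(void)
{
    const size_t origin_count = 4, origin_extent = 4;          /* 4 ints */
    const size_t target_count = 1, target_extent = (3 * 2 + 1) * 4; /* 28 */
    const size_t win_size = 8; /* deliberately too small for the target */

    acc_lengths len = acc_lengths_for(origin_count, origin_extent,
                                      target_count, target_extent);
    assert(len.origin_bytes == 16);
    assert(len.target_bytes == 28);
    assert(len.target_bytes > win_size); /* correct check rejects */

    /* Mixed pairing: 1 * 4 = 4 bytes, which would wrongly fit. */
    size_t mixed = target_count * origin_extent;
    assert(mixed <= win_size);
}
```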
ggouaillardet added a commit to ggouaillardet/ompi that referenced this issue May 25, 2017
@markalle
Contributor Author

I had opened pull request #3570 to fix this, but #3572 was independently opened for the same reason.

So I closed #3570; this bug should be fixed by #3572.

ggouaillardet added a commit to ggouaillardet/ompi that referenced this issue May 26, 2017
ggouaillardet added a commit to ggouaillardet/ompi that referenced this issue May 29, 2017
(cherry picked from commit 0f79259)
ggouaillardet added a commit to ggouaillardet/ompi that referenced this issue May 29, 2017
(back-ported from commit open-mpi/ompi@0f79259)
markalle closed this as completed Jun 5, 2017