This repository was archived by the owner on Sep 30, 2022. It is now read-only.

osc/rdma: use correct endpoint for local state #911

Merged
merged 1 commit into from
Jan 26, 2016

Conversation

hjelmn
Member

@hjelmn hjelmn commented Jan 22, 2016

If atomics are not globally visible (CPU and NIC atomics do not mix),
a BTL endpoint must be used to access local ranks. To avoid problems
caused by registering the same region with multiple handles, osc/rdma
was updated to always use the handle for rank 0. That update had a
bug: osc/rdma kept using the local endpoint to access the state even
though the pointer and handle are not valid for that endpoint. This
commit fixes the bug.
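
To illustrate the pairing rule described above, here is a minimal sketch in C. It is not the osc/rdma code: every type and function name (`endpoint_t`, `reg_handle_t`, `peer_t`, `module_t`, `select_state_access`) is hypothetical, and the sketch assumes the fix amounts to using rank 0's endpoint whenever rank 0's handle is used.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; these are NOT the real osc/rdma types. */
typedef struct { int id; } endpoint_t;    /* stands in for a BTL endpoint */
typedef struct { int id; } reg_handle_t;  /* stands in for a registration handle */

typedef struct {
    endpoint_t   *endpoint;      /* endpoint that reaches this local rank */
    reg_handle_t *state_handle;  /* handle for the state region (only rank 0's is used) */
} peer_t;

typedef struct {
    bool    atomics_globally_visible;  /* true if CPU and NIC atomics can mix */
    int     num_local_peers;
    peer_t *local_peers;               /* local_peers[0] is local rank 0 */
} module_t;

typedef struct {
    endpoint_t   *endpoint;  /* NULL means: access the state directly with the CPU */
    reg_handle_t *handle;    /* NULL when no network access is needed */
} state_access_t;

/* Pick the endpoint/handle pair used to reach the state of a local rank. */
static state_access_t select_state_access(const module_t *m, int target_local_rank)
{
    assert(m != NULL && m->num_local_peers > 0);
    assert(target_local_rank >= 0 && target_local_rank < m->num_local_peers);

    state_access_t access = { NULL, NULL };

    if (m->atomics_globally_visible) {
        /* CPU atomics are safe, so no endpoint or handle is required. */
        return access;
    }

    /* Assumption: the handle always comes from rank 0, so the endpoint must
     * come from rank 0 as well. Pairing rank 0's handle with any other
     * endpoint is the mismatch this sketch is meant to rule out. */
    access.endpoint = m->local_peers[0].endpoint;
    access.handle   = m->local_peers[0].state_handle;
    return access;
}
```

The point of returning the endpoint and handle together is that a caller cannot accidentally combine a handle registered for one endpoint with a different endpoint.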

Fixes open-mpi/ompi#1241.

(cherry picked from open-mpi/ompi@49d2f44)

:bot:label:bug
:bot:milestone:v2.0.0
:bot:assign: @sjeaugey

Signed-off-by: Nathan Hjelm [email protected]

@ompiteam-bot ompiteam-bot added this to the v2.0.0 milestone Jan 22, 2016
@hjelmn
Member Author

hjelmn commented Jan 22, 2016

This will fix the NVIDIA MTT issues. Hold the merge until after tonight's master MTT run.

@mellanox-github

Test PASSed.
See http://bgate.mellanox.com/jenkins/job/gh-ompi-release-pr/1248/ for details.

@hppritcha
Member

@sjeaugey please double check that open-mpi/ompi@49d2f44 cleared up problems on master.

@hjelmn
Member Author

hjelmn commented Jan 26, 2016

As mentioned in open-mpi/ompi#1319 this is ok.

jsquyres added a commit that referenced this pull request Jan 26, 2016
osc/rdma: use correct endpoint for local state
@jsquyres jsquyres merged commit 20670b5 into open-mpi:v2.x Jan 26, 2016

Successfully merging this pull request may close these issues.

Atomics get completion with error on mlx5