Skip to content

opal/fifo: use atomics to set fifo head in opal_fifo_push #1470

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Mar 18, 2016

Conversation

hjelmn
Copy link
Member

@hjelmn hjelmn commented Mar 17, 2016

This commit changes the opal_fifo_push code to use
opal_update_counted_pointer to set the head. This fixes a data race
that occurs because the read of the fifo head in opal_fifo_pop
requires two instructions. This combined with the non-atomic update in
opal_fifo_push can lead to an ABA issue that puts the fifo in an
inconsistant state.

There are other ways this problem could be fixed. One way would be to
introduce an opal_atomic_read_128 implementation. On x86_64 this would
have to use the cmpxchg16b instruction. Since this instruction would
have to be in the pop path (and always executed) it would be slower
than the fix in this commit.

Closes #1460.

Signed-off-by: Nathan Hjelm [email protected]

This commit changes the opal_fifo_push code to use
opal_update_counted_pointer to set the head. This fixes a data race
that occurs because the read of the fifo head in opal_fifo_pop
requires two instructions. This combined with the non-atomic update in
opal_fifo_push can lead to an ABA issue that puts the fifo in an
inconsistant state.

There are other ways this problem could be fixed. One way would be to
introduce an opal_atomic_read_128 implementation. On x86_64 this would
have to use the cmpxchg16b instruction. Since this instruction would
have to be in the pop path (and always executed) it would be slower
than the fix in this commit.

Closes open-mpi#1460.

Signed-off-by: Nathan Hjelm <[email protected]>
@adrianreber
Copy link
Member

I tried to run the old opal_fifo in a loop and it segfaulted during the 8th run. With @hjelmn's patch applied it has now ran over 140 times without a segfault. I keep it running for a few more hours

@hjelmn
Copy link
Member Author

hjelmn commented Mar 17, 2016

Mellanox jenkins failure looks unrelated.

@hjelmn
Copy link
Member Author

hjelmn commented Mar 17, 2016

:bot:retest:

hjelmn added a commit that referenced this pull request Mar 18, 2016
opal/fifo: use atomics to set fifo head in opal_fifo_push
@hjelmn hjelmn merged commit c749d7d into open-mpi:master Mar 18, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

opal_fifo segfaults on x86_64
2 participants