fix: resolve MPMC queue contention #619
Conversation
```rust
//! The implementation is based on Dmitry Vyukov's bounded MPMC queue.
//!
//! Source:
//! - <http://www.1024cores.net/home/lock-free-algorithms/queues/bounded-mpmc-queue>
```
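For context, the core of Vyukov's design is a per-slot sequence number that tells producers and consumers whose turn a slot is. A minimal sketch of that mechanism (names are illustrative, not `heapless`'s actual internals):

```rust
use core::cell::UnsafeCell;
use core::mem::MaybeUninit;
use core::sync::atomic::{AtomicUsize, Ordering};

// Illustrative slot layout for Vyukov's bounded MPMC queue: the
// per-slot sequence number encodes whether the slot is ready for an
// enqueue or a dequeue at a given position.
struct Slot<T> {
    sequence: AtomicUsize,
    value: UnsafeCell<MaybeUninit<T>>,
}

// Producer-side check (sketch): a slot at `pos` is writable when its
// sequence equals `pos`; after writing, the producer stores `pos + 1`
// so a consumer can claim it.
fn ready_for_enqueue<T>(slot: &Slot<T>, pos: usize) -> bool {
    let seq = slot.sequence.load(Ordering::Acquire);
    // seq == pos -> slot is empty, try to CAS the shared tail forward
    // seq <  pos -> queue is full
    // seq >  pos -> another producer won the race; reload and retry
    seq == pos
}
```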
---
So this is the same algorithm as the current one? Maybe with some variations (the cache padding, which will be undesirable on many bare-metal embedded MCUs, and the spinning, also undesirable in this crate).

Are you sure that it effectively fixes the contention issue? Maybe it just reduces the likelihood of the problem being reproduced, making it not show up in the tests. I suspect the added spinning is the reason. In a real-time scheduler, spinning is useless and the problem is much more likely to show up.

I would prefer first adding `loom` tests to make sure that this case is not reached in any new implementation, through exhaustive testing of branches.
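For illustration, a hypothetical `loom` test along those lines might look like this (it assumes the queue's atomics have been routed through `loom` under `cfg(loom)`; the test name and capacity are made up):

```rust
// Hypothetical loom test: exhaustively explores interleavings of a
// dequeue racing an enqueue on a full queue. Assumes a cfg(loom)
// refactor of the queue's atomics; Q2 is heapless's capacity-2 alias.
#[cfg(loom)]
#[test]
fn enqueue_dequeue_on_full_queue() {
    loom::model(|| {
        let q = loom::sync::Arc::new(heapless::mpmc::Q2::new());
        while q.enqueue(0u8).is_ok() {} // start from a full queue

        let q2 = loom::sync::Arc::clone(&q);
        let consumer = loom::thread::spawn(move || {
            let _ = q2.dequeue();
        });
        let _ = q.enqueue(1); // contends with the concurrent dequeue
        consumer.join().unwrap();
    });
}
```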
---
> So this is the same algorithm as the current one?

Yes, it is a port of the `crossbeam::ArrayQueue` mentioned in the associated issue.

> Maybe with some variations (the cache padding, which will be undesirable on many bare-metal embedded MCUs, and the spinning, also undesirable in this crate).

Alright, well...

> Are you sure that it effectively fixes the contention issue? Maybe it just reduces the likelihood of the problem being reproduced, making it not show up in the tests. I suspect the added spinning is the reason. In a real-time scheduler, spinning is useless and the problem is much more likely to show up.

I have re-run the tests multiple times (over 20), and the issue has not caused failed tests in the `crossbeam` implementation. However, I can run the tests with the current implementation, and they fail every time. Sometimes the `enqueue` test fails, sometimes the `dequeue` test fails, sometimes both. But on every run, one of the tests fails. To me, that means the alternative implementation is an improvement, at least for this issue.

> I would prefer first adding `loom` tests to make sure that this case is not reached in any new implementation, through exhaustive testing of branches.

Please see the comments in the `loom` PR (#618); they are not compatible with this section of the code. The tests fail with an error about branch exhaustion:

```
thread 'mpmc::tests::issue_583_enqueue_loom' panicked at /home/user/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/loom-0.7.2/src/rt/path.rs:247:13:
Model exceeded maximum number of branches. This is often caused by an algorithm requiring the processor to make progress, e.g. spin locks.
```

The `loom` tests also require a non-trivial amount of refactoring of the `crossbeam` implementation, due to differences in how their atomic primitives are implemented. I'm not going to do all of that work just to have the `loom` tests nope out with the above error message.
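For reference, the usual shape of that refactor is a `cfg`-gated facade over the atomic types. This is a sketch of the pattern, not the actual diff:

```rust
// Sketch of the usual cfg(loom) shim: route atomics through a facade
// so the same code compiles against core's atomics normally and
// loom's instrumented atomics under `RUSTFLAGS="--cfg loom"`.
#[cfg(not(loom))]
mod atomic {
    pub use core::sync::atomic::{AtomicUsize, Ordering};
}

#[cfg(loom)]
mod atomic {
    pub use loom::sync::atomic::{AtomicUsize, Ordering};
}

// Note: core's atomics have `const fn new`, loom's do not, which is
// one reason the refactor is non-trivial for a crate like heapless
// that exposes `const fn` constructors.
```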
---
This error means that one thread cannot make progress while another is starved because it's spinning. Spinning is useless in a real-time scheduler like RTIC, which accounts for a lot of `heapless`'s users.

You should be able to resolve it and make the test pass with `loom::thread::yield_now` wherever there is spinning, but that means the original problem is still present under a real-time scheduler.
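Concretely, that would look something like this at each spin site (a sketch; `backoff` is a made-up name, and this only placates the model checker rather than removing the spin on real hardware):

```rust
// Sketch: yield to the loom scheduler instead of spinning under the
// model checker, so loom can schedule the other thread rather than
// exploring an unbounded spin. On hardware this is the plain spin hint.
fn backoff() {
    #[cfg(loom)]
    loom::thread::yield_now();
    #[cfg(not(loom))]
    core::hint::spin_loop();
}
```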
---
That error comes from the existing implementation, so what is your point exactly about a real-time scheduler?
---
If we want to diverge from the existing algorithm, we could look into the NBLFQ algorithm: https://ieeexplore.ieee.org/abstract/document/11078405

Here is a non-paywalled link: https://inria.hal.science/hal-04851700v2/document

And `wCQ` (a wait-free queue, referenced in the NBLFQ paper): https://arxiv.org/pdf/2201.02179
---
I think the `loom` test failures about spinning were actually not about spinning, but rather about the far too high number of iterations.

I fixed that and added `loom` tests in #620 (also documenting the issue, so we can cut a new release soon if changing the algorithm takes too long).

With the current implementation they fail (and I'm 99% certain they would also fail on the `ArrayQueue` implementation, at least because of the spinning).
---
> With the current implementation they fail (and I'm 99% certain they would also fail on the `ArrayQueue` implementation, at least because of the spinning).

Only one way to know for sure, and that's to do the necessary refactors and then test.

Thanks for figuring out the issue with the high iteration count that was breaking the tests. I'll pull in your work and see if the `loom` tests fail on the `ArrayQueue` implementation. If they pass, great, because that means we can stick with the same algorithm.

If the `loom` tests fail for `ArrayQueue`, I think implementing `wCQ` (which is a wait-free queue) would be the most likely to solve the issue, especially given the constraints of running under a real-time scheduler, since wait-free guarantees are stronger than lock-free ones.

I would like to work on that implementation, if we decide to go that way.
---
> With the current implementation they fail (and I'm 99% certain they would also fail on the `ArrayQueue` implementation, at least because of the spinning).

I pulled in your commit from #620 and ran the `loom` tests against `ArrayQueue`. You were right, the tests fail with:

```
running 2 tests
test mpmc::tests_loom::loom_issue_583_dequeue ... FAILED
test mpmc::tests_loom::loom_issue_583_enqueue ... FAILED

failures:

---- mpmc::tests_loom::loom_issue_583_dequeue stdout ----
thread 'mpmc::tests_loom::loom_issue_583_dequeue' panicked at /home/user/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/loom-0.7.2/src/rt/path.rs:247:13:
Model exceeded maximum number of branches. This is often caused by an algorithm requiring the processor to make progress, e.g. spin locks.
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

---- mpmc::tests_loom::loom_issue_583_enqueue stdout ----
thread 'mpmc::tests_loom::loom_issue_583_enqueue' panicked at /home/user/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/loom-0.7.2/src/rt/path.rs:247:13:
Model exceeded maximum number of branches. This is often caused by an algorithm requiring the processor to make progress, e.g. spin locks.

failures:
    mpmc::tests_loom::loom_issue_583_dequeue
    mpmc::tests_loom::loom_issue_583_enqueue
```

If these failures map to the real-time scheduler issue you mentioned, I think it is worth implementing either `wCQ` or `NBLFQ`. I'll start working on the implementation and add it in a commit to this PR.
This patch documents rust-embedded#583 and adds failing `loom` tests to exercise rust-embedded#583, hoping that a future algorithm change can make them pass. The tests are run in CI but their results are ignored for now, as they fail. Ideally the `loom` tests will cover more usage of `mpmc::Queue` in the future and also cover `spsc`.

Ports the `crossbeam::ArrayQueue` MPMC implementation to a `heapless`-compatible API.
Adds a port of the `crossbeam::ArrayQueue` implementation to replace `heapless::mpmc::Queue`. Resolves an issue with contention over concurrent calls to `Queue::dequeue` <=> `Queue::enqueue` when the queue is full.

Resolves: #583
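For reference, the contention scenario from #583 has roughly this shape (a sketch: the capacity, values, and retry loops are illustrative, not the actual reproduction from the issue):

```rust
// Rough shape of the rust-embedded#583 scenario: with the queue full,
// a dequeue racing an enqueue can leave one side stuck on the current
// implementation. Capacity and values here are illustrative.
use heapless::mpmc::Q2;
use std::thread;

fn main() {
    static Q: Q2<u8> = Q2::new();
    while Q.enqueue(0).is_ok() {} // start from a full queue

    let consumer = thread::spawn(|| {
        while Q.dequeue().is_none() {} // racing dequeue
    });
    while Q.enqueue(1).is_err() {} // racing enqueue against the full queue
    consumer.join().unwrap();
}
```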