
Conversation


@0xllx0 0xllx0 commented Oct 6, 2025

Adds a port of the `crossbeam::ArrayQueue` implementation to replace `heapless::mpmc::Queue`.

Resolves an issue with contention between concurrent calls to `Queue::dequeue` and `Queue::enqueue` when the queue is full.

Resolves: #583
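
For context, here is a rough sketch (not the repository's actual test) of the kind of workload that exposes the bug, assuming the `heapless::mpmc::Queue<T, N>` API with `enqueue`/`dequeue` taking `&self`:

```rust
use std::sync::Arc;
use std::thread;

use heapless::mpmc::Queue;

fn main() {
    // One producer and one consumer hammer a small queue. With the old
    // implementation, an `enqueue` racing against an in-flight `dequeue`
    // can return `Err` even though a slot is logically free.
    let queue: Arc<Queue<u32, 4>> = Arc::new(Queue::new());

    let producer = {
        let queue = Arc::clone(&queue);
        thread::spawn(move || {
            let mut failures = 0u32;
            for i in 0..100_000 {
                if queue.enqueue(i).is_err() {
                    // Some failures are legitimate (the queue really is
                    // full), but the buggy queue also fails spuriously
                    // under contention.
                    failures += 1;
                }
            }
            failures
        })
    };

    let consumer = {
        let queue = Arc::clone(&queue);
        thread::spawn(move || {
            for _ in 0..100_000 {
                let _ = queue.dequeue();
            }
        })
    };

    let failures = producer.join().unwrap();
    consumer.join().unwrap();
    println!("enqueue failures under contention: {failures}");
}
```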

```rust
//! The implementation is based on Dmitry Vyukov's bounded MPMC queue.
//!
//! Source:
//! - <http://www.1024cores.net/home/lock-free-algorithms/queues/bounded-mpmc-queue>
```
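
For readers unfamiliar with the algorithm, here is a condensed sketch of Vyukov's enqueue path (simplified from the article above, not the exact code in this PR). Each slot carries a sequence number that tells producers and consumers whose turn it is:

```rust
use core::cell::UnsafeCell;
use core::mem::MaybeUninit;
use core::sync::atomic::{AtomicUsize, Ordering};

struct Slot<T> {
    // On an empty queue, slot i starts with sequence == i.
    sequence: AtomicUsize,
    value: UnsafeCell<MaybeUninit<T>>,
}

fn enqueue<T, const N: usize>(
    buffer: &[Slot<T>; N],
    tail: &AtomicUsize,
    value: T,
) -> Result<(), T> {
    let mut pos = tail.load(Ordering::Relaxed);
    loop {
        let slot = &buffer[pos % N];
        let seq = slot.sequence.load(Ordering::Acquire);
        let dif = seq as isize - pos as isize;
        if dif == 0 {
            // The slot is free for this lap; claim it by advancing `tail`.
            match tail.compare_exchange_weak(pos, pos + 1, Ordering::Relaxed, Ordering::Relaxed) {
                Ok(_) => {
                    unsafe { (*slot.value.get()).write(value) };
                    // Publish the value: the slot is now ready for a consumer.
                    slot.sequence.store(pos + 1, Ordering::Release);
                    return Ok(());
                }
                // Another producer won the race; retry at the new tail.
                Err(current) => pos = current,
            }
        } else if dif < 0 {
            // The slot still holds last lap's unconsumed value: full.
            return Err(value);
        } else {
            // A racing producer already claimed this slot; reload and retry.
            pos = tail.load(Ordering::Relaxed);
        }
    }
}
```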
Contributor commented

So this is the same algorithm as the current one?
Maybe with some variations (the cache padding, which will be undesirable on many bare-metal embedded MCUs, and spinning, which is also undesirable in this crate).

Are you sure that it effectively fixes the contention issue?

Maybe it just reduces the likelihood of the problem being reproduced, making it not show up in the tests. I suspect the added spinning is the reason. In a real-time scheduler, spinning is useless and the problem is much more likely to show up.

I would prefer first adding loom tests to make sure that this case is not reached in any new implementation through exhaustive testing of branches.
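
To make that concrete, a minimal sketch of such a loom test (the `Queue` type and API are stand-ins for the mpmc queue under test, not the crate's actual test code):

```rust
#[cfg(loom)]
#[test]
fn enqueue_dequeue_under_contention() {
    loom::model(|| {
        use loom::sync::Arc;
        use loom::thread;

        let q = Arc::new(Queue::<u32, 2>::new());

        let producer = {
            let q = Arc::clone(&q);
            thread::spawn(move || q.enqueue(1).is_ok())
        };

        let got = q.dequeue();
        let pushed = producer.join().unwrap();

        // loom explores every interleaving of the two threads, so this
        // invariant is checked exhaustively: if the push succeeded and the
        // concurrent pop saw nothing, the value must still be dequeueable.
        if pushed && got.is_none() {
            assert_eq!(q.dequeue(), Some(1));
        }
    });
}
```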

@0xllx0 0xllx0 (Author) commented Oct 8, 2025

> So this is the same algorithm as the current one?

Yes, it is the port of `crossbeam::ArrayQueue` mentioned in the associated issue.

> Maybe with some variations (the cache padding, which will be undesirable on many bare-metal embedded MCUs, and spinning, which is also undesirable in this crate).

Alright, well...

> Are you sure that it effectively fixes the contention issue?

> Maybe it just reduces the likelihood of the problem being reproduced, making it not show up in the tests. I suspect the added spinning is the reason. In a real-time scheduler, spinning is useless and the problem is much more likely to show up.

I have re-run the tests multiple times (over 20), and the issue has not caused any test failures with the crossbeam implementation.

However, I can run the tests with the current implementation and it fails every time. Sometimes the enqueue test will fail, sometimes the dequeue test, sometimes both. But on every run, one of the tests will fail.

To me, that means that the alternative implementation is an improvement, at least for this issue.

> I would prefer first adding loom tests to make sure that this case is not reached in any new implementation through exhaustive testing of branches.

Please see the comments in the loom PR (#618); the loom tests are not compatible with this section of the code.

The tests fail with an error about branch exhaustion:

```
thread 'mpmc::tests::issue_583_enqueue_loom' panicked at /home/user/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/loom-0.7.2/src/rt/path.rs:247:13:
Model exceeded maximum number of branches. This is often caused by an algorithm requiring the processor to make progress, e.g. spin locks.
```

The loom tests also require a non-trivial amount of refactoring of the crossbeam implementation, due to differences in how their atomic primitives are implemented. I'm not going to do all of that work just to have the loom tests nope out with the above error message.

@sgued sgued (Contributor) commented Oct 8, 2025

This error means that one thread cannot make progress while another is starved because it's spinning. Spinning is useless in a real-time scheduler like RTIC, which accounts for a lot of heapless's users.

You should be able to resolve it and make the test pass by calling `loom::thread::yield_now` wherever there is spinning, but that means the original problem is still present under a real-time scheduler.
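
Concretely, the change would look something like this sketch (the helper and cfg wiring are illustrative, not the crate's actual code). Under loom, the yield lets the model scheduler run the other thread instead of exploring unbounded spin iterations:

```rust
// Hypothetical helper showing where the yield would go.
fn spin_until(ready: impl Fn() -> bool) {
    while !ready() {
        #[cfg(loom)]
        loom::thread::yield_now();
        #[cfg(not(loom))]
        core::hint::spin_loop();
    }
}
```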

@0xllx0 0xllx0 (Author) commented

That error comes from the existing implementation, so what is your point exactly about a real-time scheduler?

@0xllx0 0xllx0 (Author) commented Oct 8, 2025

If we want to diverge from the existing algorithm, we could look into the NBLFQ algorithm: https://ieeexplore.ieee.org/abstract/document/11078405

Here is a non-paywalled link: https://inria.hal.science/hal-04851700v2/document
And wCQ (wait-free queue, referenced in the NBLFQ paper): https://arxiv.org/pdf/2201.02179

@sgued sgued (Contributor) commented Oct 8, 2025

I think the loom test failures were actually not about spinning, but rather about the far too high number of iterations.

I fixed it and added loom tests in #620 (also documenting the issue so we can cut a new release soon if changing the algorithm would take too long).
With the current implementation they fail (and I'm 99% certain they would also fail on the ArrayQueue implementation, at least because of the spinning).
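
For reference, the shape of that fix (a sketch, not the actual diff in #620; `Queue` again stands in for the type under test): loom explores every interleaving, so iteration counts inside `loom::model` have to stay tiny.

```rust
loom::model(|| {
    use loom::sync::Arc;
    use loom::thread;

    let q = Arc::new(Queue::<u32, 2>::new());
    let producer = {
        let q = Arc::clone(&q);
        // A couple of operations is enough for exhaustive coverage; the
        // state space grows combinatorially with every extra iteration.
        thread::spawn(move || {
            for i in 0..2 {
                let _ = q.enqueue(i);
            }
        })
    };
    for _ in 0..2 {
        let _ = q.dequeue();
    }
    producer.join().unwrap();
});
```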

@0xllx0 0xllx0 (Author) commented

> With the current implementation they fail (and I'm 99% certain they would also fail on the ArrayQueue implementation, at least because of the spinning).

Only one way to know for sure, and that's to do the necessary refactoring and then test.

Thanks for figuring out that the overly high iteration count was what broke the tests. I'll pull in your work and see if the loom tests fail on the ArrayQueue implementation. If they pass, great, because that means we can stick with the same algorithm.

If the loom tests fail for ArrayQueue, I think implementing wCQ (the wait-free queue) would be the most likely to solve the issue, especially given the constraints of running under a real-time scheduler, since wait-free guarantees are stronger than lock-free ones.

I would like to work on that implementation, if we decide to go that way.

@0xllx0 0xllx0 (Author) commented

> With the current implementation they fail (and I'm 99% certain they would also fail on the ArrayQueue implementation, at least because of the spinning).

I pulled in your commit from #620 and ran the loom tests against ArrayQueue. You were right, the tests fail with:

```
running 2 tests
test mpmc::tests_loom::loom_issue_583_dequeue ... FAILED
test mpmc::tests_loom::loom_issue_583_enqueue ... FAILED

failures:

---- mpmc::tests_loom::loom_issue_583_dequeue stdout ----

thread 'mpmc::tests_loom::loom_issue_583_dequeue' panicked at /home/user/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/loom-0.7.2/src/rt/path.rs:247:13:
Model exceeded maximum number of branches. This is often caused by an algorithm requiring the processor to make progress, e.g. spin locks.
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

---- mpmc::tests_loom::loom_issue_583_enqueue stdout ----

thread 'mpmc::tests_loom::loom_issue_583_enqueue' panicked at /home/user/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/loom-0.7.2/src/rt/path.rs:247:13:
Model exceeded maximum number of branches. This is often caused by an algorithm requiring the processor to make progress, e.g. spin locks.

failures:
    mpmc::tests_loom::loom_issue_583_dequeue
    mpmc::tests_loom::loom_issue_583_enqueue
```

If these failures map to the real-time scheduler issue you mentioned, I think it is worth implementing either wCQ or NBLFQ. I'll start working on the implementation and add it in a commit to this PR.

This patch documents rust-embedded#583 and adds failing loom tests to exercise it,
hoping that a future algorithm change can make them pass.

The tests are run in CI, but their results are ignored for now as they fail.

Ideally the `loom` tests will cover more usage of `mpmc::Queue` in the future
and also cover `spsc`.

Ports the `crossbeam::ArrayQueue` impl for MPMC to a `heapless`-compatible API.

Linked issue (#583): MpMcQueue can fail to push even when it's not full under contention