
Conversation


@0xllx0 0xllx0 commented Oct 6, 2025

Adds a port of the `crossbeam::ArrayQueue` implementation to replace `heapless::mpmc::Queue`.

Resolves an issue with contention between concurrent calls to `Queue::dequeue` and `Queue::enqueue` when the queue is full.

Resolves: #583
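
For context, here is a rough sketch (not the repository's actual test) of the kind of workload that exposes the bug, assuming the `heapless::mpmc::Queue<T, N>` API with `enqueue`/`dequeue` taking `&self`:

```rust
use std::sync::Arc;
use std::thread;

use heapless::mpmc::Queue;

fn main() {
    // One producer and one consumer hammer a small queue. With the old
    // implementation, an `enqueue` racing against an in-flight `dequeue`
    // can return `Err` even though a slot is logically free.
    let queue: Arc<Queue<u32, 4>> = Arc::new(Queue::new());

    let producer = {
        let queue = Arc::clone(&queue);
        thread::spawn(move || {
            let mut failures = 0u32;
            for i in 0..100_000 {
                if queue.enqueue(i).is_err() {
                    // Some failures are legitimate (the queue really is
                    // full), but the buggy queue also fails spuriously
                    // under contention.
                    failures += 1;
                }
            }
            failures
        })
    };

    let consumer = {
        let queue = Arc::clone(&queue);
        thread::spawn(move || {
            for _ in 0..100_000 {
                let _ = queue.dequeue();
            }
        })
    };

    let failures = producer.join().unwrap();
    consumer.join().unwrap();
    println!("enqueue failures under contention: {failures}");
}
```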

```rust
//! The implementation is based on Dmitry Vyukov's bounded MPMC queue.
//!
//! Source:
//! - <http://www.1024cores.net/home/lock-free-algorithms/queues/bounded-mpmc-queue>
```
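
For readers unfamiliar with the algorithm, here is a condensed sketch of Vyukov's enqueue path (simplified from the article above, not the exact code in this PR). Each slot carries a sequence number that tells producers and consumers whose turn it is:

```rust
use core::cell::UnsafeCell;
use core::mem::MaybeUninit;
use core::sync::atomic::{AtomicUsize, Ordering};

struct Slot<T> {
    // On an empty queue, slot i starts with sequence == i.
    sequence: AtomicUsize,
    value: UnsafeCell<MaybeUninit<T>>,
}

fn enqueue<T, const N: usize>(
    buffer: &[Slot<T>; N],
    tail: &AtomicUsize,
    value: T,
) -> Result<(), T> {
    let mut pos = tail.load(Ordering::Relaxed);
    loop {
        let slot = &buffer[pos % N];
        let seq = slot.sequence.load(Ordering::Acquire);
        let dif = seq as isize - pos as isize;
        if dif == 0 {
            // The slot is free for this lap; claim it by advancing `tail`.
            match tail.compare_exchange_weak(pos, pos + 1, Ordering::Relaxed, Ordering::Relaxed) {
                Ok(_) => {
                    unsafe { (*slot.value.get()).write(value) };
                    // Publish the value: the slot is now ready for a consumer.
                    slot.sequence.store(pos + 1, Ordering::Release);
                    return Ok(());
                }
                // Another producer won the race; retry at the new tail.
                Err(current) => pos = current,
            }
        } else if dif < 0 {
            // The slot still holds last lap's unconsumed value: full.
            return Err(value);
        } else {
            // A racing producer already claimed this slot; reload and retry.
            pos = tail.load(Ordering::Relaxed);
        }
    }
}
```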
Contributor commented

So this is the same algorithm as the current one?
Maybe with some variations (the cache padding, which will be undesirable on many bare-metal embedded MCUs, and spinning, which is also undesirable in this crate).

Are you sure that it effectively fixes the contention issue?

Maybe it just reduces the likelihood of the problem being reproduced, making it not show up in the tests. I suspect the added spinning is the reason. In a real-time scheduler, spinning is useless and the problem is much more likely to show up.

I would prefer first adding loom tests to make sure that this case is not reached in any new implementation through exhaustive testing of branches.
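
To make that concrete, a minimal sketch of such a loom test (the `Queue` type and API are stand-ins for the mpmc queue under test, not the crate's actual test code):

```rust
#[cfg(loom)]
#[test]
fn enqueue_dequeue_under_contention() {
    loom::model(|| {
        use loom::sync::Arc;
        use loom::thread;

        let q = Arc::new(Queue::<u32, 2>::new());

        let producer = {
            let q = Arc::clone(&q);
            thread::spawn(move || q.enqueue(1).is_ok())
        };

        let got = q.dequeue();
        let pushed = producer.join().unwrap();

        // loom explores every interleaving of the two threads, so this
        // invariant is checked exhaustively: if the push succeeded and the
        // concurrent pop saw nothing, the value must still be dequeueable.
        if pushed && got.is_none() {
            assert_eq!(q.dequeue(), Some(1));
        }
    });
}
```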

@0xllx0 0xllx0 (Author) commented Oct 8, 2025

> So this is the same algorithm as the current one?

Yes, it is the port of `crossbeam::ArrayQueue` mentioned in the associated issue.

> Maybe with some variations (the cache padding, which will be undesirable on many bare-metal embedded MCUs, and spinning, which is also undesirable in this crate).

Alright, well...

> Are you sure that it effectively fixes the contention issue?

> Maybe it just reduces the likelihood of the problem being reproduced, making it not show up in the tests. I suspect the added spinning is the reason. In a real-time scheduler, spinning is useless and the problem is much more likely to show up.

I have re-run the tests multiple times (over 20), and the issue has not caused any test failures with the crossbeam implementation.

However, I can run the tests with the current implementation and it fails every time. Sometimes the enqueue test will fail, sometimes the dequeue test, sometimes both. But on every run, one of the tests will fail.

To me, that means that the alternative implementation is an improvement, at least for this issue.

> I would prefer first adding loom tests to make sure that this case is not reached in any new implementation through exhaustive testing of branches.

Please see the comments in the loom PR (#618); the loom tests are not compatible with this section of the code.

The tests fail with an error about branch exhaustion:

```
thread 'mpmc::tests::issue_583_enqueue_loom' panicked at /home/user/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/loom-0.7.2/src/rt/path.rs:247:13:
Model exceeded maximum number of branches. This is often caused by an algorithm requiring the processor to make progress, e.g. spin locks.
```

The loom tests also require a non-trivial amount of refactoring of the crossbeam implementation, due to differences in how their atomic primitives are implemented. I'm not going to do all of that work just to have the loom tests nope out with the above error message.

@sgued sgued (Contributor) commented Oct 8, 2025

This error means that one thread cannot make progress while another is starved because it's spinning. Spinning is useless in a real-time scheduler like RTIC, which accounts for a lot of heapless's users.

You should be able to resolve it and make the test pass by calling `loom::thread::yield_now` wherever there is spinning, but that means the original problem is still present under a real-time scheduler.
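
Concretely, the change would look something like this sketch (the helper and cfg wiring are illustrative, not the crate's actual code). Under loom, the yield lets the model scheduler run the other thread instead of exploring unbounded spin iterations:

```rust
// Hypothetical helper showing where the yield would go.
fn spin_until(ready: impl Fn() -> bool) {
    while !ready() {
        #[cfg(loom)]
        loom::thread::yield_now();
        #[cfg(not(loom))]
        core::hint::spin_loop();
    }
}
```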

@0xllx0 0xllx0 (Author) commented

That error comes from the existing implementation, so what is your point exactly about a real-time scheduler?

@0xllx0 0xllx0 (Author) commented Oct 8, 2025

If we want to diverge from the existing algorithm, we could look into the NBLFQ algorithm: https://ieeexplore.ieee.org/abstract/document/11078405

Here is a non-paywalled link: https://inria.hal.science/hal-04851700v2/document
And wCQ (wait-free queue, referenced in the NBLFQ paper): https://arxiv.org/pdf/2201.02179

@sgued sgued (Contributor) commented Oct 8, 2025

I think the loom test failures were actually not about spinning, but rather about the far too high number of iterations.

I fixed it and added loom tests in #620 (also documenting the issue so we can cut a new release soon if changing the algorithm would take too long).
With the current implementation they fail (and I'm 99% certain they would also fail on the ArrayQueue implementation, at least because of the spinning).
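
For reference, the shape of that fix (a sketch, not the actual diff in #620; `Queue` again stands in for the type under test): loom explores every interleaving, so iteration counts inside `loom::model` have to stay tiny.

```rust
loom::model(|| {
    use loom::sync::Arc;
    use loom::thread;

    let q = Arc::new(Queue::<u32, 2>::new());
    let producer = {
        let q = Arc::clone(&q);
        // A couple of operations is enough for exhaustive coverage; the
        // state space grows combinatorially with every extra iteration.
        thread::spawn(move || {
            for i in 0..2 {
                let _ = q.enqueue(i);
            }
        })
    };
    for _ in 0..2 {
        let _ = q.dequeue();
    }
    producer.join().unwrap();
});
```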

@0xllx0 0xllx0 (Author) commented

> With the current implementation they fail (and I'm 99% certain they would also fail on the ArrayQueue implementation, at least because of the spinning).

Only one way to know for sure, and that's to do the necessary refactoring and then test.

Thanks for figuring out that the overly high iteration count was what broke the tests. I'll pull in your work and see if the loom tests fail on the ArrayQueue implementation. If they pass, great, because that means we can stick with the same algorithm.

If the loom tests fail for ArrayQueue, I think implementing wCQ (the wait-free queue) would be the most likely to solve the issue, especially given the constraints of running under a real-time scheduler, since wait-free guarantees are stronger than lock-free ones.

I would like to work on that implementation, if we decide to go that way.

@0xllx0 0xllx0 (Author) commented

> With the current implementation they fail (and I'm 99% certain they would also fail on the ArrayQueue implementation, at least because of the spinning).

I pulled in your commit from #620 and ran the loom tests against ArrayQueue. You were right, the tests fail with:

```
running 2 tests
test mpmc::tests_loom::loom_issue_583_dequeue ... FAILED
test mpmc::tests_loom::loom_issue_583_enqueue ... FAILED

failures:

---- mpmc::tests_loom::loom_issue_583_dequeue stdout ----

thread 'mpmc::tests_loom::loom_issue_583_dequeue' panicked at /home/user/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/loom-0.7.2/src/rt/path.rs:247:13:
Model exceeded maximum number of branches. This is often caused by an algorithm requiring the processor to make progress, e.g. spin locks.
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

---- mpmc::tests_loom::loom_issue_583_enqueue stdout ----

thread 'mpmc::tests_loom::loom_issue_583_enqueue' panicked at /home/user/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/loom-0.7.2/src/rt/path.rs:247:13:
Model exceeded maximum number of branches. This is often caused by an algorithm requiring the processor to make progress, e.g. spin locks.

failures:
    mpmc::tests_loom::loom_issue_583_dequeue
    mpmc::tests_loom::loom_issue_583_enqueue
```

If these failures map to the real-time scheduler issue you mentioned, I think it is worth implementing either wCQ or NBLFQ. I'll start working on the implementation and add it in a commit to this PR.

This patch documents rust-embedded#583 and adds failing loom tests to exercise it,
hoping that a future algorithm change can make them pass.

The tests are run in CI, but their results are ignored for now as they fail.

Ideally the `loom` tests will cover more usage of `mpmc::Queue` in the future
and also cover `spsc`.

Ports the `crossbeam::ArrayQueue` impl for MPMC to a `heapless`-compatible API.

Linked issue (#583): MpMcQueue can fail to push even when it's not full under contention