|
| 1 | +- Feature Name: `multi_core_soundness` |
| 2 | +- Start Date: (fill me in with today's date, YYYY-MM-DD) |
| 3 | +- RFC PR: (leave this empty) |
| 4 | +- Rust Issue: (leave this empty) |
| 5 | + |
| 6 | +# Summary |
| 7 | +[summary]: #summary |
| 8 | + |
| 9 | +Change how we model multi-core MCUs in the following ways: |
| 10 | + |
| 11 | +* Stop using `Send` and `Sync` to model anything related to cross-core |
| 12 | + interactions since those traits are unfit for the purpose. |
| 13 | +* Declare that `Send` is not sufficient to transfer resources between cores, |
| 14 | + since memory addresses can have different meanings on different cores. |
| 15 | +* Mutexes based on critical sections become declared sound due to the above: |
| 16 | + They can only turn `Send` data `Sync`, but neither allows cross-core |
| 17 | + interactions anymore. |
| 18 | +* Punt on how to deal with cross-core communication and data sharing for now, |
| 19 | + and leave it to the ecosystem. |
| 20 | + |
| 21 | +# Motivation |
| 22 | +[motivation]: #motivation |
| 23 | + |
| 24 | +This RFC shares its motivation with [RFC 388]: The current way we model |
| 25 | +multi-core MCUs (which is just like multi-threaded Rust programs are modeled) |
| 26 | +means that [`bare_metal::Mutex`] and its reexports in `cortex_m` and other |
| 27 | +crates are unsound. |
| 28 | + |
| 29 | +With the push towards a 1.0 embedded ecosystem, we need to make sure that we do |
| 30 | +not expose unsound APIs like that, but also that our usage and understanding of |
| 31 | +language semantics is in line with upstream Rust and cannot lead to soundness |
| 32 | +issues down the road. |
| 33 | + |
| 34 | +Additionally, it would be great to have a good story for multi-core MCUs, since |
| 35 | +they are often difficult to develop for, while more and more vendors have |
| 36 | +multi-core products available not just for specialized applications, but as |
| 37 | +relatively general MCUs. |
| 38 | + |
| 39 | +## Today's Soundness Issues |
| 40 | +[todays-issues]: #todays-soundness-issues |
| 41 | + |
| 42 | +The auto trait behavior of `Send` and `Sync` causes soundness issues even today: |
| 43 | +Imagine a user-defined struct `S` that only contains some primitive types. |
| 44 | +Clearly, this struct will and should automatically implement `Send` and `Sync`. |
| 45 | +Now, because `S` implements `Sync`, `&S` will implement `Send`. This is simply |
| 46 | +the languages definition of these traits and nothing we can change. Now a big |
| 47 | +problem can arise when tranferring `&S` across core boundaries: Cores can have |
| 48 | +private memory regions that are not accessible by other cores. If `T: Send` is |
| 49 | +the only requirement for sending an object to another core, then this would not |
| 50 | +be sound. Existing multi-core support via [µAMP] attempts to work around this |
| 51 | +issue by requiring that `T: 'static`, but this approach turns out to be |
| 52 | +[unsound][soundness-1]. |
| 53 | + |
| 54 | +In addition to that, peripherals (either in the processor core, or ones provided |
| 55 | +by the MCU manufacturer) are generally `'static` and `Send` since being able to |
| 56 | +transfer them to an interrupt handler is an extremely common, safe operation. |
| 57 | +However, not all peripherals are available to all cores. In particular, *none* |
| 58 | +of the Cortex-M core peripherals are. |
| 59 | + |
| 60 | +[This blog post about µAMP][microamp-blog] outlines other issues and how they |
| 61 | +were solved or worked around in that case. Note that µAMP has other known |
| 62 | +[soundness issues][soundness-2] in addition to the one linked above. |
| 63 | + |
| 64 | +[µAMP]: https://github.com/rtfm-rs/microamp |
| 65 | +[microamp-blog]: https://blog.japaric.io/microamp/ |
| 66 | +[soundness-1]: https://github.com/rtfm-rs/microamp/issues/6 |
| 67 | +[soundness-2]: https://github.com/rtfm-rs/microamp#known-issues |
| 68 | + |
| 69 | +## The Core Issue |
| 70 | + |
| 71 | +To understand why `Send` is insufficient to transfer resource ownership across |
| 72 | +cores, observe that multi-core MCUs behave quite differently from multi-threaded |
| 73 | +programs (which are what `Send` is supposed to model): Each core can have |
| 74 | +private peripherals that may be `Send`able between interrupt handlers, but that |
| 75 | +do not exist (or, worse, something *else* exists at their address) on other |
| 76 | +cores of the system. The same goes for core-local memory regions: All normal |
| 77 | +operations are fine while they only happen on the core owning the memory |
| 78 | +(including `Send`ing some memory to an interrupt handler). |
| 79 | + |
| 80 | +Multi-threading in Rust was always modeled with the assumption that all threads |
| 81 | +share the same resources, with `Send` and `Sync` only guarding *access* to those |
| 82 | +resources. It is likely that when the language semantics around these traits are |
| 83 | +more rigidly specified, this will cause a clearer mismatch with what multi-core |
| 84 | +MCUs actually need. |
| 85 | + |
| 86 | +A similar problem to that of multi-core MCUs exists when writing hosted Rust |
| 87 | +applications: Inter-Process Communication (IPC). This is similar because while |
| 88 | +processes may share memory, `Send` and `Sync` are unsuitable to model any |
| 89 | +cross-process interaction since addresses and resources such as file descriptors |
| 90 | +on different processes can have completely different meanings. At the same time, |
| 91 | +it is important that `Send` and `Sync` *stay implemented* for both `File` and |
| 92 | +heap-allocated types like `Box`, since sending them across thread boundaries is |
| 93 | +still desired. |
| 94 | + |
| 95 | +Something much closer to a multi-threaded program is one that makes use of |
| 96 | +interrupt handlers, but runs on a single-core MCU: In both cases, all global |
| 97 | +resources exist precisely once, and may be shared and exchanged between |
| 98 | +interrupt handlers or threads (provided they implement `Send`/`Sync`) with no |
| 99 | +risk of the resources changing their meaning when sent. |
| 100 | + |
| 101 | +# Detailed design |
| 102 | +[design]: #detailed-design |
| 103 | + |
| 104 | +## Document what `Send` and `Sync` mean |
| 105 | + |
| 106 | +`Send` and `Sync` will be used only to model transfer and sharing of resources |
| 107 | +between different *execution contexts* that run with the same fixed set of |
| 108 | +global resources. |
| 109 | + |
| 110 | +Here, an *execution context* is something that may execute code asynchronously |
| 111 | +from other code (so without needing to be called). For example, a *thread* or |
| 112 | +an *interrupt handler* would qualify as an *execution context*, while a |
| 113 | +single-threaded futures executor would *not* create any more *execution |
| 114 | +contexts* while it executes. |
| 115 | + |
| 116 | +Concretely (for embedded Rust), that means `Send` and `Sync` can be used to |
| 117 | +model threads in an RTOS, or interrupt handlers for bare-metal applications. |
| 118 | + |
| 119 | +## `bare_metal::Mutex` is now sound |
| 120 | + |
| 121 | +Since a `Mutex` only turns `Send` data into `Sync` data, it does not allow any |
| 122 | +other cores in the system to access the protected data. Since disabling |
| 123 | +interrupts suspends all but the current *execution context*, exclusive access |
| 124 | +to the protected resource is now granted. |
| 125 | + |
| 126 | +Providing a way to share data between cores fundamentally depends on the memory |
| 127 | +layout of the device and application and is left to device-specific HALs, |
| 128 | +support crates or frameworks like [µAMP]. |
| 129 | + |
| 130 | +[`bare-metal`]: https://github.com/rust-embedded/bare-metal |
| 131 | + |
| 132 | +## The fate of Symmetric Multi-Processing (SMP) apps |
| 133 | + |
| 134 | +In SMP apps, only a single executable is used for all cores, while each core can |
| 135 | +have a separate entry point (or another mechanism of identifying the running |
| 136 | +core). This means that all `static`s are shared by default. |
| 137 | + |
| 138 | +Since defining a `static` only requires that its type is `Sync`, this would be |
| 139 | +unsound. For example, it would allow storing a `bare_metal::Mutex` in a `static` |
| 140 | +and access it from all cores. Therefore, this RFC foregoes the ability to write |
| 141 | +safe SMP apps in Rust, instead proposing to shift focus to AMP apps, which do |
| 142 | +not share data by default and produce a separate executable per core. |
| 143 | + |
| 144 | +# How We Teach This |
| 145 | +[how-we-teach-this]: #how-we-teach-this |
| 146 | + |
| 147 | +(see above) |
| 148 | + |
| 149 | +# Drawbacks |
| 150 | +[drawbacks]: #drawbacks |
| 151 | + |
| 152 | +* This makes writing applications for multi-core MCUs more difficult if there |
| 153 | + are no mature libraries for the target platform (that would provide APIs and |
| 154 | + traits for cross-core operations). |
| 155 | + |
| 156 | + While multi-core MCUs have become somewhat more common, it is expected that |
| 157 | + the vast majority of embedded Rust users will continue to use single-core |
| 158 | + MCUs. For those users, providing a sound and safe-to-use `Mutex` that works by |
| 159 | + disabling interrupts is beneficial. Even for multi-core applications, it is |
| 160 | + expected that the actual cross-core communication is limited to a small number |
| 161 | + of places in the code, so making it more difficult has limited impact. |
| 162 | + |
| 163 | +* This RFC generally rules out SMP apps that run the same firmware image on |
| 164 | + multiple cores. These would be able to share data via `static`s, which only |
| 165 | + requires a `Sync` bound, and that is not sufficient to guarantee safe |
| 166 | + operation when accessed from multiple cores. |
| 167 | + |
| 168 | +# Alternatives |
| 169 | +[alternatives]: #alternatives |
| 170 | + |
| 171 | +* Accept [RFC 388] instead, introducing a `SingleCore{Send,Sync}` auto trait |
| 172 | + pair once auto traits are stable, and make `Mutex::new` an `unsafe fn` in the |
| 173 | + interim. Remove unsound `Send` impls from peripherals. |
| 174 | + |
| 175 | +* Do what this RFC proposes, and also introduce `CoreSend`/`CoreSync` traits to |
| 176 | + model cross-core interaction. (This was what this RFC initially proposed, but |
| 177 | + it was decided that we should focus on fixing the soundness issues first and |
| 178 | + leave multi-core support to be implemented outside the core ecosystem.) |
| 179 | + |
| 180 | +# Unresolved questions |
| 181 | +[unresolved]: #unresolved-questions |
| 182 | + |
| 183 | +* None so far. |
| 184 | + |
| 185 | +[`bare_metal::Mutex`]: https://docs.rs/bare-metal/0.2.5/bare_metal/struct.Mutex.html |
| 186 | +[RFC 388]: https://github.com/rust-embedded/wg/pull/388 |
0 commit comments