Hi! I'm not sure if this is the best place for this question, but it seems worth a shot.
I'm trying to express an ordering between two volatile writes from a single mutator, but the docs don't appear to address this and so I am wary. The corresponding C++ docs are still vague but less so.
Details: A program needs to perform two writes, each volatile -- perhaps they are memory-mapped I/O. The writes must happen (which is to say, complete) in order -- perhaps the first one turns on the physical device that the second one addresses. Is there something from `core` that I can slip into the middle of the example below to ensure this?
```rust
let ptr_a: *mut u32 = ...;
let ptr_b: *mut u32 = ...; // not equal to ptr_a

ptr_a.write_volatile(0xDEAD);
// insert appropriate barrier/fence here
ptr_b.write_volatile(0xBEEF);
```
Were I willing to be architecture-specific, I know the specific barrier instruction I'm after, and I could express it using inline asm. But it'd be lovely to use something portable. `core::sync::atomic::fence` -- probably with `Release`, since it's a write-write situation -- was the first thing I reached for, but seeing as these are not atomic accesses per se, the docs on `fence` imply that it has no effect on their ordering. (Specifically, there are no mentions of `volatile` anywhere in the atomics docs.)
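For concreteness, here is roughly what that first attempt looks like (a sketch only, with placeholder pointers standing in for the real registers; whether the `fence` actually constrains the volatile accesses is exactly what I can't tell from the docs):

```rust
use core::sync::atomic::{fence, Ordering};

/// Sketch of the `fence`-based attempt described above.
unsafe fn two_ordered_writes(ptr_a: *mut u32, ptr_b: *mut u32) {
    ptr_a.write_volatile(0xDEAD);
    // Does this order the two *volatile* writes? The `fence` docs only
    // talk about atomic accesses, so it's unclear.
    fence(Ordering::Release);
    ptr_b.write_volatile(0xBEEF);
}
```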
The C++ memory order documentation does discuss the relationship with `volatile`, but (1) I admit I don't entirely understand its single relevant sentence, and (2) the remaining sentences are trying to scare off people attempting to use `volatile` for inter-thread synchronization, which I am not. Plus, I'm not writing C++. :-)
Random people on the Internet keep asserting that `fence` is sufficient for all our memory-barrier needs, but this doesn't seem obvious to me from the docs. (I'm also more accustomed to the traditional terms around barriers than to the atomic memory ordering terms, so this may reflect my own ignorance!)
Pragmatically:

- From reading threads here and on the LLVM archives, it looks like LLVM currently preserves the relative ordering of atomic and `volatile` accesses, but I am hesitant to either rely on compiler behavior that may be subject to change, or assume that my backend is LLVM.
- A number of `Ordering`s given to `fence` currently produce the instruction I want on my particular target, but that feels fragile, particularly since my target has fewer barrier varieties than, say, PowerPC, so it might be working by accident.
More detailed context: The system I'm working on is an ARM Cortex-M7 based SoC. The M7 has a fairly complex bus interface, and can issue and retire memory accesses out of order if they issue on different bus ports (which, in practice, means that they apply to different coarse-grained sections of physical address space). The architecture-specific thing to do here is to insert a `dmb` instruction (available in the `cortex_m` crate, if you are using it, as `cortex_m::asm::dmb()`). However, the driver in question is for a generic IP block (a Synopsys Ethernet MAC) that is not inherently ARM-specific, so it'd be great to express this portably.
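For comparison, the architecture-specific version I would write today looks something like this (assuming the `cortex_m` crate; the function name is just illustrative):

```rust
use cortex_m::asm;

/// ARM-specific sketch: the DMB keeps the M7's buses from letting the
/// second write complete before the first.
unsafe fn two_ordered_writes_arm(ptr_a: *mut u32, ptr_b: *mut u32) {
    ptr_a.write_volatile(0xDEAD);
    asm::dmb(); // Data Memory Barrier
    ptr_b.write_volatile(0xBEEF);
}
```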
As you have likely inferred, the goal is to wait for the completion of the first write, not its issuance in program order, and so `compiler_fence` is not useful here.
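(That is, something like the sketch below constrains only the compiler -- it emits no barrier instruction, so as I understand it the hardware could still complete the writes out of order:)

```rust
use core::sync::atomic::{compiler_fence, Ordering};

/// Not sufficient here: `compiler_fence` restricts compiler reordering
/// only and compiles to no instruction at all.
unsafe fn insufficient(ptr_a: *mut u32, ptr_b: *mut u32) {
    ptr_a.write_volatile(0xDEAD);
    compiler_fence(Ordering::Release);
    ptr_b.write_volatile(0xBEEF);
}
```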
Any insight would be greatly appreciated!