Interaction between volatile and fence #260

@cbiffle

Description

Hi! I'm not sure if this is the best place for this question, but it seems worth a shot.

I'm trying to express an ordering between two volatile writes from a single mutator, but the docs don't appear to address this and so I am wary. The corresponding C++ docs are still vague but less so.

Details: A program needs to perform two writes, each volatile -- perhaps they are memory-mapped I/O. The writes must happen (which is to say, complete) in order -- perhaps the first one turns on the physical device that the second one addresses. Is there something from core that I can slip into the middle in the example below to ensure this?

let ptr_a: *mut u32 = ...;
let ptr_b: *mut u32 = ...; // not equal to ptr_a

unsafe {
    ptr_a.write_volatile(0xDEAD);
    // insert appropriate barrier/fence here
    ptr_b.write_volatile(0xBEEF);
}

Were I willing to be architecture-specific, I know the specific barrier instruction I'm after, and I could express it using inline asm. But it'd be lovely to use something portable. core::sync::atomic::fence -- probably with Release since it's a write-write situation -- was the first thing I reached for, but seeing as these are not atomic accesses per se, the docs on fence imply that it has no effect on their ordering. (Specifically, there are no mentions of volatile anywhere in the atomics docs.)
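For concreteness, here is the shape of the attempt I mean, as a self-contained sketch. The "registers" are ordinary stack locals rather than real MMIO addresses so it runs on a host machine (which of course sidesteps the very reordering I'm worried about); whether the Release fence actually constrains the two volatile writes is exactly the open question:

```rust
use core::sync::atomic::{fence, Ordering};

// Perform the two volatile writes with a Release fence between them.
// The pointers stand in for memory-mapped device registers.
fn write_pair(ptr_a: *mut u32, ptr_b: *mut u32) {
    unsafe {
        ptr_a.write_volatile(0xDEAD);
        // The question: does this fence order the two volatile writes,
        // given that neither access is atomic?
        fence(Ordering::Release);
        ptr_b.write_volatile(0xBEEF);
    }
}

fn main() {
    // Host-side stand-ins for the device registers.
    let (mut a, mut b) = (0u32, 0u32);
    write_pair(&mut a, &mut b);
    println!("{:#x} {:#x}", a, b); // prints "0xdead 0xbeef"
}
```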

The C++ memory order documentation does discuss the relationship with volatile, but (1) I admit I don't entirely understand its single relevant sentence, and (2) the remaining sentences are trying to scare off people attempting to use volatile for inter-thread synchronization, which I am not. Plus, I'm not writing C++. :-)

Random people on the Internet keep asserting that fence is sufficient for all our memory-barrier needs, but this doesn't seem obvious to me from the docs. (I'm also more accustomed to the traditional terms around barriers than the atomic memory ordering terms, so this may reflect my own ignorance!)

Pragmatically,

  • From reading threads here and on the LLVM archives, it looks like LLVM currently preserves relative ordering of atomic and volatile accesses, but I am hesitant to either rely on compiler behavior that may be subject to change, or assume that my backend is LLVM.
  • A number of Orderings given to fence currently produce the instruction I want on my particular target, but that feels fragile, particularly since my target has fewer barrier varieties than, say, PowerPC, so it might be working by accident.

More detailed context: The system I'm working on is an ARM Cortex-M7 based SoC. The M7 has a fairly complex bus interface, and can issue and retire memory accesses out of order if they issue on different bus ports (which, in practice, means that they apply to different coarse-grained sections of physical address space). The architecture-specific thing to do here is to insert a dmb instruction (available in the cortex_m crate, if you are using it, as cortex_m::asm::dmb()). However, the driver in question is for a generic IP block (a Synopsys Ethernet MAC) that is not inherently ARM-specific, so it'd be great to express this portably.
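To make the architecture-specific escape hatch concrete, here is a sketch of what I'd write today. The helper name `device_write_barrier` and the register names are made up for illustration; on ARM it emits dmb via inline asm (equivalent to cortex_m::asm::dmb()), and on other targets it falls back to a Release fence purely so the sketch compiles and runs on a host:

```rust
use core::sync::atomic::{fence, Ordering};

// Hypothetical helper: barrier between two device writes.
// On ARM this is the dmb instruction; elsewhere we substitute a
// Release fence just so the sketch builds on a host machine.
#[inline]
fn device_write_barrier() {
    #[cfg(target_arch = "arm")]
    unsafe {
        core::arch::asm!("dmb sy");
    }
    #[cfg(not(target_arch = "arm"))]
    fence(Ordering::Release);
}

// Illustrative driver step: power on the block, then configure it.
fn enable_then_configure(enable_reg: *mut u32, cfg_reg: *mut u32) {
    unsafe {
        enable_reg.write_volatile(1); // turn the device on
        device_write_barrier();       // keep the writes ordered at the bus
        cfg_reg.write_volatile(0x42); // only valid once the device is on
    }
}

fn main() {
    // Host-side stand-ins for the two device registers.
    let (mut enable, mut config) = (0u32, 0u32);
    enable_then_configure(&mut enable, &mut config);
    println!("{} {:#x}", enable, config); // prints "1 0x42"
}
```

The cfg dance is exactly the non-portability I'd like to avoid: the driver is for a generic IP block, so baking in target_arch tests feels wrong.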

As you have likely inferred, the goal is to wait for the completion of the first write, not its issuance in program order, and so compiler_fence is not useful here.

Any insight would be greatly appreciated!

Metadata


    Labels

    C-support (Category: Supporting a user to solve a concrete problem)
