40 changes: 40 additions & 0 deletions library/core/src/mem/maybe_uninit.rs
@@ -252,6 +252,46 @@ use crate::{fmt, intrinsics, ptr, slice};
/// std::process::exit(*code); // UB! Accessing uninitialized memory.
/// }
/// ```
///
/// # Validity

Contributor Author

Moving this discussion here:

> The `MaybeUninit` docs probably make sense for this. We now do have a definition of "byte" in the reference that this can link to.

Okay, awesome. And what wording would you recommend? Would it be accurate to say something like the following?

> The value of a `[MaybeUninit<u8>; N]` may contain pointer provenance, and so `p: P -> [MaybeUninit<u8>; N] -> P` preserves the value of `p`, including provenance.

@RalfJung would you like me to add language like this to this PR?
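For illustration only (not part of the PR): a minimal sketch of the round trip in the quoted wording, assuming `P = *const i32`. The helper `round_trip_bytewise` is hypothetical; it copies the pointer into a `[MaybeUninit<u8>; N]` one byte at a time and reads a pointer back out. Dereferencing the result is only fine if each `MaybeUninit<u8>` byte kept its provenance, which is the property being discussed.

```rust
use core::mem::{size_of, MaybeUninit};

/// Hypothetical helper: copies the bytes of `p` one at a time into a
/// `[MaybeUninit<u8>; N]` buffer, then reads a `*const i32` back out.
fn round_trip_bytewise(p: *const i32) -> *const i32 {
    const N: usize = size_of::<*const i32>();
    let mut buf = [MaybeUninit::<u8>::uninit(); N];
    // View `p` itself as a sequence of `MaybeUninit<u8>` bytes.
    let src = (&p as *const *const i32).cast::<MaybeUninit<u8>>();
    for (i, byte) in buf.iter_mut().enumerate() {
        // SAFETY: `src` points to `N` readable bytes, and `MaybeUninit<u8>`
        // accepts any byte (a load of it does not strip provenance).
        *byte = unsafe { src.add(i).read() };
    }
    // SAFETY: `buf` holds exactly the bytes of a valid `*const i32`. The
    // buffer is only 1-aligned, so an unaligned read is required.
    unsafe { buf.as_ptr().cast::<*const i32>().read_unaligned() }
}
```

Usage: `let x = 42; let q = round_trip_bytewise(&x);` after which `q` compares equal to `&x as *const i32` and `unsafe { *q }` reads `42`.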

Contributor Author

Update: I've added the following as a more concrete and fleshed out draft. I can edit or remove as preferred.

/// # Provenance
///
/// `MaybeUninit` values may contain [pointer provenance][provenance]. Concretely, for any
/// pointer type, `P`, which contains provenance, transmuting `p: P` to
/// `MaybeUninit<[u8; size_of::<P>()]>` and then back to `P` will produce a value identical to
/// `p`, including provenance.
///
/// [provenance]: ../ptr/index.html#provenance
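A minimal sketch of the draft's transmute round trip, assuming `P = *mut u64`; the helper name `round_trip_transmute` is hypothetical and not part of the PR. Writing through the returned pointer is only sound if provenance survived the trip, which is what the draft claims.

```rust
use core::mem::{size_of, transmute, MaybeUninit};

/// Hypothetical helper: round-trips `p` through
/// `MaybeUninit<[u8; size_of::<*mut u64>()]>`.
fn round_trip_transmute(p: *mut u64) -> *mut u64 {
    // SAFETY: both transmutes are between types of the same size. Every byte
    // sequence is a valid `MaybeUninit<[u8; N]>`, and transmuting back is
    // sound because the bytes (and their provenance) are exactly those of `p`.
    unsafe {
        let bytes: MaybeUninit<[u8; size_of::<*mut u64>()]> = transmute(p);
        transmute(bytes)
    }
}
```

Usage: take `p: *mut u64 = &mut x`, pass it through `round_trip_transmute`, and write through the result; the write lands in `x`.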

Contributor Author

Updated to reference the definition of a byte.

///
/// A `MaybeUninit<T>` has no validity requirement – any sequence of [bytes][reference-byte] of the
/// appropriate length, initialized or uninitialized, is a valid representation of `MaybeUninit<T>`.
///
/// However, "round-tripping" via `MaybeUninit` does not always result in the original value.
/// `MaybeUninit` can have padding, and the contents of that padding are not preserved.
/// Concretely, given distinct `T` and `U` where `size_of::<T>() == size_of::<U>()`, the following
/// code is not guaranteed to be sound:
///
/// ```rust,no_run
/// # use core::mem::{MaybeUninit, transmute};
/// # struct T; struct U;
/// fn identity(t: T) -> T {
///     unsafe {
///         let u: MaybeUninit<U> = transmute(t);
///         transmute(u)
///     }
/// }
/// ```
///
/// If the representation of `t` contains initialized bytes at byte offsets where `U` contains padding bytes, these
/// may not be preserved in `MaybeUninit<U>`. Transmuting `u` back to `T` (i.e., `transmute(u)` above) may thus
/// be undefined behavior or yield a value different from `t` due to those bytes being lost. This is an active area of discussion, and this code
/// may become sound in the future.
///
/// However, as long as no such byte offsets exist, the preceding `identity` example *is* sound.
/// In particular, since `[u8; N]` has no padding bytes, transmuting `t` to `MaybeUninit<[u8; size_of::<T>()]>`
/// and back will always produce the original value `t` again. This is true even if `t` contains [provenance]:
/// the resulting value will have the same provenance as the original `t`.
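For illustration only: a minimal sketch of the padding rule in the doc text above, using a hypothetical `#[repr(C)]` type `Padded { a: u8, b: u16 }` that is not part of this PR. `padding_offsets` locates its padding byte with `offset_of!`, and `identity_via_bytes` round-trips a `Padded` through `MaybeUninit<[u8; size_of::<Padded>()]>`, which is the sound case because `[u8; N]` has no padding.

```rust
use core::mem::{offset_of, size_of, transmute, MaybeUninit};

/// Hypothetical type with one padding byte: `a` at offset 0, padding at
/// offset 1, `b` at offsets 2..4 (layout fixed by `repr(C)`).
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
struct Padded {
    a: u8,
    b: u16,
}

/// Lists the byte offsets of `Padded` not covered by any field (its padding).
fn padding_offsets() -> Vec<usize> {
    let mut covered = [false; size_of::<Padded>()];
    let fields = [
        (offset_of!(Padded, a), size_of::<u8>()),
        (offset_of!(Padded, b), size_of::<u16>()),
    ];
    for (offset, len) in fields {
        for slot in &mut covered[offset..offset + len] {
            *slot = true;
        }
    }
    (0..covered.len()).filter(|&i| !covered[i]).collect()
}

/// Round-trips `t` through `MaybeUninit<[u8; N]>`. `[u8; N]` has no padding,
/// so every byte of `t` (including the uninitialized padding byte) is kept.
fn identity_via_bytes(t: Padded) -> Padded {
    // SAFETY: same sizes; `MaybeUninit` accepts uninitialized bytes, and the
    // bytes transmuted back are exactly those of a valid `Padded`.
    unsafe {
        let bytes: MaybeUninit<[u8; size_of::<Padded>()]> = transmute(t);
        transmute(bytes)
    }
}
```

By contrast, a `u32` has an initialized byte at offset 1, where `Padded` has padding, so `u32 -> MaybeUninit<Padded> -> u32` is exactly the case the doc text says is not guaranteed to be sound; the sketch deliberately does not run it.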

Member

There's a bit of a footgun for potential misunderstandings here but maybe I am being too nitpicky -- and I don't know what else we could say, anyway:

if `T` is `&mut _` or `&_`, the resulting value will actually not be identical. The transmute itself is a perfect identity, but when a function returns a reference (and this includes `transmute` returning a reference), that can influence the aliasing model. Tree Borrows will consider the resulting reference a child of the original reference.

Contributor Author

Maybe the core issue here is how we use the term "value"? E.g. I might say something like "`let y = x` produces a `y` with the same value as `x`", but it's semantically relevant that there are now two copies of this value instead of one (e.g. imagine that `x: &mut T`). So maybe I'm using "value" in a way that doesn't capture everything about the value? More precisely, maybe I'm using "value" in a way that captures everything local about the value, but doesn't capture everything about the relationship between that value and other values (which is relevant when it comes to aliasing). Of course, provenance complicates this story because we talk about provenance "living inside" a particular value, but it's also a non-local property.

That doesn't really resolve your concern, but some random thoughts. Maybe it'll prompt you to think of better language we could use here?

Contributor Author

Another possibility: Instead of saying that the value is fully preserved, maybe we could say that the following contents of the value are preserved?

  • Bit pattern
  • Provenance

...and then explicitly disclaim any other value contents that we add to the AM in the future?

Member

@RalfJung RalfJung Sep 9, 2025

It's more a problem of abstraction: the Rust code `let y = x;` becomes something like `y = x; Retag(y)` in MIR, and that `Retag` does change the value. Specifically, it changes the provenance, making `y` a child of `x` rather than wholly identical to `x`.

(In fact we sometimes even insert reborrows, making this more like `y = &mut *x`. I don't know the exact conditions for that.)

> So maybe I'm using "value" in a way that doesn't capture everything about the value? More precisely, maybe I'm using "value" in a way that captures everything local about the value, but doesn't capture everything about the relationship between that value and other values (which is relevant when it comes to aliasing). Of course, provenance complicates this story because we talk about provenance "living inside" a particular value, but it's also a non-local property.

Alias tracking is based on provenance. Provenance is also just data. It may reference other data, such as indicating a position in a tree.
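For illustration only: one familiar place where implicit reborrows show up in ordinary safe code is passing a `&mut` reference to a function that expects `&mut`. A minimal sketch with hypothetical helpers `bump` and `bump_twice`; it shows a call-site reborrow only, not the exact conditions under which MIR inserts reborrows, which the comment above leaves open.

```rust
/// Hypothetical helper that takes a mutable reference by value.
fn bump(r: &mut i32) {
    *r += 1;
}

/// `x: &mut i32` is not `Copy`, yet passing it to `bump` does not move it:
/// the compiler implicitly reborrows it as `&mut *x`, so `x` is usable again.
fn bump_twice(x: &mut i32) {
    bump(x); // implicitly `bump(&mut *x)`
    bump(x); // would be a use-after-move error without the implicit reborrow
}
```

Each call hands `bump` a fresh reference derived from `x`, which is the "child of `x`" relationship described above.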

Contributor Author

@joshlf joshlf Sep 9, 2025

> Another possibility: Instead of saying that the value is fully preserved, maybe we could say that the following contents of the value are preserved?
>
>   • Bit pattern
>   • Provenance
>
> ...and then explicitly disclaim any other value contents that we add to the AM in the future?

In that case, I'm leaning towards this option unless you have thoughts about language we could use that captures specifically the subset of "value" that we want to address here. How does that sound?

Member

@RalfJung RalfJung Sep 18, 2025

Here's the best I can come up with so far. But I feel like this may actually be easier to write for someone who's less deeply entrenched in these discussions. ;)

> Note: if `t` contains a reference, then there may be implicit reborrows of the reference any time it is copied, which may alter its provenance. In that case, the value returned by `identity` may not be exactly the same as its argument. However, even in this case, it remains true that `identity` behaves the same as a function that just returns `t` immediately.

Contributor Author

Okay here's my version of this (which is in the PR now):

> Note a potential footgun: if `t` contains a reference, then there may be implicit reborrows of the reference any time it is copied, which may alter its provenance. In that case, while the value returned by `identity` is exactly the same as its argument, that value may immediately be reborrowed upon return, altering its provenance. This may make this call to `identity` behave as though it does not exactly preserve provenance.

Member

@RalfJung RalfJung Sep 19, 2025

> while the value returned by `identity` is exactly the same as its argument

That part isn't given (there can be reborrows inside `identity`).

In fact my comment explicitly stated "the value returned by identity may not be exactly the same as its argument" so not sure how you got from there to your version.

It also seems very useful to state that it is still equivalent to a "boring" identity function.

Contributor Author

> In fact my comment explicitly stated "the value returned by identity may not be exactly the same as its argument" so not sure how you got from there to your version.

I thought you were being overly conservative in your wording, which I realize now isn't the case.

Changed to be closer to your wording. Better?

> Note a potential footgun: if `t` contains a reference, then there may be implicit reborrows of the reference any time it is copied, which may alter its provenance. In that case, the value returned by `identity` may not be exactly the same as its argument. However, even in this case, it remains true that `identity` behaves the same as a function that just returns `t` immediately (i.e., `fn identity<T>(t: T) -> T { t }`).
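For illustration only: a minimal sketch of the `identity` round trip from this wording, specialized to `T = &mut i32` under the hypothetical name `identity_ref_via_bytes`. Per the wording, the returned reference may carry reborrowed provenance, but callers can use it exactly as they would the result of `fn identity<T>(t: T) -> T { t }`.

```rust
use core::mem::{size_of, transmute, MaybeUninit};

/// Hypothetical: the doc comment's `identity`, specialized to `&mut i32` and
/// routed through `MaybeUninit<[u8; N]>`, which has no padding.
fn identity_ref_via_bytes(t: &mut i32) -> &mut i32 {
    // SAFETY: `&mut i32` and `MaybeUninit<[u8; N]>` have the same size, and
    // `MaybeUninit<[u8; N]>` preserves every byte of `t`, including its
    // provenance, so transmuting back yields a valid `&mut i32`.
    unsafe {
        let bytes: MaybeUninit<[u8; size_of::<&mut i32>()]> = transmute(t);
        transmute(bytes)
    }
}
```

Usage: `let r = identity_ref_via_bytes(&mut v); *r += 1;` increments `v`, just as it would with the boring identity function.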

Member

That works for me :)

///
/// Note a potential footgun: if `t` contains a reference, then there may be implicit reborrows of the reference
/// any time it is copied, which may alter its provenance. In that case, the value returned by `identity` may
/// not be exactly the same as its argument. However, even in this case, it remains true that `identity` behaves
/// the same as a function that just returns `t` immediately (i.e., `fn identity<T>(t: T) -> T { t }`).
///
/// [provenance]: crate::ptr#provenance
///
/// [reference-byte]: ../../reference/memory-model.html#bytes
#[stable(feature = "maybe_uninit", since = "1.36.0")]
// Lang item so we can wrap other types in it. This is useful for coroutines.
#[lang = "maybe_uninit"]