Contributor

@joshlf joshlf commented Apr 29, 2025

Partially addresses rust-lang/unsafe-code-guidelines#555 by clarifying that it is sound to write any byte values (initialized or uninitialized) to any MaybeUninit<T> regardless of T.

r? @RalfJung

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Apr 29, 2025
/// }
/// ```
///
/// # Validity
Contributor Author

Moving this discussion here:

The MaybeUninit docs probably make sense for this. We now do have a definition of "byte" in the reference that this can link to.

Okay, awesome. And what wording would you recommend? Would it be accurate to say something like the following?

The value of a [MaybeUninit<u8>; N] may contain pointer provenance, and so p: P -> [MaybeUninit<u8>; N] -> P preserves the value of p, including provenance

@RalfJung would you like me to add language like this to this PR?

Contributor Author

Update: I've added the following as a more concrete and fleshed out draft. I can edit or remove as preferred.

/// # Provenance
///
/// `MaybeUninit` values may contain [pointer provenance][provenance]. Concretely, for any
/// pointer type, `P`, which contains provenance, transmuting `p: P` to
/// `MaybeUninit<[u8; size_of::<P>()]>` and then back to `P` will produce a value identical to
/// `p`, including provenance.
///
/// [provenance]: ../ptr/index.html#provenance
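
For illustration only (this is a sketch, not text from the PR): the round trip described in the draft can be written as below. The helper name `round_trip` is hypothetical, and the code relies on the provenance-preservation guarantee that this PR proposes to document.

```rust
use std::mem::{size_of, transmute, MaybeUninit};

/// Hypothetical helper: sends a reference through a byte buffer and back.
pub fn round_trip(p: &u32) -> &u32 {
    // The buffer has the same size as `&u32`, so `transmute` accepts it.
    let bytes: MaybeUninit<[u8; size_of::<&'static u32>()]> = unsafe { transmute(p) };
    // Assumption (the guarantee under discussion): the bytes kept the
    // pointer's provenance, so the result may be dereferenced just like `p`.
    unsafe { transmute(bytes) }
}
```

Calling `round_trip(&x)` should yield a reference that points at `x` and can be read through, which is what "identical to `p`, including provenance" amounts to in practice.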

Contributor Author

Updated to reference the definition of a byte.

@RalfJung
Member

RalfJung commented May 7, 2025

Cc @rust-lang/opsem

Comment on lines 277 to 279
/// If `T` contains initialized bytes at byte offsets where `U` contains padding bytes, these
/// may not be preserved in `MaybeUninit<U>`, and so `transmute(u)` may produce a `T` with
/// uninitialized bytes in these positions. This is an active area of discussion, and this code
/// may become sound in the future.
Member

I don't think it makes sense to say that a type "contains initialized bytes" at some offset. That's a property of a representation.

The typical term for representation bytes that are lost here is "padding". I don't think we have rigorously defined padding anywhere yet, but the term is sufficiently widely-used (and generally with a consistent meaning) that we may just be able to use it here?

Contributor Author

IIUC, you're making two points:

  • We should speak about a type's representation containing bytes, not about the type itself containing bytes
  • In a representation, we should speak about padding bytes rather than uninitialized bytes

Is that right?

One thing that's probably worth distinguishing here is between values and layouts. In my mental model, an uninit byte is one of the possible values that a byte can have (e.g., it's the 257th value that can legally appear in a MaybeUninit<u8>). By contrast, padding is a property of a layout - namely, it's a sequence of bytes in a type's layout that happen to have the validity [MaybeUninit<u8>; PADDING_LEN].

Based on this, maybe it's best to say:

If byte offsets exist at which T's representation does not permit uninitialized bytes but U's representation does (e.g. due to padding), then the bytes in T at these offsets may not be preserved in u, and so transmute(u) may produce a T with uninitialized bytes at these offsets. This is an active area of discussion, and this code may become sound in the future.

Member

@RalfJung RalfJung May 8, 2025

Is that right?

No. I think both of the following concepts make sense:

  • The representation of a particular value at a particular type contains uninitialized bytes.
  • A type contains padding bytes. (These are bytes which are always ignored by the representation relation.)

But it makes less sense to talk about padding of a representation, or to talk about uninitialized bytes in a type.

So for this PR, the two key points (and they are separate points) are:

  • If U has padding, those bytes may be reset to "uninitialized" as part of the round-trip. If those same bytes are not padding in T, this can therefore mean some of the information of the original T value is lost.
  • If T does not permit uninitialized bytes on those positions, the round-trip is UB.

The second point is just a logical consequence of the first, it does not add any new information. Not sure if it is worth mentioning.
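
A small sketch of the first point, using an illustrative type `U` that is not part of the PR. It assumes a typical target where `u16` has alignment 2, so `repr(C)` leaves one padding byte at offset 1. The helper names are hypothetical.

```rust
use std::mem::{offset_of, size_of, transmute, MaybeUninit};

/// Illustrative type: `repr(C)` places `b` at offset 2 (assuming `u16` is
/// 2-aligned), which leaves offset 1 as padding.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct U {
    pub a: u8,
    pub b: u16,
}

/// Byte offsets of `U` that are not covered by any field.
pub fn padding_offsets() -> Vec<usize> {
    let mut covered = [false; size_of::<U>()];
    covered[offset_of!(U, a)] = true; // `a` occupies one byte
    let b_start = offset_of!(U, b);
    for slot in &mut covered[b_start..b_start + size_of::<u16>()] {
        *slot = true; // `b` occupies two bytes
    }
    (0..covered.len()).filter(|&i| !covered[i]).collect()
}

/// A round trip where `T` is `U` itself: the padding byte carries no
/// information in either type, so nothing can be lost.
pub fn round_trip(u: U) -> U {
    let m: MaybeUninit<U> = unsafe { transmute(u) };
    unsafe { transmute(m) }
}

// Deliberately not written as code: transmuting `[u8; 4]` to `MaybeUninit<U>`
// and back. The byte at offset 1 may be reset to uninitialized on the way,
// and `[u8; 4]` does not permit uninitialized bytes, so that would be UB.
```

The commented-out case is the one the two points above describe: information at offset 1 is lost, and the UB follows because the target type cannot hold an uninitialized byte there.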

Contributor Author

  • The representation of a particular value at a particular type contains uninitialized bytes.
  • A type contains padding bytes. (These are bytes which are always ignored by the representation relation.)

Does this imply that a type contains padding bytes, not a type's representation?

I'm thinking through the implications of what you said, and I think I understand something new that I didn't before, and I want to run it by you: In my existing mental model, a padding byte is a location in a type's layout such that every byte value at that location (including uninit) is valid (enums complicate this model, but I don't think that complication is relevant for this discussion - we can just stick to thinking about structs). The problem with this mental model is that, interpreted naively, it implies that different byte values in a padding byte could correspond to different logical values of the type. So e.g. in the type #[repr(C)] struct T(u8, u16), [0, 0, 0, 0] and [0, 1, 0, 0] would correspond to different values of the type since we're treating the padding byte itself as part of the representation relation. Of course, that is not something we want.

IIUC, by contrast your model is that the representation relation simply doesn't include padding bytes at all. So it'd be more accurate to describe the representation of T as consisting of three bytes - at offsets 0, 2, and 3. Every representation of T has a "hole" at offset 1 which is not part of the representation. This ensures that there's a 1:1 mapping between logical values and representations. Is that right?

Member

@RalfJung RalfJung May 8, 2025

Does this imply that a type contains padding bytes, not a type's representation?

That's how I think about it. We can't tell which byte is a padding byte by looking at one representation -- it's a property of the type.

In my existing mental model, a padding byte is a location in a type's layout such that every byte value at that location (including uninit) is valid

That would make the only byte of MaybeUninit<u8> a padding byte, so I don't think this is the right definition.
That's why I said above: a padding byte is a byte that is ignored by the representation relation. Slightly more formally: if r is some representation valid for type T, and r' is equal to r everywhere except for padding bytes, then r and r' represent the same value.

So it'd be more accurate to describe the representation of T as consisting of three bytes

The representation has 4 bytes. But only 3 of them actually affect the represented value (which is a tuple of two [mathematical] integers).
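
To make the "ignored by the representation relation" reading concrete, here is a hedged sketch using the `#[repr(C)] struct T(u8, u16)` example from earlier in this thread. The helper `from_bytes` is hypothetical, and the code assumes a target where `u16` is 2-aligned so that `T` is 4 bytes with padding at offset 1.

```rust
use std::mem::transmute;

#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct T(pub u8, pub u16);

/// Interprets a 4-byte representation at type `T`.
pub fn from_bytes(r: [u8; 4]) -> T {
    // Sound: both fields accept any initialized bytes, and the byte at
    // offset 1 is padding, so it does not affect the resulting value.
    unsafe { transmute(r) }
}
```

Under this model `from_bytes([0, 0, 0, 0])` and `from_bytes([0, 1, 0, 0])` are the same value, because the two representations differ only in the padding byte, while changing a byte at offset 0, 2, or 3 produces a different value.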


We seem to be using the term "representation" slightly differently. For me, that's just a List<Byte> of appropriate length. You may be using that term to refer to what I call "representation relation"?

Contributor Author

We seem to be using the term "representation" slightly differently. For me, that's just a List<Byte> of appropriate length. You may be using that term to refer to what I call "representation relation"?

That's helpful, thank you!

To avoid rabbit holing too much on the definitions (although it's interesting and useful – just maybe a bit of a distraction here), maybe you could propose language you'd prefer to see in place of what I've written here?

@RalfJung
Member

@rustbot ready

@RalfJung
Member

@rustbot author

@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels May 30, 2025
@rustbot
Collaborator

rustbot commented May 30, 2025

Reminder, once the PR becomes ready for a review, use @rustbot ready.

@RalfJung
Member

RalfJung commented Jun 1, 2025

I raised the question on Zulip whether it is wise to make a guarantee here that isn't, strictly speaking, documented in the LLVM LangRef. Nikita says he thinks that that's fine -- we may have to adjust how exactly we compile MaybeUninit in the future, but LLVM currently intends to support this case in a somewhat roundabout and incomplete way that seems to work well enough in practice, and LLVM can't more aggressively exploit the fuzziness along the edges of that approximation until a proper alternative exists.

@karolzwolak
Member

Thanks for your contribution @joshlf from wg-triage.
Could you address the comments above?

@joshlf
Contributor Author

joshlf commented Aug 27, 2025

Thanks for your contribution @joshlf from wg-triage. Could you address the comments above?

I likely won't have time to move this forward until mid-September or October, but I'll follow up at that point.

@joshlf
Contributor Author

joshlf commented Aug 30, 2025

Thanks for your contribution @joshlf from wg-triage. Could you address the comments above?

I likely won't have time to move this forward until mid-September or October, but I'll follow up at that point.

Nvm, found some time 🙂

I've responded to various comment threads.

@karolzwolak
Member

Awesome, you should also rebase your changes onto master, and do @rustbot ready when you're ready for review.

@joshlf
Contributor Author

joshlf commented Aug 31, 2025

@rustbot ready

@rustbot rustbot removed the S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. label Aug 31, 2025
@traviscross traviscross added I-lang-radar Items that are on lang's radar and will need eventual work or consideration. P-lang-drag-1 Lang team prioritization drag level 1. https://rust-lang.zulipchat.com/#narrow/channel/410516-t-lang T-lang Relevant to the language team labels Sep 9, 2025
/// ```
///
/// If the representation of `t` contains initialized bytes at byte offsets where `U` contains padding bytes, these
/// may not be preserved in `MaybeUninit<U>`. Interpreting the representation of `u` at type `T` again (i.e., `transmute(u)` above) may thus
Contributor

Suggested change
- /// may not be preserved in `MaybeUninit<U>`. Interpreting the representation of `u` at type `T` again (i.e., `transmute(u)` above) may thus
+ /// may not be preserved in `MaybeUninit<U>`. Interpreting the representation of `u` as type `T` again (i.e., `transmute(u)` above) may thus

Contributor Author

IIUC "at" is intentional language that @RalfJung uses – it's nomenclature used by language theorists.

Member

@RalfJung RalfJung Sep 18, 2025

"interpret u as type T" sounds wrong. We don't interpret u as a type, it's not a type. We use the type to tell us how to interpret u.

But I am not sure if there is widely-used standard terminology here, "interpret x at type y" just sounded most natural to me.

Contributor

@traviscross traviscross Sep 18, 2025

Right. Reading more closely, to use "as" here I'd instead use "as of", e.g. "interpret the bytes of x as being of the type T". I.e., a value "has a type", "is of a type", or is treated or interpreted as "having" or "being of" a type.

I'm pretty happy to use PLT jargon in general, but for that to work we need to use the piece of jargon pervasively enough that people pick up on it (and probably discuss the jargon in our notation guide or elsewhere). If we just use it in one place, people will just think it's a typo.

Member

@RalfJung RalfJung Sep 18, 2025

I don't even know if this is PLT jargon or just Ralf jargon. ;)

Contributor Author

What if we instead say this?

Transmuting u to T (i.e., transmute(u) above) may thus...

Member

That also works for me. I'd rather say "Transmuting u back to T" (to preserve the "again" in the current version).

Contributor Author

Done!

@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Sep 16, 2025
@traviscross
Contributor

We briefly reviewed this in the last lang triage meeting, but it's the sort of thing that's hard to really look through in a meeting, so we deferred it to async review.

@ehuss and I are looking at it now, on the @rust-lang/lang-docs side, as it is Reference-like material, as @RalfJung said. We agreed it'd be good to clarify the bit raised in #140463 (comment) regarding provenance and transmute. As discussed in that thread, it'd probably be best to just describe the situation here, in terms of how the surrounding code effects a reborrow, and give an example or demonstration of code that's equivalent in this model.

When that's done, the typo I noted is fixed, and the text is wrapped more consistently, I'll r+ it.

Process-wise, thanks for pinging lang here; that was correct. In fact, let's plan to ping @rust-lang/lang and @rust-lang/lang-docs both on this sort of thing.

@traviscross traviscross removed I-lang-nominated Nominated for discussion during a lang team meeting. P-lang-drag-1 Lang team prioritization drag level 1. https://rust-lang.zulipchat.com/#narrow/channel/410516-t-lang labels Sep 16, 2025
Co-authored-by: Ralf Jung <[email protected]>
Edited-by: TC
@rustbot
Collaborator

rustbot commented Sep 21, 2025

This PR was rebased onto a different master commit. Here's a range-diff highlighting what actually changed.

Rebasing is a normal part of keeping PRs up to date, so no action is needed—this note is just to help reviewers.

@traviscross
Contributor

traviscross commented Sep 21, 2025

I've revised the text for hopefully better clarity. In particular, as @RalfJung had suggested above, I've stated the guarantees upfront in positive form and then followed with the various caveats. This seemed desirable to me, and will hopefully ease lang review.

@RalfJung, @joshlf, does this look correct? (Did I break anything?)

Comment on lines 262 to 267
/// Using `MaybeUninit` to perform a round trip by transmuting a value of type
/// `T` first to type `MaybeUninit<U>` (where type `U` has the same size as `T`)
/// and then back to type `T` is guaranteed to be sound and to produce the
/// original value with its original [provenance] if and only if no bytes in the
/// representation of the value are initialized at byte offsets where type `U`
/// has padding.
Member

@RalfJung RalfJung Sep 22, 2025

This is a super long sentence and hard to read. Not sure how much can be done about that, but how about this?

Suggested change
- /// Using `MaybeUninit` to perform a round trip by transmuting a value of type
- /// `T` first to type `MaybeUninit<U>` (where type `U` has the same size as `T`)
- /// and then back to type `T` is guaranteed to be sound and to produce the
- /// original value with its original [provenance] if and only if no bytes in the
- /// representation of the value are initialized at byte offsets where type `U`
- /// has padding.
+ /// Using `MaybeUninit` to perform a round trip by transmuting a value `t` of type
+ /// `T` first to type `MaybeUninit<U>` (where type `U` has the same size as `T`)
+ /// and then back to type `T` is guaranteed to be sound and to produce the
+ /// original value with its original [provenance] if and only if for all bytes where
+ /// type `U` has padding, the corresponding byte in the representation of `t`
+ /// is uninitialized.

Another option would be to try to say "copying a value of type MaybeUninit<T> will exactly preserve the contents of all non-padding bytes of T", and then explain the above as a consequence of that. The trouble here lies in the use of "copying"; what we mean by this is a "typed copy" but that's not a standard term we have established so far, I think.

Contributor

Another option would be to try to say "copying a value of type MaybeUninit<T> will exactly preserve the contents of all non-padding bytes of T", and then explain the above as a consequence of that.

Agreed probably this is the right way about it. I'd been tempted to do that originally when revising as this approach does seem cleaner, but had held off for the reason you mention. Maybe let's just do this. I'll make a revision.

Contributor

I've revised to do this and to make this a series of shorter sentences. @RalfJung, sound right?

@traviscross traviscross force-pushed the patch-13 branch 2 times, most recently from 591b43e to cde96e5 Compare September 22, 2025 19:12
@traviscross traviscross added I-lang-nominated Nominated for discussion during a lang team meeting. P-lang-drag-1 Lang team prioritization drag level 1. https://rust-lang.zulipchat.com/#narrow/channel/410516-t-lang and removed T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Sep 22, 2025
@traviscross
Contributor

traviscross commented Sep 22, 2025

In this PR, we're making (or perhaps confirming) two language guarantees:

  • MaybeUninit<T> has no validity requirements: any sequence of bytes of the appropriate length, initialized or uninitialized, is a valid representation.
  • Moving or copying a value of type MaybeUninit<T> (i.e., performing a "typed copy") will exactly preserve the contents of all non-padding bytes of type T in the value's representation including the provenance of those bytes.

The text of the PR contains these two guarantees and elaborates some implications of them.
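
As a rough sketch of what these two guarantees allow (illustrative only, not code from the PR; the names `store` and `load` are hypothetical): a byte that is not a valid `bool` may still sit in a `MaybeUninit<bool>`, and moving that value around keeps the byte intact.

```rust
use std::mem::{transmute, MaybeUninit};

/// Stores an arbitrary byte in a `MaybeUninit<bool>`.
pub fn store(byte: u8) -> MaybeUninit<bool> {
    // First guarantee: sound for any `byte`, including values such as 0xFF
    // that would be invalid for `bool` itself.
    unsafe { transmute::<u8, MaybeUninit<bool>>(byte) }
}

/// Reads the byte back.
///
/// # Safety
/// The byte in `slot` must be initialized, since `u8` does not permit
/// uninitialized bytes.
pub unsafe fn load(slot: MaybeUninit<bool>) -> u8 {
    // Second guarantee: `bool` has no padding, so the moves of `slot`
    // (typed copies) preserved its single byte exactly.
    unsafe { transmute::<MaybeUninit<bool>, u8>(slot) }
}
```

Calling `assume_init` on `store(0xFF)` would still be undefined behavior. The guarantees cover holding and moving the bytes, not reinterpreting them at a type with stricter validity.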

@rfcbot fcp merge

@rust-rfcbot
Collaborator

rust-rfcbot commented Sep 22, 2025

Team member @traviscross has proposed to merge this. The next step is review by the rest of the tagged team members:

No concerns currently listed.

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

cc @rust-lang/lang-advisors: FCP proposed for lang, please feel free to register concerns.
See this document for info about what commands tagged team members can give me.

@rust-rfcbot rust-rfcbot added proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. labels Sep 22, 2025
@traviscross traviscross added S-waiting-on-team Status: Awaiting decision from the relevant subteam (see the T-<team> label). and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Sep 22, 2025
Let's rewrite this for better clarity.  In particular, let's document
our language guarantees upfront and in positive form.  We'll then list
the caveats and the non-guarantees after.
Labels
disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. I-lang-nominated Nominated for discussion during a lang team meeting. I-lang-radar Items that are on lang's radar and will need eventual work or consideration. P-lang-drag-1 Lang team prioritization drag level 1. https://rust-lang.zulipchat.com/#narrow/channel/410516-t-lang proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. S-waiting-on-team Status: Awaiting decision from the relevant subteam (see the T-<team> label). T-lang Relevant to the language team