[Docs] Expand the section about pointer aliasing in TRPL: 4.2. Unsafe Code #21159

Closed
wants to merge 2 commits into from
136 changes: 112 additions & 24 deletions src/doc/trpl/unsafe.md
@@ -49,32 +49,15 @@ and mutable (`&mut`) references have some aliasing and freezing
guarantees, required for memory safety.

In particular, if you have an `&T` reference, then the `T` must not be
modified through that reference or any other reference. There are some
standard library types, e.g. `Cell` and `RefCell`, that provide inner
mutability by replacing compile time guarantees with dynamic checks at
runtime.
modified through that reference or any other reference or pointer and must not
alias with any `&mut` reference. There are some standard library types, e.g.

Review comment:

Distinguishing between pointers and references seems a bit strange and overly verbose. It seems like we should just use a single word to refer to all pointer types. The language reference refers to them as 'pointers', but I've noticed in the past that some people in the Rust community dislike the P-word.

Review comment (Contributor Author):

I can adjust the text to use the term "reference" for `&T`/`&mut T`, "raw pointer" for `*const T`/`*mut T`, and "pointer" for both references and raw pointers. Would that be OK?

`Cell` and `RefCell`, that provide inner mutability by replacing compile time
guarantees with dynamic checks at runtime.
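
A minimal sketch of what "dynamic checks at runtime" means in practice, assuming the current `std::cell` API. The helper names `second_borrow_is_refused` and `cell_round_trip` are hypothetical and exist only for this illustration.

```rust
use std::cell::{Cell, RefCell};

/// Returns true if a second mutable borrow is refused while the first is alive.
fn second_borrow_is_refused() -> bool {
    let shared = RefCell::new(1u8);
    let first = shared.borrow_mut();
    // `try_borrow_mut` reports the conflict as an `Err` instead of panicking.
    let refused = shared.try_borrow_mut().is_err();
    drop(first);
    refused
}

/// Mutates a value through a shared reference using `Cell`.
fn cell_round_trip() -> u8 {
    let c = Cell::new(1u8);
    let r: &Cell<u8> = &c;
    r.set(10); // allowed through `&Cell<u8>`: this is inner mutability
    c.get()
}

fn main() {
    assert!(second_borrow_is_refused());
    assert_eq!(cell_round_trip(), 10);
}
```

Both helpers are entirely safe code: the aliasing rule is still enforced, only at runtime rather than at compile time.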

An `&mut` reference has a different constraint: when an object has an
`&mut T` pointing into it, then that `&mut` reference must be the only
such usable path to that object in the whole program. That is, an
`&mut` cannot alias with any other references.

Using `unsafe` code to incorrectly circumvent and violate these
restrictions is undefined behaviour. For example, the following
creates two aliasing `&mut` pointers, and is invalid.

```
use std::mem;
let mut x: u8 = 1;

let ref_1: &mut u8 = &mut x;
let ref_2: &mut u8 = unsafe { mem::transmute(&mut *ref_1) };

// oops, ref_1 and ref_2 point to the same piece of data (x) and are
// both usable
*ref_1 = 10;
*ref_2 = 20;
```
`&mut T` pointing into it, then it can be modified only through this reference
and not through any other reference or pointer. Moreover, an
`&mut T` must not alias with any other `&` or `&mut` reference.

## Raw pointers

@@ -127,6 +110,111 @@ The latter assumption allows the compiler to optimize more
effectively. As can be seen, actually *creating* a raw pointer is not
unsafe, and neither is converting to an integer.
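
As a small sketch of that last point, the hypothetical helpers below create a raw pointer and convert it to an integer without any `unsafe`; only the dereference needs an `unsafe` block.

```rust
/// Creates a raw pointer and converts it to an integer, all in safe code.
fn address_of(value: &i32) -> usize {
    let p: *const i32 = value; // reference-to-raw-pointer coercion, no `unsafe`
    p as usize
}

/// Only the dereference itself requires `unsafe`.
fn read_through(value: &i32) -> i32 {
    let p: *const i32 = value;
    // Sound here because `p` was just derived from a live reference.
    unsafe { *p }
}

fn main() {
    let x = 7i32;
    assert_eq!(address_of(&x), &x as *const i32 as usize);
    assert_eq!(read_through(&x), 7);
}
```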

## Reference and pointer aliasing

A few examples make the reference and pointer aliasing rules easier to
understand.

First of all, using `unsafe` code to circumvent the restrictions on references
leads to undefined behaviour. For example, the following code creates two
aliasing `&mut` references, and is therefore illegal.

Review comment:

Does this code from libcore/cell.rs violate the rules?

#[unstable = "waiting for `DerefMut` to become stable"]
impl<'b, T> DerefMut<T> for RefMut<'b, T> {
    #[inline]
    fn deref_mut<'a>(&'a mut self) -> &'a mut T {
        unsafe { &mut *self._parent.value.get() }
    }
}

It's using unsafe code to create an aliasing &mut reference of a different type.

Review comment (Contributor Author):

This code was removed in commit 5048953.
But I suppose this code was correct, because all the references pointed to an UnsafeCell, and UnsafeCell is special.

The statement "if you create two aliasing &mut references, then it's UB" was taken from the old text; I was conservative and kept it. But it may be too restrictive. For example, an alternative (better?) wording could be "it's not UB when you simply create two aliasing &mut references; UB happens when you start writing/reading through them to/from something that is not UnsafeCell".

Review comment:

I'm not sure why I had an old branch set to my current Rust HEAD, but the code is still there without the stability comment in 6d8342f:

impl<'b, T> DerefMut<T> for RefMut<'b, T> {
    #[inline]
    fn deref_mut<'a>(&'a mut self) -> &'a mut T {
        unsafe { &mut *self._parent.value.get() }
    }
}


```
use std::mem;
let mut x: u8 = 1;

let ref_1: &mut u8 = &mut x;
let ref_2: &mut u8 = unsafe { mem::transmute(&mut *ref_1) };

// oops, ref_1 and ref_2 point to the same piece of data (x) and are
// both usable
*ref_1 = 10;
*ref_2 = 20;
```

When raw pointers come into play, the situation becomes more complex.
Raw pointers can alias with anything, but they must still respect the
reference aliasing rules: it is forbidden to write through a raw pointer
while the memory location it points to is borrowed by a safe Rust reference.

```
// Technically this function invokes undefined behavior and can do anything,
// but usually it prints 1, 2, 2 and one more number.
// The fourth number can be, for example, 2 in optimized build and 3 in
// unoptimized build.
unsafe fn f(p: *mut i32, r: &mut i32) {
    let mut val1 = *p;
    println!("{}", val1);
    *r = 2; // This value change should be visible through the pointer p.
    val1 = *p; // This load operation cannot be optimized out.
    println!("{}", val1);

    let mut val2 = *r;
    println!("{}", val2);
    // This value change is illegal and may be invisible through the reference r.
    *p = 3;
    // This load operation can be optimized out based on the assumption of
    // uniqueness of the reference r.
    val2 = *r;
    println!("{}", val2);
}

fn main() {
    let mut val = 1i32;
    unsafe { f(&mut val, &mut val) };
}
```

So, raw pointers can be relatively safely used as observers, but extreme
care should be taken when writing through them. You must be sure that
nothing else refers to the modified value by `&T` or `&mut T`.

Review comment (Member):

relatively safely? This just leaves the reader wondering which rules you have left out.

Review comment (Contributor Author):

Well, these raw pointers can still be null or point to garbage, so they are not completely safe. That's just not related to aliasing.
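
The non-aliasing hazard mentioned in this reply can be sketched as follows. `read_if_non_null` is a hypothetical helper; it guards only against null, and the caller still has to guarantee that a non-null pointer is valid.

```rust
use std::ptr;

/// Hypothetical helper: reads through `p`, returning `None` for a null pointer.
///
/// Safety: if `p` is non-null, it must point to a valid, initialized `i32`.
/// A dangling ("garbage") pointer cannot be detected by this check.
unsafe fn read_if_non_null(p: *const i32) -> Option<i32> {
    if p.is_null() {
        None
    } else {
        Some(unsafe { *p })
    }
}

fn main() {
    let x = 5i32;
    assert_eq!(unsafe { read_if_non_null(&x) }, Some(5));
    assert_eq!(unsafe { read_if_non_null(ptr::null()) }, None);
}
```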


The structure `UnsafeCell` is special-cased in the compiler to allow the
language to work correctly with interior mutability. A value of `UnsafeCell`
can always be safely read through a reference, even if it is updated through
something else.

```
use std::cell::UnsafeCell;

// This function should print 1 and 2.
// If UnsafeCell is replaced with equivalent user-defined structure, then the
// second printed number is undefined, for example, the function can print 2 in
// unoptimized build and 1 in optimized build.
unsafe fn f_cell(p: *mut i32, r: &UnsafeCell<i32>) {
    let mut val = r.value;
    println!("{}", val);
    *p = 2; // This value change should be visible through the reference r.
    val = r.value; // This load operation cannot be optimized out.
    println!("{}", val);
}

fn main() {
    let mut val_cell = UnsafeCell { value: 1i32 };
    unsafe { f_cell(&mut val_cell.value, &mut val_cell) };

Review comment (Member):

When not done in statics, the more conventional way to construct an UnsafeCell is via UnsafeCell::new and using the .get() method to get out a raw pointer.

Review comment (Contributor Author):

But this piece of code is not conventional: UnsafeCell is being explained here, not used :)
So I'm aiming for maximum transparency and the ability to quickly substitute UnsafeCell with a user-defined structure to see the difference in behavior.

}
```
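
For comparison, here is a sketch of the same experiment written with the conventional construction raised in the review, `UnsafeCell::new` and `.get()`. It assumes current Rust, where the `value` field is private; returning the two reads as a tuple instead of printing them is a choice made for this illustration only.

```rust
use std::cell::UnsafeCell;

/// Reads the cell, writes 2 through the raw pointer, then reads again.
///
/// Safety: `p` must come from `r.get()`, and no reference to the cell's
/// contents may be alive during the call.
unsafe fn f_cell(p: *mut i32, r: &UnsafeCell<i32>) -> (i32, i32) {
    unsafe {
        let before = *r.get();
        *p = 2; // Visible through `r`, because `r` points at an UnsafeCell.
        let after = *r.get(); // This load cannot be optimized out.
        (before, after)
    }
}

fn main() {
    let val_cell = UnsafeCell::new(1i32);
    let p = val_cell.get();
    let (before, after) = unsafe { f_cell(p, &val_cell) };
    assert_eq!((before, after), (1, 2));
}
```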

Finally, unlike C, Rust has no type-based aliasing rules for raw pointers:
pointers to unrelated types (e.g. `i32` and `f32`) may alias.

```
// This program should print 1 and 0.
unsafe fn f(pi: *mut i32, pf: *mut f32) {
    let mut val = *pi;
    println!("{}", val);
    *pf = 0.0; // This value change should be visible through the pointer pi.
    val = *pi; // This load operation cannot be optimized out.
    println!("{}", val);
}

fn main() {
    let mut val = 1i32;
    unsafe { f(&mut val, &mut val as *mut i32 as *mut f32) };
}
```
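
A variation on the example above, as a sketch: writing a non-zero `f32` and comparing the `i32` read against `f32::to_bits` makes the sharing of bytes easier to see. `write_f32_read_i32` is a hypothetical helper, and both raw pointers are derived from a single raw pointer so that no reference is involved.

```rust
/// Writes an `f32` through `pf`, then reads the same four bytes through `pi`.
///
/// Safety: both pointers must be valid for reads and writes of four
/// properly aligned bytes.
unsafe fn write_f32_read_i32(pi: *mut i32, pf: *mut f32, value: f32) -> i32 {
    unsafe {
        *pf = value;
        *pi // Must observe the write above; the pointers are allowed to alias.
    }
}

fn main() {
    let mut val = 1i32;
    let pi: *mut i32 = &mut val;
    let pf = pi as *mut f32;
    let bits = unsafe { write_f32_read_i32(pi, pf, 1.0) };
    // The integer read sees the bit pattern of the float that was stored.
    assert_eq!(bits, 1.0f32.to_bits() as i32);
    assert_eq!(bits, 0x3f80_0000);
}
```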

### References and raw pointers

At runtime, a raw pointer `*` and a reference pointing to the same