|
1 | 1 | % Unsafe Code |
2 | 2 |
|
3 | | -# Introduction |
4 | | - |
5 | | -Rust aims to provide safe abstractions over the low-level details of |
6 | | -the CPU and operating system, but sometimes one needs to drop down and |
7 | | -write code at that level. This guide aims to provide an overview of |
8 | | -the dangers and power one gets with Rust's unsafe subset. |
9 | | - |
10 | | -Rust provides an escape hatch in the form of the `unsafe { ... }` |
11 | | -block which allows the programmer to dodge some of the compiler's |
12 | | -checks and do a wide range of operations, such as: |
13 | | - |
14 | | -- dereferencing [raw pointers](#raw-pointers) |
15 | | -- calling a function via FFI ([covered by the FFI guide](ffi.html)) |
16 | | -- casting between types bitwise (`transmute`, aka "reinterpret cast") |
17 | | -- [inline assembly](#inline-assembly) |
18 | | - |
19 | | -Note that an `unsafe` block does not relax the rules about lifetimes |
20 | | -of `&` and the freezing of borrowed data. |
21 | | - |
22 | | -Any use of `unsafe` is the programmer saying "I know more than you" to |
23 | | -the compiler, and, as such, the programmer should be very sure that |
24 | | -they actually do know more about why that piece of code is valid. In |
25 | | -general, one should try to minimize the amount of unsafe code in a |
26 | | -code base; preferably by using the bare minimum `unsafe` blocks to |
27 | | -build safe interfaces. |
28 | | - |
29 | | -> **Note**: the low-level details of the Rust language are still in |
30 | | -> flux, and there is no guarantee of stability or backwards |
31 | | -> compatibility. In particular, there may be changes that do not cause |
32 | | -> compilation errors, but do cause semantic changes (such as invoking |
33 | | -> undefined behaviour). As such, extreme care is required. |
34 | | -
|
35 | | -# Pointers |
36 | | - |
37 | | -## References |
38 | | - |
39 | | -One of Rust's biggest features is memory safety. This is achieved in |
40 | | -part via [the ownership system](ownership.html), which is how the |
41 | | -compiler can guarantee that every `&` reference is always valid, and, |
42 | | -for example, never pointing to freed memory. |
43 | | - |
44 | | -These restrictions on `&` have huge advantages. However, they also |
45 | | -constrain how we can use them. For example, `&` doesn't behave |
46 | | -identically to C's pointers, and so cannot be used for pointers in |
47 | | -foreign function interfaces (FFI). Additionally, both immutable (`&`) |
48 | | -and mutable (`&mut`) references have some aliasing and freezing |
49 | | -guarantees, required for memory safety. |
50 | | - |
51 | | -In particular, if you have an `&T` reference, then the `T` must not be |
52 | | -modified through that reference or any other reference. There are some |
53 | | -standard library types, e.g. `Cell` and `RefCell`, that provide inner |
54 | | -mutability by replacing compile time guarantees with dynamic checks at |
55 | | -runtime. |
56 | | - |
57 | | -An `&mut` reference has a different constraint: when an object has an |
58 | | -`&mut T` pointing into it, then that `&mut` reference must be the only |
59 | | -such usable path to that object in the whole program. That is, an |
60 | | -`&mut` cannot alias with any other references. |
61 | | - |
62 | | -Using `unsafe` code to incorrectly circumvent and violate these |
63 | | -restrictions is undefined behaviour. For example, the following |
64 | | -creates two aliasing `&mut` pointers, and is invalid. |
65 | | - |
66 | | -``` |
67 | | -use std::mem; |
68 | | -let mut x: u8 = 1; |
69 | | -
|
70 | | -let ref_1: &mut u8 = &mut x; |
71 | | -let ref_2: &mut u8 = unsafe { mem::transmute(&mut *ref_1) }; |
72 | | -
|
73 | | -// oops, ref_1 and ref_2 point to the same piece of data (x) and are |
74 | | -// both usable |
75 | | -*ref_1 = 10; |
76 | | -*ref_2 = 20; |
| 3 | +Rust’s main draw is its powerful static guarantees about behavior. But safety |
| 4 | +checks are conservative by nature: there are some programs that are actually |
| 5 | +safe, but the compiler is not able to verify this is true. To write these kinds |
| 6 | +of programs, we need to tell the compiler to relax its restrictions a bit. For |
| 7 | +this, Rust has a keyword, `unsafe`. Code using `unsafe` has fewer restrictions |
| 8 | +than normal code does. |
| 9 | + |
| 10 | +Let’s go over the syntax, and then we’ll talk semantics. `unsafe` is used in |
| 11 | +two contexts. The first one is to mark a function as unsafe: |
| 12 | + |
| 13 | +```rust |
| 14 | +unsafe fn danger_will_robinson() { |
| 15 | + // scary stuff |
| 16 | +} |
77 | 17 | ``` |
78 | 18 |
|
79 | | -## Raw pointers |
80 | | - |
81 | | -Rust offers two additional pointer types (*raw pointers*), written as |
82 | | -`*const T` and `*mut T`. They're an approximation of C's `const T*` and `T*` |
83 | | -respectively; indeed, one of their most common uses is for FFI, |
84 | | -interfacing with external C libraries. |
85 | | - |
86 | | -Raw pointers have far fewer guarantees than other pointer types |
87 | | -offered by the Rust language and libraries. For example, they |
88 | | - |
89 | | -- are not guaranteed to point to valid memory and are not even |
90 | | - guaranteed to be non-null (unlike both `Box` and `&`); |
91 | | -- do not have any automatic clean-up, unlike `Box`, and so require |
92 | | - manual resource management; |
93 | | -- are plain-old-data, that is, they don't move ownership, again unlike |
94 | | - `Box`, hence the Rust compiler cannot protect against bugs like |
95 | | - use-after-free; |
96 | | -- lack any form of lifetimes, unlike `&`, and so the compiler cannot |
97 | | - reason about dangling pointers; and |
98 | | -- have no guarantees about aliasing or mutability other than mutation |
99 | | - not being allowed directly through a `*const T`. |
100 | | - |
101 | | -Fortunately, they come with a redeeming feature: the weaker guarantees |
102 | | -mean weaker restrictions. The missing restrictions make raw pointers |
103 | | -appropriate as a building block for implementing things like smart |
104 | | -pointers and vectors inside libraries. For example, `*` pointers are |
105 | | -allowed to alias, allowing them to be used to write shared-ownership |
106 | | -types like reference counted and garbage collected pointers, and even |
107 | | -thread-safe shared memory types (`Rc` and the `Arc` types are both |
108 | | -implemented entirely in Rust). |
109 | | - |
110 | | -There are two things that you are required to be careful about |
111 | | -(i.e. require an `unsafe { ... }` block) with raw pointers: |
112 | | - |
113 | | -- dereferencing: they can have any value: so possible results include |
114 | | - a crash, a read of uninitialised memory, a use-after-free, or |
115 | | - reading data as normal. |
116 | | -- pointer arithmetic via the `offset` [intrinsic](#intrinsics) (or |
117 | | - `.offset` method): this intrinsic uses so-called "in-bounds" |
118 | | - arithmetic, that is, it is only defined behaviour if the result is |
119 | | - inside (or one-byte-past-the-end) of the object from which the |
120 | | - original pointer came. |
121 | | - |
122 | | -The latter assumption allows the compiler to optimize more |
123 | | -effectively. As can be seen, actually *creating* a raw pointer is not |
124 | | -unsafe, and neither is converting to an integer. |
125 | | - |
126 | | -### References and raw pointers |
127 | | - |
128 | | -At runtime, a raw pointer `*` and a reference pointing to the same |
129 | | -piece of data have an identical representation. In fact, an `&T` |
130 | | -reference will implicitly coerce to an `*const T` raw pointer in safe code |
131 | | -and similarly for the `mut` variants (both coercions can be performed |
132 | | -explicitly with, respectively, `value as *const T` and `value as *mut T`). |
133 | | - |
134 | | -Going the opposite direction, from `*const` to a reference `&`, is not |
135 | | -safe. A `&T` is always valid, and so, at a minimum, the raw pointer |
136 | | -`*const T` has to point to a valid instance of type `T`. Furthermore, |
137 | | -the resulting pointer must satisfy the aliasing and mutability laws of |
138 | | -references. The compiler assumes these properties are true for any |
139 | | -references, no matter how they are created, and so any conversion from |
140 | | -raw pointers is asserting that they hold. The programmer *must* |
141 | | -guarantee this. |
142 | | - |
143 | | -The recommended method for the conversion is |
| 19 | +All functions called from [FFI][ffi] must be marked as `unsafe`, for example. |
| 20 | +The second use of `unsafe` is an unsafe block: |
144 | 21 |
|
145 | | -``` |
146 | | -let i: u32 = 1; |
147 | | -// explicit cast |
148 | | -let p_imm: *const u32 = &i as *const u32; |
149 | | -let mut m: u32 = 2; |
150 | | -// implicit coercion |
151 | | -let p_mut: *mut u32 = &mut m; |
| 22 | +[ffi]: ffi.html |
152 | 23 |
|
| 24 | +```rust |
153 | 25 | unsafe { |
154 | | - let ref_imm: &u32 = &*p_imm; |
155 | | - let ref_mut: &mut u32 = &mut *p_mut; |
| 26 | + // scary stuff |
156 | 27 | } |
157 | 28 | ``` |
158 | 29 |
|
159 | | -The `&*x` dereferencing style is preferred to using a `transmute`. |
160 | | -The latter is far more powerful than necessary, and the more |
161 | | -restricted operation is harder to use incorrectly; for example, it |
162 | | -requires that `x` is a pointer (unlike `transmute`). |
| 30 | +It’s important to be able to explicitly delineate code that may have bugs that |
| 31 | +cause big problems. If a Rust program segfaults, you can be sure the cause is somewhere |
| 32 | +in the sections marked `unsafe`. |
| 33 | + |
| 34 | +# What does ‘safe’ mean? |
| 35 | + |
| 36 | +Safe, in the context of Rust, means “doesn’t do anything unsafe.” Easy! |
| 37 | + |
| 38 | +Okay, let’s try again: what is not safe to do? Here’s a list: |
| 39 | + |
| 40 | +* Data races |
| 41 | +* Dereferencing a null/dangling raw pointer |
| 42 | +* Reads of [undef][undef] (uninitialized) memory |
| 43 | +* Breaking the [pointer aliasing rules][aliasing] with raw pointers. |
| 44 | +* `&mut T` and `&T` follow LLVM’s scoped [noalias][noalias] model, except if |
| 45 | + the `&T` contains an `UnsafeCell<U>`. Unsafe code must not violate these |
| 46 | + aliasing guarantees. |
| 47 | +* Mutating an immutable value/reference without `UnsafeCell<U>` |
| 48 | +* Invoking undefined behavior via compiler intrinsics: |
| 49 | + * Indexing outside of the bounds of an object with `std::ptr::offset` |
| 50 | + (`offset` intrinsic), with |
| 51 | + the exception of one byte past the end which is permitted. |
| 52 | + * Using `std::ptr::copy_nonoverlapping_memory` (`memcpy32`/`memcpy64` |
| 53 | + intrinsics) on overlapping buffers |
| 54 | +* Invalid values in primitive types, even in private fields/locals: |
| 55 | + * Null/dangling references or boxes |
| 56 | + * A value other than `false` (0) or `true` (1) in a `bool` |
| 57 | + * A discriminant in an `enum` not included in its type definition |
| 58 | + * A value in a `char` which is a surrogate or above `char::MAX` |
| 59 | + * Non-UTF-8 byte sequences in a `str` |
| 60 | +* Unwinding into Rust from foreign code or unwinding from Rust into foreign |
| 61 | + code. |
| 62 | + |
| 63 | +[noalias]: http://llvm.org/docs/LangRef.html#noalias |
| 64 | +[undef]: http://llvm.org/docs/LangRef.html#undefined-values |
| 65 | +[aliasing]: http://llvm.org/docs/LangRef.html#pointer-aliasing-rules |
| 66 | + |
| 67 | +Whew! That’s a bunch of stuff. It’s also important to notice all kinds of |
| 68 | +behaviors that are certainly bad, but are expressly _not_ unsafe: |
| 69 | + |
| 70 | +* Deadlocks |
| 71 | +* Reading data from private fields |
| 72 | +* Leaks due to reference count cycles |
| 73 | +* Exiting without calling destructors |
| 74 | +* Sending signals |
| 75 | +* Accessing/modifying the file system |
| 76 | +* Integer overflow |
| 77 | + |
| 78 | +Rust cannot prevent all kinds of software problems. Buggy code can and will be |
| 79 | +written in Rust. These things aren’t great, but they don’t qualify as `unsafe` |
| 80 | +specifically. |
| 81 | + |
| 82 | +# Unsafe Superpowers |
| 83 | + |
| 84 | +In both unsafe functions and unsafe blocks, Rust will let you do three things |
| 85 | +that you normally cannot do. Just three. Here they are: |
| 86 | + |
| 87 | +1. Access or update a [static mutable variable][static]. |
| 88 | +2. Dereference a raw pointer. |
| 89 | +3. Call unsafe functions. This is the most powerful ability. |
| 90 | + |
| 91 | +That’s it. It’s important that `unsafe` does not, for example, ‘turn off the |
| 92 | +borrow checker’. Adding `unsafe` to some random Rust code doesn’t change its |
| 93 | +semantics; it won’t just start accepting anything. |
| 94 | + |
| 95 | +But it will let you write things that _do_ break some of the rules. Let’s go |
| 96 | +over these three abilities in order. |
| 97 | + |
| 98 | +## Access or update a `static mut` |
| 99 | + |
| 100 | +Rust has a feature called ‘`static mut`’ which allows for mutable global state. |
| 101 | +Doing so can cause a data race, and as such is inherently not safe. For more |
| 102 | +details, see the [static][static] section of the book. |
| 103 | + |
| 104 | +[static]: static.html |
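As a rough illustration of the first ability, here is a minimal sketch of reading and writing a `static mut`. The names `COUNTER` and `bump` are hypothetical and not part of the original text, and the sketch assumes the program is single-threaded.

```rust
// Hypothetical global counter, used only for illustration.
static mut COUNTER: u32 = 0;

/// Adds `n` to the global counter and returns the new total.
///
/// Safety: the caller must make sure no other thread reads or writes
/// `COUNTER` at the same time; otherwise this is a data race.
unsafe fn bump(n: u32) -> u32 {
    // Both the write and the read of a `static mut` need `unsafe`.
    unsafe {
        COUNTER += n;
        COUNTER
    }
}
```

A caller would write something like `let total = unsafe { bump(1) };`, which makes the reliance on the single-threaded assumption visible at the call site.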
| 105 | + |
| 106 | +## Dereference a raw pointer |
| 107 | + |
| 108 | +Raw pointers let you do arbitrary pointer arithmetic, and can cause a number of |
| 109 | +different memory safety and security issues. In some senses, the ability to |
| 110 | +dereference an arbitrary pointer is one of the most dangerous things you can |
| 111 | +do. For more on raw pointers, see [their section of the book][rawpointers]. |
| 112 | + |
| 113 | +[rawpointers]: raw-pointers.html |
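The second ability might look like the following sketch. The function name `double_in_place` is an assumption made for illustration; the point is that creating the raw pointer is ordinary safe code, and only the dereference needs an unsafe block.

```rust
/// Doubles the value behind `value` by going through a raw pointer,
/// then returns the new value.
fn double_in_place(value: &mut i32) -> i32 {
    // Creating a raw pointer from a reference is safe.
    let p: *mut i32 = value;

    // Safety: `p` was just derived from a live `&mut i32`, so it is
    // non-null, aligned, and points to an initialized `i32`, and no
    // other reference is used while we access it.
    unsafe {
        *p *= 2;
        *p
    }
}
```

Here the unsafe block is sound only because the pointer's origin is known. A pointer received from elsewhere would carry no such guarantee.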
163 | 114 |
|
| 115 | +## Call unsafe functions |
164 | 116 |
|
| 117 | +This last ability works with both aspects of `unsafe`: you can only call |
| 118 | +functions marked `unsafe` from inside an unsafe block or another unsafe function. |
165 | 119 |
|
166 | | -## Making the unsafe safe(r) |
| 120 | +This ability is powerful and varied. Rust exposes some [compiler |
| 121 | +intrinsics][intrinsics] as unsafe functions, and some unsafe functions bypass |
| 122 | +safety checks, trading safety for speed. |
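One way to picture the third ability is a small unsafe function wrapped in a safe interface. This is only a sketch: `first_unchecked` and `first_or_zero` are hypothetical helpers, while `get_unchecked` is the standard library's unsafe slice accessor that skips the bounds check.

```rust
/// Hypothetical helper: returns the first element without a bounds check.
///
/// Safety: `items` must not be empty.
unsafe fn first_unchecked(items: &[u32]) -> u32 {
    // `get_unchecked` is itself an unsafe function, so calling it
    // requires an unsafe context.
    unsafe { *items.get_unchecked(0) }
}

/// Safe wrapper: checks the precondition before making the unsafe call.
fn first_or_zero(items: &[u32]) -> u32 {
    if items.is_empty() {
        0
    } else {
        // Safety: we just checked that `items` has at least one element.
        unsafe { first_unchecked(items) }
    }
}
```

The wrapper keeps the unsafe call in one place and upholds the invariant itself, so callers of `first_or_zero` never need `unsafe`.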
167 | 123 |
|
168 | | -There are various ways to expose a safe interface around some unsafe |
169 | | -code: |
| 124 | +I’ll say it again: just because you _can_ do arbitrary things in unsafe blocks |
| 125 | +and functions doesn’t mean you should. The compiler will act as though you’re |
| 126 | +upholding its invariants, so be careful! |
170 | 127 |
|
171 | | -- store pointers privately (i.e. not in public fields of public |
172 | | - structs), so that you can see and control all reads and writes to |
173 | | - the pointer in one place. |
174 | | -- use `assert!()` a lot: since you can't rely on the protection of the |
175 | | - compiler & type-system to ensure that your `unsafe` code is correct |
176 | | - at compile-time, use `assert!()` to verify that it is doing the |
177 | | - right thing at run-time. |
178 | | -- implement the `Drop` for resource clean-up via a destructor, and use |
179 | | - RAII (Resource Acquisition Is Initialization). This reduces the need |
180 | | - for any manual memory management by users, and automatically ensures |
181 | | - that clean-up is always run, even when the thread panics. |
182 | | -- ensure that any data stored behind a raw pointer is destroyed at the |
183 | | - appropriate time. |
| 128 | +[intrinsics]: intrinsics.html |