Commit 394a2f4

New Section - Type Layout
1 parent 4b49378 commit 394a2f4

7 files changed

+294
-23
lines changed

src/SUMMARY.md

+1
@@ -61,6 +61,7 @@
6161
- [Type system](type-system.md)
6262
- [Types](types.md)
6363
- [Dynamically Sized Types](dynamically-sized-types.md)
64+
- [Type layout](type-layout.md)
6465
- [Interior mutability](interior-mutability.md)
6566
- [Subtyping](subtyping.md)
6667
- [Type coercions](type-coercions.md)

src/attributes.md

+1-1
@@ -357,7 +357,7 @@ pub mod m3 {
357357
}
358358
```
359359

360-
### Inline attributes
360+
### Inline attribute
361361

362362
The inline attribute suggests that the compiler should place a copy of
363363
the function or static in the caller, rather than generating code to

src/dynamically-sized-types.md

+1-1
@@ -2,7 +2,7 @@
22

33
Most types have a fixed size that is known at compile time and implement the
44
trait [`Sized`][sized]. A type with a size that is known only at run-time is
5-
called a _dynamically sized type_ (_DST_) or (informally) an unsized type.
5+
called a _dynamically sized type_ (_DST_) or, informally, an unsized type.
66
[Slices] and [trait objects] are two examples of <abbr title="dynamically sized
77
types">DSTs</abbr>. Such types can only be used in certain cases:
88

src/glossary.md

+10
@@ -5,6 +5,11 @@
55
An ‘abstract syntax tree’, or ‘AST’, is an intermediate representation of
66
the structure of the program when the compiler is compiling it.
77

8+
### Alignment
9+
10+
The *alignment* of a value specifies what addresses are valid to store the value
11+
at.
12+
813
### Arity
914

1015
Arity refers to the number of arguments a function or operation takes.
@@ -57,6 +62,11 @@ can create such an lvalue without initializing it.
5762
Prelude, or The Rust Prelude, is a small collection of items - mostly traits - that are
5863
imported into every module of every crate. The traits in the prelude are pervasive.
5964

65+
### Size
66+
67+
The *size* of a value is the offset in bytes between successive elements in an
68+
array with that item type including alignment padding.
69+
6070
### Slice
6171

6272
A slice is a dynamically-sized view into a contiguous sequence, written as `[T]`.

src/items/enumerations.md

+33-19
@@ -4,8 +4,6 @@ An _enumeration_ is a simultaneous definition of a nominal [enumerated type] as
44
well as a set of *constructors*, that can be used to create or pattern-match
55
values of the corresponding enumerated type.
66

7-
[enumerated type]: types.html#enumerated-types
8-
97
Enumerations are declared with the keyword `enum`.
108

119
An example of an `enum` item and its use:
@@ -24,7 +22,7 @@ Enumeration constructors can have either named or unnamed fields:
2422

2523
```rust
2624
enum Animal {
27-
Dog (String, f64),
25+
Dog(String, f64),
2826
Cat { name: String, weight: f64 },
2927
}
3028

@@ -34,36 +32,52 @@ a = Animal::Cat { name: "Spotty".to_string(), weight: 2.7 };
3432

3533
In this example, `Cat` is a _struct-like enum variant_, whereas `Dog` is simply
3634
called an enum variant. Each enum instance has a _discriminant_ which is an
37-
integer associated to it that is used to determine which variant it holds.
35+
integer associated to it that is used to determine which variant it holds. An
36+
opaque reference to this discriminant can be obtained with the [`mem::discriminant`]
37+
function.
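
As a minimal sketch of how this might be used, the example below compares two values by variant only. The `Animal` enum mirrors the one above, and `same_variant` is a hypothetical helper introduced purely for illustration.

```rust
use std::mem;

// Mirrors the `Animal` enum from the example above.
#[allow(dead_code)]
enum Animal {
    Dog(String, f64),
    Cat { name: String, weight: f64 },
}

/// Hypothetical helper: true when both values hold the same variant,
/// regardless of the data stored in their fields.
fn same_variant(a: &Animal, b: &Animal) -> bool {
    mem::discriminant(a) == mem::discriminant(b)
}

fn main() {
    let rex = Animal::Dog("Rex".to_string(), 30.0);
    let fido = Animal::Dog("Fido".to_string(), 12.5);
    let spotty = Animal::Cat { name: "Spotty".to_string(), weight: 2.7 };

    assert!(same_variant(&rex, &fido));
    assert!(!same_variant(&rex, &spotty));
}
```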
3838

3939
## C-like Enumerations
4040

41-
If there is no data attached to *any* of the variants of an enumeration it is
42-
called a *c-like enumeration*. If a discriminant isn't specified, they start at
43-
zero, and add one for each variant, in order. Each enum value is just its
44-
discriminant which you can specify explicitly:
41+
If there is no data attached to *any* of the variants of an enumeration and
42+
there is at least one variant then it is called a *c-like enumeration*.
43+
44+
C-like enumerations can be cast to integer types with the `as` operator by a
45+
[numeric cast]. The enumeration can optionally specify which integer each
46+
discriminant gets by following the variant name with `=` and then an integer
47+
literal. If the first variant in the declaration is unspecified, then it is set
48+
to zero. Every other unspecified discriminant is set to one higher than the
49+
previous variant in the declaration.
4550

4651
```rust
4752
enum Foo {
4853
Bar, // 0
49-
Baz = 123,
54+
Baz = 123, // 123
5055
Quux, // 124
5156
}
57+
58+
let baz_discriminant = Foo::Baz as u32;
59+
assert_eq!(baz_discriminant, 123u32);
5260
```
5361

54-
The right hand side of the specification is interpreted as an `isize` value,
55-
but the compiler is allowed to use a smaller type in the actual memory layout.
56-
The [`repr` attribute] can be added in order to change the type of the right
57-
hand side and specify the memory layout.
62+
Under the [default representation], the specified discriminant is interpreted as
63+
an `isize` value although the compiler is allowed to use a smaller type in the
64+
actual memory layout. The size and thus acceptable values can be changed by
65+
using a [primitive representation] or the [`C` representation].
66+
67+
It is an error when two variants share the same discriminant, or when a variant
68+
with an unspecified discriminant follows a variant whose discriminant is the
69+
maximum value for the size of the discriminant. <!-- Need examples here. -->
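
A possible illustration of both error cases, assuming a hypothetical `Level` enum with a `u8` primitive representation. The rejected declarations are left in comments so that the block still compiles.

```rust
// Hypothetical enum whose discriminants must fit in a `u8`.
#[repr(u8)]
enum Level {
    Low = 254,
    High, // implicitly 255, the largest value a `u8` can hold
    // Adding another unspecified variant here would be rejected, because its
    // discriminant would have to be 256.
}

// This declaration would also be rejected, since both variants would share
// the discriminant 1:
//
// enum Duplicate {
//     A = 1,
//     B = 1,
// }

fn main() {
    assert_eq!(Level::Low as u8, 254);
    assert_eq!(Level::High as u8, 255);
}
```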
5870

59-
[`repr` attribute]: attributes.html#ffi-attributes
71+
## Zero-variant Enumerations
6072

61-
You can also cast a c-like enum to get its discriminant:
73+
Enums with zero variants are known as *zero-variant enumerations*. As they have
74+
no valid values, they cannot be instantiated.
6275

6376
```rust
64-
# enum Foo { Baz = 123 }
65-
let x = Foo::Baz as u32; // x is now 123u32
77+
enum ZeroVariants {}
6678
```
6779

68-
This only works as long as none of the variants have data attached. If it were
69-
`Baz(i32)`, this is disallowed.
80+
[enumerated type]: types.html#enumerated-types
81+
[`mem::discriminant`]: std/mem/fn.discriminant.html
82+
[numeric cast]: expressions/operator-expr.html#semantics
83+
[`repr` attribute]: attributes.html#ffi-attributes

src/type-layout.md

+246
@@ -0,0 +1,246 @@
1+
# Type Layout
2+
3+
The layout of a type describes the size, alignment, and offsets of any
4+
fields and discriminants for the values of that type.
5+
6+
**PR NOTE: This doesn't include valid values. E.g. `bool` and `i8` have the
7+
same layout under this definition. Nor does it include calling convention
8+
differences, so `u8` and `#[repr(C)] struct S { f: u8 }` have the same layout,
9+
as does `*T` and `&T`. I'm not sure if it should or not.**
10+
11+
While specific releases of the compiler will have the same layout for types,
12+
there is a lot of room for new versions of the compiler to do different things.
13+
Instead of trying to document exactly what is done, we only document what is
14+
guaranteed today.
15+
16+
## Size and Alignment
17+
18+
All values have an alignment and size.
19+
20+
The *alignment* of a value specifies what addresses are valid to store the value
21+
at. A value of alignment `n` must only be stored at an address that is a
22+
multiple of `n`. For example, a value with an alignment of 2 must be stored at an
23+
even address, while a value with an alignment of 1 can be stored at any address.
24+
Alignment is measured in bytes, must be at least 1, and is always a power of 2.
25+
The alignment of a value can be checked with the [`align_of_val`] function.
26+
27+
The *size* of a value is the offset in bytes between successive elements in an
28+
array with that item type including alignment padding. The size of a value is
29+
always a multiple of its alignment. The size of a value can be checked with the
30+
[`size_of_val`] function.
31+
32+
Types where all values have the same size and alignment known at compile time
33+
implement the [`Sized`] trait and can be checked with the [`size_of`] and
34+
[`align_of`] functions. Types that are not [`Sized`] are known as [dynamically
35+
sized types]. Since all values of a `Sized` type share the same size and
36+
alignment, we refer to those shared values as the size of the type and the
37+
alignment of the type respectively.
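
The relationships described above can be checked with a short sketch. `size_is_multiple_of_align` is a hypothetical helper, not part of the standard library; the `str` values show a dynamically sized type whose size differs from value to value.

```rust
use std::mem::{align_of, align_of_val, size_of, size_of_val};

/// Hypothetical helper: checks that the size of `T` is a multiple of its
/// alignment.
fn size_is_multiple_of_align<T>() -> bool {
    size_of::<T>() % align_of::<T>() == 0
}

fn main() {
    let value: u32 = 7;

    // For a `Sized` type, the per-value and per-type queries agree.
    assert_eq!(size_of_val(&value), size_of::<u32>());
    assert_eq!(align_of_val(&value), align_of::<u32>());

    // Alignment is a power of two, and size is a multiple of alignment.
    assert!(align_of::<u32>().is_power_of_two());
    assert!(size_is_multiple_of_align::<u32>());

    // `str` is dynamically sized, so its size depends on the value.
    assert_eq!(size_of_val("hi"), 2);
    assert_eq!(size_of_val("hello"), 5);
}
```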
38+
39+
## Primitive Data Layout
40+
41+
The size of most primitives is given in this table.
42+
43+
Type | `size_of::<Type>()`
44+
- | -
45+
bool | 1
46+
u8 | 1
47+
u16 | 2
48+
u32 | 4
49+
u64 | 8
50+
i8 | 1
51+
i16 | 2
52+
i32 | 4
53+
i64 | 8
54+
f32 | 4
55+
f64 | 8
56+
char | 4
57+
58+
`usize` and `isize` have a size big enough to contain every address on the
59+
target platform. For example, on a 32 bit target, this is 4 bytes and on a 64
60+
bit target, this is 8 bytes.
61+
62+
Most primitives are generally aligned to their size, although this is
63+
platform-specific behavior. In particular, on x86, `u64` and `f64` may be
64+
aligned to only 32 bits.
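
The sizes in the table can be confirmed directly with `size_of`. This sketch only prints the alignment of `u64`, since that value is platform-specific.

```rust
use std::mem::{align_of, size_of};

fn main() {
    // Sizes from the table above.
    assert_eq!(size_of::<bool>(), 1);
    assert_eq!(size_of::<u8>(), 1);
    assert_eq!(size_of::<i16>(), 2);
    assert_eq!(size_of::<u32>(), 4);
    assert_eq!(size_of::<i64>(), 8);
    assert_eq!(size_of::<f32>(), 4);
    assert_eq!(size_of::<f64>(), 8);
    assert_eq!(size_of::<char>(), 4);

    // `usize` and `isize` share a size, which depends on the target.
    assert_eq!(size_of::<usize>(), size_of::<isize>());
    println!("usize is {} bytes on this target", size_of::<usize>());

    // Alignment is platform-specific, so it is printed rather than asserted.
    println!("u64 is aligned to {} bytes on this target", align_of::<u64>());
}
```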
65+
66+
## Pointers and References Layout
67+
68+
Pointers and references have the same layout. Mutability of the pointer or
69+
reference does not change the layout.
70+
71+
Pointers to sized types have the same size and alignment as `usize`.
72+
73+
Pointers to unsized types are sized. The size and alignment are guaranteed to be
74+
at least equal to the size and alignment of a pointer.
75+
76+
> Note: Though you should not rely on this, all pointers to <abbr
77+
> title="Dynamically Sized Types">DSTs</abbr> are currently twice the size of
78+
> `usize` and have the same alignment.
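
A short sketch of these guarantees follows. The sizes of pointers to unsized types are only checked against the lower bound stated above and otherwise printed, since their exact size is not something to rely on.

```rust
use std::mem::{align_of, size_of};

fn main() {
    // Pointers and references to sized types match `usize`,
    // regardless of mutability.
    assert_eq!(size_of::<&u8>(), size_of::<usize>());
    assert_eq!(size_of::<&mut u8>(), size_of::<usize>());
    assert_eq!(size_of::<*const u8>(), size_of::<usize>());
    assert_eq!(size_of::<*mut u8>(), size_of::<usize>());
    assert_eq!(align_of::<&u8>(), align_of::<usize>());

    // Pointers to unsized types are at least as large as a thin pointer.
    assert!(size_of::<&[u8]>() >= size_of::<usize>());
    assert!(size_of::<&str>() >= size_of::<usize>());
    println!("&[u8] is {} bytes on this target", size_of::<&[u8]>());
}
```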
79+
80+
## Array Layout
81+
82+
Arrays are laid out so that the `nth` element of the array is offset from the
83+
start of the array by `n * the size of the type` bytes. An array of `[T; n]`
84+
has a size of `size_of::<T>() * n` and the same alignment as `T`.
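
One way to observe this layout is to measure element offsets from the start of an array. `element_offset` is a hypothetical helper written for this sketch; it counts elements from zero.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical helper: byte offset of element `index` (counting from zero)
/// from the start of `array`.
fn element_offset<T>(array: &[T], index: usize) -> usize {
    let base = array.as_ptr() as usize;
    let element = &array[index] as *const T as usize;
    element - base
}

fn main() {
    let numbers: [u32; 5] = [10, 20, 30, 40, 50];

    // Each element sits `n * size_of::<u32>()` bytes from the start.
    for n in 0..numbers.len() {
        assert_eq!(element_offset(&numbers, n), n * size_of::<u32>());
    }

    // The array's size is the element size times the length, and its
    // alignment matches the element type.
    assert_eq!(size_of::<[u32; 5]>(), size_of::<u32>() * 5);
    assert_eq!(align_of::<[u32; 5]>(), align_of::<u32>());
}
```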
85+
86+
## Slice Layout
87+
88+
Slices have the same layout as the section of the array they slice.
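
As a small sketch of what this means in practice, a slice of an array starts at the address of its first sliced element and covers exactly the sliced elements. The variable names are illustrative only.

```rust
use std::mem::{size_of, size_of_val};

fn main() {
    let numbers: [u16; 6] = [1, 2, 3, 4, 5, 6];
    let middle: &[u16] = &numbers[2..5];

    // The slice begins where the third array element is stored.
    assert_eq!(middle.as_ptr(), &numbers[2] as *const u16);

    // It spans three elements' worth of bytes.
    assert_eq!(size_of_val(middle), 3 * size_of::<u16>());
}
```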
89+
90+
## Tuple Layout
91+
92+
Tuples do not have any guarantees about their layout.
93+
94+
The exception to this is the unit tuple (`()`) which is guaranteed as a
95+
zero-sized type to have a size of 0 and an alignment of 1.
96+
97+
## Trait Object Layout
98+
99+
Trait objects have the same layout as the value the trait object is of.
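
For example, the size and alignment reported for a trait object are those of the concrete value behind it. This sketch uses `std::fmt::Debug` as an arbitrary trait, and `object_size` is a hypothetical helper.

```rust
use std::fmt::Debug;
use std::mem::{align_of, align_of_val, size_of, size_of_val};

/// Hypothetical helper: size in bytes of the value behind a trait object.
fn object_size(object: &dyn Debug) -> usize {
    size_of_val(object)
}

fn main() {
    let value: u64 = 7;
    let object: &dyn Debug = &value;

    // The trait object reports the layout of the underlying `u64`.
    assert_eq!(object_size(object), size_of::<u64>());
    assert_eq!(align_of_val(object), align_of::<u64>());
}
```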
100+
101+
## Closure Layout
102+
103+
Closures have no layout guarantees.
104+
105+
## Representations
106+
107+
All **FIXME** types have a *representation* that specifies what the layout
108+
is for the type.
109+
110+
Note: The representation does not depend upon the type's fields or generic
111+
parameters.
112+
113+
The possible representations for a type are the default representation, `C`, the
114+
primitive representations, and `packed`. Multiple representations can be applied
115+
to a single type.
116+
117+
The representation of a type can be changed by applying the [`repr` attribute]
118+
to it. The following example shows a struct with a `C` representation.
119+
120+
```
121+
#[repr(C)]
122+
struct ThreeInts {
123+
first: i16,
124+
second: i8,
125+
third: i32
126+
}
127+
```
128+
129+
The representation of a type does not change the layout of its fields. For
130+
example, a struct with a `C` representation that contains a struct `Inner` with
131+
the default representation will not change the layout of `Inner`.
132+
133+
### The Default Representation
134+
135+
Nominal types without a `repr` attribute have the default representation.
136+
Informally, this representation is also called the `rust` representation.
137+
138+
There are no guarantees of data layout made by this representation.
139+
140+
### The `C` Representation
141+
142+
The `C` representation is designed for creating types that are interoperable with
143+
the C Language and soundly performing operations that rely on data layout such
144+
as reinterpreting values as a different type.
145+
146+
This representation can be applied to structs, unions, and enums.
147+
148+
#### \#[repr(C)] Structs
149+
150+
The alignment of the struct is the alignment of the most-aligned field in it.
151+
152+
The size of the struct and the offsets of its fields are determined by the following algorithm.
153+
154+
Start with a current offset of 0 bytes.
155+
156+
For each field in declaration order in the struct, first determine the size and
157+
alignment of the field. If the current offset is not a multiple of the field's
158+
alignment, then add padding bytes increasing the current offset until the
159+
current offset is a multiple of the field's alignment. The offset for the field
160+
is what the current offset is now. Then increase the current offset by the size
161+
of the field.
162+
163+
Finally, the size of the struct is the current offset rounded up to the nearest
164+
multiple of the struct's alignment.
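
The algorithm above can be sketched as a small function and compared against what the compiler produces for the `ThreeInts` struct shown earlier. `repr_c_layout` is a hypothetical helper, and treating a struct with no fields as having alignment 1 is an assumption of this sketch.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical helper implementing the steps above. Takes each field's
/// `(size, alignment)` in declaration order and returns the field offsets,
/// the struct size, and the struct alignment.
fn repr_c_layout(fields: &[(usize, usize)]) -> (Vec<usize>, usize, usize) {
    let mut offset = 0;
    let mut align = 1; // assumption: a struct with no fields has alignment 1
    let mut offsets = Vec::new();
    for &(field_size, field_align) in fields {
        // Add padding until the offset is a multiple of the field's alignment.
        let rem = offset % field_align;
        if rem != 0 {
            offset += field_align - rem;
        }
        offsets.push(offset);
        offset += field_size;
        // The struct takes the alignment of its most-aligned field.
        align = align.max(field_align);
    }
    // Round the size up to a multiple of the struct's alignment.
    let rem = offset % align;
    if rem != 0 {
        offset += align - rem;
    }
    (offsets, offset, align)
}

#[repr(C)]
struct ThreeInts {
    first: i16,
    second: i8,
    third: i32,
}

fn main() {
    let (offsets, size, align) = repr_c_layout(&[
        (size_of::<i16>(), align_of::<i16>()),
        (size_of::<i8>(), align_of::<i8>()),
        (size_of::<i32>(), align_of::<i32>()),
    ]);

    // Measure the real field offsets with pointer arithmetic.
    let value = ThreeInts { first: 1, second: 2, third: 3 };
    let base = &value as *const ThreeInts as usize;
    let actual = vec![
        &value.first as *const i16 as usize - base,
        &value.second as *const i8 as usize - base,
        &value.third as *const i32 as usize - base,
    ];

    assert_eq!(offsets, actual);
    assert_eq!(size, size_of::<ThreeInts>());
    assert_eq!(align, align_of::<ThreeInts>());
}
```

The padding byte inserted before `third` is why the struct is larger than the sum of its field sizes.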
165+
166+
> Note: You can have zero-sized structs from this algorithm. This differs from
167+
> C++, where structs without data still have a size of one byte; standard C does not allow structs without members.
168+
169+
#### \#[repr(C)] Unions
170+
171+
A union declared with `#[repr(C)]` will have the same size and alignment as an
172+
equivalent C union declaration in the C language for the target platform.
173+
Usually, a union has the maximum size of all of its
174+
fields and the maximum alignment of all of its fields.
175+
These maximums may come from different fields.
176+
177+
```
178+
#[repr(C)]
179+
union Union {
180+
f1: u16,
181+
f2: [u8; 4],
182+
}
183+
184+
assert_eq!(std::mem::size_of::<Union>(), 4); // From f2
185+
assert_eq!(std::mem::align_of::<Union>(), 2); // From f1
186+
```
187+
188+
#### \#[repr(C)] Enums
189+
190+
For [C-like enumerations], the `C` representation has the size and alignment of
191+
the default `enum` size and alignment for the target platform's C ABI.
192+
193+
> Note: The enum representation in C is implementation defined, so this is
194+
> really a "best guess". In particular, this may be incorrect when the C code
195+
> of interest is compiled with certain flags.
196+
197+
> Warning: There are crucial differences between an `enum` in the C language and
198+
> Rust's C-like enumerations with this representation. An `enum` in C is
199+
> mostly a `typedef` plus some named constants; in other words, an object of an
200+
> `enum` type can hold any integer value. For example, this is often used for
201+
> bitflags in `C`. In contrast, Rust’s C-like enumerations can only legally hold
202+
> the discriminant values; everything else is undefined behaviour. Therefore,
203+
> using a C-like enumeration in FFI to model a C `enum` is often wrong.
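
One defensive pattern, sketched here under the assumption that the C side hands over a plain `int`, is to receive the raw integer and convert it explicitly instead of declaring the foreign value as the Rust enum. `Color` and `color_from_raw` are hypothetical names used only for illustration.

```rust
use std::os::raw::c_int;

// Hypothetical enum mirroring a C `enum` with three named constants.
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum Color {
    Red = 0,
    Green = 1,
    Blue = 2,
}

/// Hypothetical helper: converts a raw integer received from C into a
/// `Color`, rejecting any value that is not a declared discriminant.
fn color_from_raw(raw: c_int) -> Option<Color> {
    match raw {
        0 => Some(Color::Red),
        1 => Some(Color::Green),
        2 => Some(Color::Blue),
        _ => None,
    }
}

fn main() {
    assert_eq!(color_from_raw(1), Some(Color::Green));
    // A value such as a combined bitflag never becomes an invalid `Color`.
    assert_eq!(color_from_raw(7), None);
}
```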
204+
205+
It is an error for [zero-variant enumerations] to have the `C` representation.
206+
207+
For all other enumerations, the layout is unspecified.
208+
209+
### Primitive representations
210+
211+
The *primitive representations* are the representations with the same names as
212+
the primitive integer types. That is: `u8`, `u16`, `u32`, `u64`, `usize`, `i8`,
213+
`i16`, `i32`, `i64`, and `isize`.
214+
215+
Primitive representations can only be applied to enumerations.
216+
217+
For [C-like enumerations], they set the size and alignment to be the same as the
218+
primitive type of the same name. For example, a C-like enumeration with a `u8`
219+
representation can only have discriminants between 0 and 255 inclusive.
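
A brief sketch of a `u8` primitive representation, using a hypothetical `Small` enum:

```rust
use std::mem::{align_of, size_of};

// Hypothetical C-like enumeration stored as a `u8`.
#[allow(dead_code)]
#[repr(u8)]
enum Small {
    Min = 0,
    Max = 255,
}

fn main() {
    // The enum takes on the size and alignment of `u8`.
    assert_eq!(size_of::<Small>(), size_of::<u8>());
    assert_eq!(align_of::<Small>(), align_of::<u8>());
    assert_eq!(Small::Max as u8, 255);
}
```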
220+
221+
It is an error for [zero-variant enumerations] to have a primitive
222+
representation.
223+
224+
For all other enumerations, the layout is unspecified.
225+
226+
### The `packed` Representation
227+
228+
The `packed` representation can only be used on `struct`s and `union`s.
229+
230+
It modifies the representation (either the default or `C`) by removing any
231+
padding bytes and forcing the alignment of the type to `1`.
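
The effect can be seen by comparing two otherwise identical structs. `Padded` and `Packed` are hypothetical types for this sketch, and the size of `Padded` is printed rather than asserted because it depends on the platform's alignment of `u32`.

```rust
use std::mem::{align_of, size_of};

// Hypothetical struct with the `C` representation: padding may follow `tag`.
#[allow(dead_code)]
#[repr(C)]
struct Padded {
    tag: u8,
    value: u32,
}

// The same fields with `packed` added: no padding, alignment forced to 1.
#[allow(dead_code)]
#[repr(C, packed)]
struct Packed {
    tag: u8,
    value: u32,
}

fn main() {
    // One byte for `tag` plus four bytes for `value`, with nothing in between.
    assert_eq!(size_of::<Packed>(), 5);
    assert_eq!(align_of::<Packed>(), 1);

    assert!(size_of::<Padded>() >= size_of::<Packed>());
    println!("Padded is {} bytes on this target", size_of::<Padded>());
}
```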
232+
233+
> Warning: Dereferencing an unaligned pointer is [undefined behavior] and it is
234+
> possible to [safely create unaligned pointers to `packed` fields][27060].
235+
> Like all ways to create undefined behavior in safe Rust, this is a bug.
236+
237+
[`align_of_val`]: ../std/mem/fn.align_of_val.html
238+
[`size_of_val`]: ../std/mem/fn.size_of_val.html
239+
[`align_of`]: ../std/mem/fn.align_of.html
240+
[`size_of`]: ../std/mem/fn.size_of.html
241+
[`Sized`]: ../std/marker/trait.Sized.html
242+
[dynamically sized types]: dynamically-sized-types.html
243+
[C-like enumerations]: items/enumerations.html#c-like-enumerations
244+
[zero-variant enumerations]: items/enumerations.html#zero-variant-enumerations
245+
[undefined behavior]: behavior-considered-undefined.html
246+
[27060]: https://github.com/rust-lang/rust/issues/27060

src/types.md

+2-2
@@ -146,8 +146,8 @@ let slice: &[i32] = &boxed_array[..];
146146
All elements of arrays and slices are always initialized, and access to an
147147
array or slice is always bounds-checked in safe methods and operators.
148148

149-
The [`Vec<T>`] standard library type provides a heap allocated resizable array
150-
type.
149+
> Note: The [`Vec<T>`] standard library type provides a heap allocated resizable
150+
> array type.
151151
152152
[dynamically sized type]: dynamically-sized-types.html
153153
[`Vec<T>`]: ../std/vec/struct.Vec.html
