Commit 3397223

Document the memory management system further (#1442)
1 parent 25f50c9 commit 3397223

1 file changed

+204
-1
lines changed

docs/Memory.md

Lines changed: 204 additions & 1 deletion
@@ -413,7 +413,210 @@ Or `foreach` (works like a foreach loop construct in a lot of other programming
413413
```
414414
415415
416-
## Under the hood
416+
## Under the Hood: The Implementation of Carp's Memory Management System
417+
418+
This section explores the implementation of Carp's memory management system in
419+
greater technical detail. Most users won't need to read this, but if you'd like
420+
to have a deeper understanding of how the system works, you'll find an
421+
explanation in this section.
422+
423+
### AST Info, Identifiers, and Deleters
424+
425+
Like other portions of the Carp compiler, the memory management system operates on
426+
the abstract syntax tree (AST) representation of the forms in your program. When
427+
the compiler compiles your code, it assigns additional *information objects*,
428+
called `Info`, to each form in your program; these objects are particularly
429+
important to the memory management system. Among other things, these `Info`
430+
objects contain unique identifiers for each form in your program. The memory
431+
management system uses these identifiers to keep track of memory as it moves
432+
across different parts of your code.
433+
434+
In addition to identifiers, form information objects also contain `Deleters`.
435+
These are a special data structure used to hold information about the `delete`
436+
functions needed for each linear value in your program. One of the memory
437+
management system's main responsibilities is to assign and keep track of these
438+
deleters for each form in your program that makes use of a linear value.
439+
440+
Essentially, as the memory management system examines your code, if it finds a
441+
form that uses a linear value that should be deleted at a certain point, it adds
442+
an appropriate deleter to the info object for the form. If the linear value is
443+
*moved* to some other part of your code, the memory management system will
444+
remove the corresponding deleter, which will be added to the form it's moved
445+
into later.
446+
447+
The key point to understand is that the memory management system primarily
448+
models the movements of linear values using the presence or absence of these
449+
deleter objects. When the compiler's code emission component encounters a form,
450+
if the form has an associated deleter, the emitter will produce a call to the
451+
deletion routine in the corresponding output C code.
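
As a rough illustration of the bookkeeping described above, the following sketch
annotates a small function with the points where a deleter would conceptually be
attached and removed. The function name `move-example` is hypothetical, and the
comments are a simplified mental model, not a trace of the compiler's actual
output:

```clojure
;; Hypothetical example: the name and comments are illustrative only.
(defn move-example []
  (let [a [1 2 3] ; `a` owns the array, so a deleter is tracked for `a`
        b a]      ; the array moves into `b`: the deleter for `a` is removed
                  ; and a deleter is tracked for `b` instead
    ()))          ; `b` still owns the array when the `let` ends, so this is
                  ; where the emitted C code would call the deletion routine
```

Under this model, the array is deleted exactly once, at the end of the scope of
the binding that last owned it.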
452+
453+
As we'll see in a moment, there are some further complications, but this is the
454+
basic approach taken by the memory management system.
455+
456+
### Lifetimes
457+
458+
The basic operation of the memory management system entails moving deleters
459+
across different `Info` objects for the forms in your program. As the system
460+
performs this task, it also has to account for the way *references* are used
461+
throughout your code, and how they relate to linear values. In order to track
462+
this, the memory management system uses *lifetimes*, which determine whether or
463+
not a reference is valid in a given form.
464+
465+
The following function provides an example of this reference validity tracking
466+
in action:
467+
468+
```clojure
469+
(defn invalid-ref []
470+
  &[1 2 3])
471+
```
472+
473+
In the prior example, our `invalid-ref` function returns a reference to the
474+
literal linear array value `[1 2 3]`. This code is problematic because the
475+
linear array value will be deleted at the end of the function, so the returned
476+
reference will point to nothing! The memory management system catches this for
477+
us and lets us know about the problem.
478+
479+
By contrast, the following code is perfectly fine:
480+
481+
```clojure
482+
(def an-array [1 2 3])
483+
484+
(defn valid-ref []
485+
  &an-array)
486+
```
487+
488+
The `valid-ref` function also returns a reference, but this reference is valid
489+
since it points to a linear array value (`an-array`) that will still be
490+
"alive" (not yet deleted) when the function returns the reference.
491+
492+
The system will also catch cases when we attempt to reference a linear value
493+
that's already been *moved* into a different location/binding:
494+
495+
```clojure
496+
(defn unowned-ref []
497+
  (let [a [1 2 3]
498+
        b a
499+
        c &a]
500+
    ()))
501+
```
502+
503+
In this example, we move the linear array from `a` into `b`, but then try to set
504+
`c` to a reference to `a`, which, after the move, no longer points to anything.
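
One possible fix, sketched here on the assumption that an independent copy of
the array is acceptable, is to copy the value with Carp's `@` operator instead
of moving it. The function name `owned-ref` is hypothetical:

```clojure
;; Hypothetical variant of `unowned-ref` that copies instead of moving.
(defn owned-ref []
  (let [a [1 2 3]
        b @&a ; copy the array; `a` keeps ownership of the original
        c &a] ; `a` still owns its array, so this reference is valid
    ()))
```

Here `a` and `b` each own a separate array, so both values are deleted at the
end of the `let` form, and `c` never outlives the value it points to.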
505+
506+
Internally, the memory management system uses *lifetimes* to model the
507+
relationships between references and linear values and track the validity of
508+
references across your code.
509+
510+
#### Lifetimes in Detail
511+
512+
Only references have lifetimes, and every reference has *exactly one* lifetime
513+
assigned to it. Each lifetime is made up of two pieces of information:
514+
515+
- A unique type variable that identifies the lifetime.
516+
- A lifetime mode that indicates if the linear value tied to the reference has
517+
a lexical scope that extends beyond the reference's lexical scope or if it's
518+
limited to the reference's lexical scope.
519+
520+
In general, a reference is valid only when the value it points to has either an
521+
equivalent or greater lexical scope. This property is encoded in its lifetime.
522+
523+
Let's look at some examples to help illustrate this:
524+
525+
```clojure
526+
(def an-array [1 2 3])
527+
528+
(defn valid-ref []
529+
  (let [array-ref &an-array] ()))
530+
```
531+
532+
In this example, the anonymous reference `&an-array` has a unique lifetime that
533+
*extends beyond the lexical scope* of the reference itself. The lexical scope of
534+
the referenced value `[1 2 3]` is greater than or equal to the lexical scope of
535+
the reference, which only extends across the `let` form, so this reference is
536+
valid.
537+
538+
By contrast, the following reference is not valid:
539+
540+
```clojure
541+
(defn invalid-ref []
542+
  &[1 2 3])
543+
```
544+
545+
Here, the reference has a greater lexical scope than the linear value it points
546+
to. The anonymous linear value `[1 2 3]` will be deleted at the end of the
547+
function scope, but the reference will be returned from the function, so its
548+
lifetime is potentially greater than that of the value it points to.
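
One way to avoid this problem, sketched here on the assumption that the caller
can take ownership of the array, is to return the linear value itself instead of
a reference to it. The function name `owned-array` is hypothetical:

```clojure
;; Hypothetical alternative to `invalid-ref`: return the value, not a reference.
(defn owned-array []
  [1 2 3]) ; ownership of the array moves to the caller, so nothing dangles
```

Because no reference is involved, there is no lifetime to check. The caller
becomes responsible for the array, and it is deleted in the caller's scope.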
549+
550+
The memory management system performs two key checks around reference usage:
551+
552+
1. Check that a newly created reference doesn't point to a linear value binding
553+
that has already transferred away ownership.
554+
2. Check that a reference is alive at a certain point in the program.
555+
556+
Both of these are implemented as separate checks, but they may be viewed as
557+
specializations of a general operation that checks if every reference form in
558+
your program is "alive" at the point of use.
559+
560+
Currently, liveness analysis revolves around checking if the value the reference
561+
points to belongs to the same lexical scope as the reference, and, if so, that
562+
the value has a deleter in that scope, which indicates the scope properly owns
563+
the value. If no such deleter exists, it means the reference outlives the value
564+
it points to, and is invalid.
565+
566+
### Type Dependencies
567+
568+
The final key piece of information the memory system manages is the *type
569+
dependencies* of the deletion functions for linear values.
570+
571+
Since Carp supports generic programming and polymorphic functions, it's possible
572+
that some deleter is needed in a polymorphic context. In particular, generic
573+
functions that "take ownership" of generic values need to be able to find the
574+
correct deletion routines for the value. For example, consider the generic function:
575+
576+
```clojure
577+
(sig my-generic-force-delete (Fn [a] Unit))
578+
579+
(defn my-generic-force-delete [a]
580+
  ())
581+
```
582+
583+
This `my-generic-force-delete` function takes ownership of whatever argument it
584+
receives and does nothing. Since it takes ownership, however, the value passed
585+
to `a`, if it's linear, needs to be deleted at the end of the function scope.
586+
587+
Since the function is generic, the memory management system can't know for
588+
certain what value is being passed. In some cases it might be a linear value, in
589+
some cases it might not be. Sometimes it might be a `String`, sometimes an
590+
`Int`, or sometimes an `Array`. Each of these types has a different `delete`
591+
implementation.
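
To make the problem concrete, the sketch below calls the generic function with
three different argument types. The caller `call-with-different-types` is a
hypothetical name introduced only for this illustration:

```clojure
;; Repeated from above so the example is self-contained.
(sig my-generic-force-delete (Fn [a] Unit))

(defn my-generic-force-delete [a]
  ())

;; Hypothetical caller: each call instantiates `a` with a different type.
(defn call-with-different-types []
  (do
    (my-generic-force-delete [1 2 3])  ; `a` is an array of `Int`s
    (my-generic-force-delete @"hello") ; `a` is `String`
    (my-generic-force-delete 42)))     ; `a` is `Int`
```

Each call needs its own handling of the argument at the end of the function
scope, which is why the concrete type has to be known before a deletion routine
can be chosen.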
592+
593+
Rather than having the memory management system figure out what function to use,
594+
the system instead just keeps track of the types of all the values for the forms
595+
it analyzes. Later, the component already dedicated to resolving generic
596+
functions handles finding the right deletion routine for the values passed to
597+
the generic function. In order to accomplish this, it uses the type information
598+
captured by the memory system as it analyzes each form.
599+
600+
### Memory State
601+
602+
As we've explored, the memory management system needs to keep track of three key
603+
pieces of information as it analyzes the forms in your program:
604+
605+
1. The deleters assigned to each AST node to track ownership of linear values
606+
and delete them at the right time.
607+
2. The lifetimes assigned to each reference to check reference validity.
608+
3. The types of each form it analyzes to resolve generic deletion functions.
609+
610+
Each of these units of information is bundled into a single data structure,
611+
called the *memory state* or `MemState` of your program.
612+
613+
As the memory management system analyzes each of the AST nodes in your program
614+
source, it updates the memory state accordingly. Deleters are added to and removed
615+
from the state at different points as ownership transfers of linear values
616+
occur. When the system finishes analyzing a node, it updates the node's `Info`
617+
object, attaching the deleters associated with the current memory state. At any
618+
point, if the memory management system encounters a problem with the way memory
619+
is being transferred across your program's AST nodes, it reports an error.
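
The following sketch walks through how the set of tracked deleters might evolve
for a small function. The function name `memory-state-walkthrough` is
hypothetical, and the comments are a simplified mental model of the memory
state, not the compiler's real data:

```clojure
;; Hypothetical walkthrough: comments show a simplified view of the tracked
;; deleters after each binding.
(defn memory-state-walkthrough []
  (let [a [1 2 3] ; deleters: a
        b a       ; deleters: b (ownership moved from `a` to `b`)
        r &b      ; deleters: b (`r` only borrows; a lifetime tracks its validity)
        c @r]     ; deleters: b, c (`c` owns a fresh copy)
    ()))          ; the `let` ends: `b` and `c` are deleted here
```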
417620
418621
A simple piece of code:
419622
