@@ -413,7 +413,210 @@ Or `foreach` (works like a foreach loop construct in a lot of other programming
413413```
414414
415415
416- ## Under the hood
416+ ## Under the Hood: The Implementation of Carp's Memory Management System
417+
418+ This section explores the implementation of Carp's memory management system in
419+ greater technical detail. Most users won't need to read this, but if you'd like
420+ to have a deeper understanding of how the system works, you'll find an
421+ explanation in this section.
422+
423+ ### AST Info, Identifiers, and Deleters
424+
425+ Like other portions of the Carp compiler, the memory management system operates on
426+ the abstract syntax tree (AST) representation of the forms in your program. When
427+ the compiler compiles your code, it assigns additional *information objects*,
428+ called `Info`, to each form in your program; these objects are particularly
429+ important to the memory management system. Among other things, these `Info`
430+ objects contain unique identifiers for each form in your program. The memory
431+ management system uses these identifiers to keep track of memory as it moves
432+ across different parts of your code.
433+
434+ In addition to identifiers, form information objects also contain `Deleters`.
435+ A deleter is a special data structure used to hold information about the `delete`
436+ functions needed for each linear value in your program. One of the memory
437+ management system's main responsibilities is to assign and keep track of these
438+ deleters for each form in your program that makes use of a linear value.
439+
440+ Essentially, as the memory management system examines your code, if it finds a
441+ form that uses a linear value that should be deleted at a certain point, it adds
442+ an appropriate deleter to the `Info` object for the form. If the linear value is
443+ *moved* to some other part of your code, the memory management system will
444+ remove the corresponding deleter and later add it to the form the value is
445+ moved into.
446+
447+ The key point to understand is that the memory management system primarily
448+ models the movements of linear values using the presence or absence of these
449+ deleter objects. When the compiler's code emission component encounters a form,
450+ if the form has an associated deleter, the emitter will produce a call to the
451+ deletion routine in the corresponding output C code.
452+
453+ As we'll see in a moment, there are some further complications, but this is the
454+ basic approach taken by the memory management system.
455+
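As a rough illustration of the deleter bookkeeping described in this section, the two functions below differ only in whether the owned `String` leaves its scope. The function names are hypothetical, and the comments describe the behavior expected under the model above, not verified compiler output.

```clojure
;; `s` owns a String. It is only borrowed with `&`, never moved, so a
;; deleter for `s` should stay attached to the `let` form, and the
;; emitted C should free the string when the `let` ends.
(defn borrow-only []
  (let [s @"hello"]
    (String.length &s)))

;; Here `s` is returned, which moves ownership to the caller. The
;; deleter should be removed from this scope, and the caller becomes
;; responsible for the value.
(defn move-out []
  (let [s @"hello"]
    s))
```

Comparing the C emitted for the two functions is one way to see where a deletion call is and isn't produced.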
456+ ### Lifetimes
457+
458+ The basic operation of the memory management system entails moving deleters
459+ across different `Info` objects for the forms in your program. As the system
460+ performs this task, it also has to account for the way *references* are used
461+ throughout your code, and how they relate to linear values. In order to track
462+ this, the memory management system uses *lifetimes*, which determine whether or
463+ not a reference is valid in a given form.
464+
465+ The following function provides an example of this reference validity tracking
466+ in action:
467+
468+ ```clojure
469+ (defn invalid-ref []
470+   &[1 2 3])
471+ ```
472+
473+ In this example, our `invalid-ref` function returns a reference to the
474+ literal linear array value `[1 2 3]`. This code is problematic because the
475+ linear array value will be deleted at the end of the function, so the returned
476+ reference will point to nothing! The memory management system catches this for
477+ us and lets us know about the problem.
478+
479+ By contrast, the following code is perfectly fine:
480+
481+ ```clojure
482+ (def an-array [1 2 3])
483+
484+ (defn valid-ref []
485+   &an-array)
486+ ```
487+
488+ The `valid-ref` function also returns a reference, but this reference is valid
489+ since it points to a linear array value (`an-array`) that won't be deleted (it
490+ will still be "alive") by the time the function returns the reference.
491+
492+ The system will also catch cases when we attempt to reference a linear value
493+ that's already been *moved* into a different location/binding:
494+
495+ ```clojure
496+ (defn unowned-ref []
497+   (let [a [1 2 3]
498+         b a
499+         c &a]
500+     ()))
501+ ```
502+
503+ In this example, we move the linear array from `a` into `b`, but then try to set
504+ `c` to a reference to `a`, which, after the move, no longer points to anything.
505+
506+ Internally, the memory management system uses *lifetimes* to model the
507+ relationships between references and linear values and track the validity of
508+ references across your code.
509+
510+ #### Lifetimes in Detail
511+
512+ Only references have lifetimes, and every reference has *exactly one* lifetime
513+ assigned to it. Carp's lifetimes are made up of two pieces of information:
514+
515+ - A unique type variable that identifies the lifetime.
516+ - A lifetime mode that indicates whether the linear value tied to the reference
517+   has a lexical scope that extends beyond the reference's lexical scope or is
518+   limited to the reference's lexical scope.
519+
520+ In general, a reference is valid only when the value it points to has either an
521+ equivalent or greater lexical scope. This property is encoded in its lifetime.
522+
523+ Let's look at some examples to help illustrate this:
524+
525+ ```clojure
526+ (def an-array [1 2 3])
527+
528+ (defn valid-ref []
529+   (let [array-ref &an-array] ()))
530+ ```
531+
532+ In this example, the anonymous reference `&an-array` has a unique lifetime that
533+ *extends beyond the lexical scope* of the reference itself. The lexical scope of
534+ the referenced value `[1 2 3]` is greater than or equal to the lexical scope of
535+ the reference, which only extends across the `let` form, so this reference is
536+ valid.
537+
538+ By contrast, the following reference is not valid:
539+
540+ ```clojure
541+ (defn invalid-ref []
542+   &[1 2 3])
543+ ```
544+
545+ Here, the reference has a greater lexical scope than the linear value it points
546+ to. The anonymous linear value `[1 2 3]` will be deleted at the end of the
547+ function scope, but the reference will be returned from the function, so its
548+ lifetime is potentially greater than that of the value it points to.
549+
550+ The memory management system performs two key checks around reference usage:
551+
552+ 1. Check that a newly created reference doesn't point to a linear value binding
553+ that has already transferred away ownership.
554+ 2. Check that a reference is alive at a certain point in the program.
555+
556+ Both of these are implemented as separate checks, but they may be viewed as
557+ specializations of a general operation that checks if every reference form in
558+ your program is "alive" at the point of use.
559+
560+ Currently, liveness analysis revolves around checking if the value the reference
561+ points to belongs to the same lexical scope as the reference, and, if so, that
562+ the value has a deleter in that scope, which indicates the scope properly owns
563+ the value. If no such deleter exists, it means the reference outlives the value
564+ it points to, and is invalid.
565+
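The liveness rule above can be illustrated with a pair of functions that bind an owned array in a `let`. This is a sketch under the stated model: the function names are hypothetical, and the second function is expected to be rejected by the compiler, so the two definitions are not meant to be loaded together as a working program.

```clojure
;; Expected to be accepted: `xs` is owned by the `let`, so that scope
;; holds a deleter for it, and the reference `r` is only used while
;; `xs` is still alive.
(defn ref-stays-inside []
  (let [xs [1 2 3]
        r &xs]
    (Array.length r)))

;; Expected to be rejected: `xs` is deleted when the `let` ends, but
;; the reference would be returned to the caller and outlive it.
(defn ref-escapes []
  (let [xs [1 2 3]]
    &xs))
```

In both cases the reference and the value share a lexical scope. What differs is whether the reference is still in use after the point where the scope's deleter runs.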
566+ ### Type Dependencies
567+
568+ The final key piece of information the memory management system tracks is the
569+ set of *type dependencies* of the deletion functions for linear values.
570+
571+ Since Carp supports generic programming and polymorphic functions, it's possible
572+ that some deleter is needed in a polymorphic context. In particular, generic
573+ functions that "take ownership" of generic values need to be able to find the
574+ correct deletion routines for the value. Consider the following generic function:
575+
576+ ```clojure
577+ (sig my-generic-force-delete (Fn [a] Unit))
578+
579+ (defn my-generic-force-delete [a]
580+   ())
581+ ```
582+
583+ This `my-generic-force-delete` function takes ownership of whatever argument it
584+ receives and does nothing. Since it takes ownership, however, the value passed
585+ to `a`, if it's linear, needs to be deleted at the end of the function scope.
586+
587+ Since the function is generic, the memory management system can't know for
588+ certain what value is being passed. In some cases it might be a linear value, in
589+ some cases it might not be. Sometimes it might be a `String`, sometimes an
590+ `Int`, and sometimes an `Array`. Each linear type has its own `delete`
591+ implementation.
592+
593+ Rather than having the memory management system figure out what function to use,
594+ the system instead just keeps track of the types of all the values for the forms
595+ it analyzes. Later, the component already dedicated to resolving generic
596+ functions handles finding the right deletion routine for the values passed to
597+ the generic function. In order to accomplish this, it uses the type information
598+ captured by the memory system as it analyzes each form.
599+
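To make the type dependency idea concrete, here is a hypothetical set of call sites for `my-generic-force-delete`. The `call-sites` function is an illustration only, and the comments describe what the resolution step would presumably need at each call, not verified compiler behavior.

```clojure
;; Assumes `my-generic-force-delete` from above is already defined.
(defn call-sites []
  (do
    ;; `a` is a String here, so String's deletion routine is needed.
    (my-generic-force-delete @"hello")
    ;; `a` is an (Array Int) here, so Array's deletion routine is needed.
    (my-generic-force-delete [1 2 3])
    ;; `a` is an Int here, which is not a linear value, so presumably
    ;; nothing needs to be deleted.
    (my-generic-force-delete 42)))
```

Each call instantiates the generic function at a different concrete type, which is why the memory management system records types and leaves the choice of deletion routine to the later resolution step.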
600+ ### Memory State
601+
602+ As we've explored, the memory management system needs to keep track of three key
603+ pieces of information as it analyzes the forms in your program:
604+
605+ 1. The deleters assigned to each AST node to track ownership of linear values
606+ and delete them at the right time.
607+ 2. The lifetimes assigned to each reference to check reference validity.
608+ 3. The types of each form it analyzes to resolve generic deletion functions.
609+
610+ Each of these units of information is bundled into a single data structure,
611+ called the *memory state* or `MemState` of your program.
612+
613+ As the memory management system analyzes each of the AST nodes in your program
614+ source, it updates the memory state accordingly. Deleters are added and removed
615+ from the state at different points as ownership transfers of linear values
616+ occur. When the system finishes analyzing a node, it updates the node's `Info`
617+ object, attaching the deleters associated with the current memory state. At any
618+ point, if the memory management system encounters a problem with the way memory
619+ is being transferred across your program's AST nodes, it reports an error.
417620
418621A simple piece of code:
419622