This document describes the idea and implementation details for parallel type-checking of independent files in the F# compiler.
Performance of F# compilation and code analysis is one of the concerns for big codebases.
Despite several recent improvements in this area, there is still appetite and potential for change.
One way to speed it up was originally described in https://github.com/dotnet/fsharp/discussions/11634 by @kerams.
That is going to be the main topic of this page.
But before we dive into the details, let's first discuss how things work at the moment.
## Context and the current state of the compiler
Type-checking information is accumulated in a single state (`TcState`) instance; this instance is incrementally built on as more and more files have been processed.
### Recent addition - "Parallel type checking for impl files with backing sig files"
A recent [change](https://github.com/dotnet/fsharp/pull/13737) to the compiler added a level of parallelism in type-checking (behind an experimental feature flag).
It allows for parallel type-checking of implementation files backed by signature files.
Such files by definition cannot be depended upon by any other files w.r.t. type-checking, since all the necessary information is exposed by the corresponding `.fsi` files.
The new feature, when enabled, allows partial parallelisation of type-checking as follows:
1. All `.fsi` files and `.fs` files without backing `.fsi` files are type-checked in sequence, as before.
2. All `.fs` files that are backed by `.fsi` files are then type-checked in parallel.

The feature is opt-in and can be enabled in the compiler via a CLI arg & MSBuild property.
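For illustration, here is a minimal (hypothetical) signature/implementation pair - other files can only ever see what `A.fsi` exposes, so `A.fs` can safely be type-checked in parallel with the rest of the project:

```fsharp
// A.fsi - the signature file; this is all that other files can see of module A
module A

val add: int -> int -> int
```

```fsharp
// A.fs - the implementation file backed by A.fsi
module A

let add x y = x + y
```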
## The importance of using Server GC for parallel work
By default, .NET processes use Workstation GC, which is single-threaded. This means it can become a bottleneck for highly parallel operations, due to increased GC pressure and the cost of GC pauses being multiplied by the number of threads waiting.
That is why, when increasing parallelisation of the compiler and the compiler service, it is important to note the GC mode being used and to consider enabling Server GC.
This is no different for parallel type-checking - any performance tests of the feature should be done using Server GC.
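As a sanity check before benchmarking, a small F# snippet (a sketch using the standard .NET `GCSettings` APIs, not compiler-specific code) can confirm which GC mode the process is actually running with:

```fsharp
open System.Runtime

// Server GC is typically enabled via the <ServerGarbageCollection>true</ServerGarbageCollection>
// MSBuild property or the DOTNET_gcServer=1 environment variable.
printfn "Server GC enabled: %b" GCSettings.IsServerGC
printfn "GC latency mode:   %A" GCSettings.LatencyMode
```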
Below is an example showing the difference it can make for a parallel workflow.
For more details see https://github.com/dotnet/fsharp/pull/13521
The main idea is quite simple:
- process files in a graph order instead of sequential order
- quickly reduce the dependency graph used for type-checking, increasing the parallelism possible
- implement delta-based type-checking that allows building a 'fresh' `TcState` copy from a list of delta-based results.

Below is some quasi-theoretical background on type-checking in general.
### Background
Files in an F# project are ordered and processed from the top (first) file to the bottom (last) file.

Consider an example project with four files. By default, they are type-checked in the order of appearance: `[A.fs, B.fs, C.fs, D.fs]`.
Let's define `allowed dependency` as follows:
> If the contents of 'B.fs' _can_, based on its position in the project hierarchy, influence the type-checking process of 'A.fs', then 'A.fs' -> 'B.fs' is an _allowed dependency_
The _allowed dependencies graph_ looks as follows:
```
A.fs -> []
B.fs -> [A.fs]
C.fs -> [B.fs; A.fs]
D.fs -> [C.fs; B.fs; A.fs]
```
Sequential type-checking of files in the appearance order guarantees that when processing a given file, any files it _might_ need w.r.t. type-checking have already been type-checked and their type information is available.
### Necessary dependencies
Let's define `necessary dependency` as follows:
> If the contents of 'B.fs' _do_ influence the type-checking of 'A.fs', then 'A.fs' -> 'B.fs' is a _necessary dependency_

And finally, let's define a `dependency graph` as follows:
> A _dependency graph_ is any graph that is a subset of the `allowed dependencies` graph and a superset of the `necessary dependencies` graph
A few slightly imprecise/vague statements about all the graphs:
1. Any dependency graph is a directed, acyclic graph (DAG).
2. The _necessary dependencies_ graph is a subgraph of the _allowed dependencies_ graph.
3. If there is no path between 'B.fs' and 'C.fs' in the _necessary dependencies_ graph, they can be type-checked in parallel (as long as there is a way to maintain more than one instance of type-checking information).
4. Type-checking _must_ process files in an order that is compatible with the topological order in the _necessary dependencies_ graph.
5. If using a dependency graph as an ordering mechanism for (parallel) type-checking, the closer it is to the _necessary dependencies_ graph, the more parallelism is possible.
6. Type-checking files in appearance order is equivalent to using the `allowed dependencies` graph for ordering.

Let's look at point `6.` in more detail.
### The impact of reducing the dependency graph on type-checking parallelisation and wall-clock time.
Let us make a few definitions and simplifications:
1. Time it takes to type-check file f = `T(f)`
2. Time it takes to type-check files f1...fn in parallel = `T(f1+...+fn)`
3. Time it takes to type-check a file f and all its dependencies = `D(f)`
4. Time it takes to type-check the whole graph G = `D(G)`
5. Type-checking is performed on a machine with an infinite number of parallel processors.
6. There are no slowdowns due to parallel processing, i.e. `T(f1+...+fn) = max(T(f1),...,T(fn))`

With the above it can be observed that:
```
D(G) = max(D(f)), over all files 'f'

and

D(f) = max(D(n)) + T(f), for n = any necessary dependency of 'f'
```
In other words, the wall-clock time for type-checking using a given dependency graph is equal to the "longest" path in the graph, where the length of a path is the sum of the type-checking times of the files on it.
For the _allowed dependencies graph_ the following holds:
```
D(f) = T(f) + sum(T(g)), for all files 'g' above file 'f'
```
In other words, with the _allowed dependencies_ graph the length of the longest path equals the sum of the times to type-check all files, so there is no parallelism to exploit.
Therefore the change that parallel type-checking brings is the replacement of the _allowed dependencies_ graph as currently used with a reduced graph that is:
- much more similar to the _necessary dependencies_ graph,
- providing a smaller value of `D(G)`.
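As a hypothetical illustration, suppose that in the four-file example above the necessary dependencies are only `B.fs -> [A.fs]`, `C.fs -> [A.fs]` and `D.fs -> [B.fs; C.fs]`, and that every file takes 1 unit of time to type-check:

```
Allowed dependencies graph:   D(G) = T(A) + T(B) + T(C) + T(D)       = 4
Reduced graph:                D(G) = T(A) + max(T(B), T(C)) + T(D)   = 3
```

Here `B.fs` and `C.fs` can be type-checked in parallel once `A.fs` is done; on wide project graphs the reduction in `D(G)` is correspondingly larger.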
## A way to reduce the dependency graph used
Determining the exact _necessary dependencies_ graph up front is not really feasible, as it would require information produced by type-checking itself. However, there exist cheaper solutions that reduce the initial graph significantly.
As noted in https://github.com/dotnet/fsharp/discussions/11634, scanning the ASTs can provide a lot of information that helps narrow down the set of types, modules/namespaces and files that a given file _might_ depend on.
This is the approach used in this solution.
The dependency detection algorithm can be summarised as follows:
1. For each parsed file in parallel, use its parsed AST to extract the following:
   1. Top-level definitions (modules and namespaces). Consider `AutoOpens`.
   2. Partial module references, found by traversing the AST once (`open` statements, qualified identifiers, module abbreviations etc.).
2. Build a single [Trie](https://en.wikipedia.org/wiki/Trie) by adding all top-level items extracted in 1.1. Edges in the Trie represent module/namespace _segments_ (eg. `FSharp`, `Compiler`, `Service` in `FSharp.Compiler.Service`). For each node, keep track of any files that define the prefix it represents.
3. For each file, in parallel:
   1. Process all partial module references found in this file in 1.2. one-by-one, in order of appearance in the AST.
   2. For each reference, identify what nodes in the Trie can be located using it.
   3. Collect all nodes reached this way and all files with any items located in those nodes.
   4. Return those files as dependencies.
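A minimal F# sketch of such a Trie (simplified, hypothetical types and names - the actual compiler implementation differs in detail):

```fsharp
// One node per module/namespace segment; Files records which files declared
// something at the prefix this node represents.
type TrieNode =
    { Children: Map<string, TrieNode>
      Files: Set<string> }

let emptyNode = { Children = Map.empty; Files = Set.empty }

/// Add a file's top-level module/namespace path, eg. ["FSharp"; "Compiler"; "Service"].
let rec add (file: string) (path: string list) (node: TrieNode) : TrieNode =
    match path with
    | [] -> { node with Files = Set.add file node.Files }
    | segment :: rest ->
        let child =
            node.Children
            |> Map.tryFind segment
            |> Option.defaultValue emptyNode
        { node with Children = Map.add segment (add file rest child) node.Children }

/// Walk the Trie along a partial module reference (eg. an `open`, split into segments)
/// and collect the files attached to every node on the matched path.
let rec query (path: string list) (node: TrieNode) : Set<string> =
    match path with
    | [] -> node.Files
    | segment :: rest ->
        match Map.tryFind segment node.Children with
        | Some child -> Set.union node.Files (query rest child)
        | None -> node.Files
```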
### Edge-case 1. - `[<AutoOpen>]`
Modules with `[<AutoOpen>]` are in a way 'transparent', meaning that all the types/nested modules inside them are surfaced as if they were on a level above.
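For illustration, a minimal sketch of that transparency (hypothetical module and value names):

```fsharp
module Library

[<AutoOpen>]
module Helpers =
    let defaultTimeout = 30

// In another file, `open Library` alone is enough to make `defaultTimeout` visible,
// as if it were declared directly in `Library`.
```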
The main problem with that is that `AutoOpenAttribute` could be aliased and hide behind a different name.
Therefore it's not easy to see whether the attribute is being used based only on the AST.
There are ways to evaluate this, which involve scanning all type aliases in the project and in any referenced dlls.
However, currently the algorithm uses a shortcut: it checks whether the attribute type name is on a hardcoded list of "suspicious" names. This is not fully reliable, as an arbitrary type alias, eg. `type X = Microsoft.FSharp.Core.AutoOpenAttribute`, will not be recognised correctly.
The alternatives are:
- Consider any module with any attribute to be an `AutoOpen` module
- Identify any type aliases in any referenced code to be able to fully reliably determine whether a given attribute is in fact the `AutoOpenAttribute`
### Edge-case 2. - module abbreviations
Module abbreviations do not require any special handling in the current algorithm.
Consider the following example:
```
// F1.fs
module A
module B = let x = 1

// F2.fs
module C
open A
module D = B
```
Here, the line `module D = B` generates the `F2.fs -> F1.fs` link, so no special code is needed to add it.
### Performance
There are two main factors w.r.t. performance of the graph-based type-checking:
1. The level of parallelisation allowed by the resolved dependency graph.
2. The overhead of creating the dependency graph and of the graph-based processing itself.
At minimum, to make this feature useful, the overhead cost (2.) should in the vast majority of use cases be significantly lower than the speedup generated by (1.).
Initial timings showed that the graph-based type-checking was significantly faster than sequential type-checking and faster than the two-phase type-checking feature.
Projects that were tested included:
- `FSharp.Compiler.Service`
- `Fantomas.Core`
- `FSharp.Compiler.ComponentTests`

TODO: Provide detailed timings
## The problem of maintaining multiple instances of type-checking information
The parallel type-checking idea generates a problem that needs to be solved.
Instead of one instance of the type-checking information, we now have to maintain multiple instances - one for each node in the graph.
We solve it in the following way:
1. Each file's type-checking results in a 'delta' function `'State -> 'State` which adds information to the state.
2. When type-checking a new file, its input state is built from scratch by evaluating delta functions of all its dependencies.
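A minimal sketch of this delta-based approach (hypothetical types - in the compiler the state is `TcState` and each delta comes from type-checking one file):

```fsharp
// The accumulated type-checking information, heavily simplified.
type State = { Declared: Set<string> }

/// The result of type-checking a single file: a function that adds
/// that file's information to any given state.
type Delta = State -> State

let emptyState = { Declared = Set.empty }

/// Build the input state for a file from scratch by replaying the deltas
/// of all its (transitive) dependencies, in dependency order.
let buildInputState (dependencyDeltas: Delta list) : State =
    (emptyState, dependencyDeltas) ||> List.fold (fun state delta -> delta state)
```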
### Ordering of diagnostics/errors
Any changes in scheduling of work that can produce diagnostics can change the order in which diagnostics appear to the end user. To retain existing ordering of diagnostics, we use a mechanism where each work item first uses a dedicated logger, and at the end individual loggers are sequentially replayed into the single logger, in the desired order. This mechanism is used in a few places in the compiler already.
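A sketch of that capture-and-replay pattern (hypothetical types - the compiler uses its own diagnostics logger infrastructure):

```fsharp
type Diagnostic = { Message: string }

/// Run the work items in parallel, each collecting its own diagnostics,
/// then replay the collected diagnostics sequentially in the original order.
let runWithOrderedDiagnostics (emit: Diagnostic -> unit) (workItems: (unit -> Diagnostic list) list) =
    workItems
    |> List.map (fun work -> async { return work () })
    |> Async.Parallel
    |> Async.RunSynchronously
    |> Array.iter (List.iter emit)
```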
0 commit comments