|
2 | 2 | // Use of this source code is governed by a BSD-style
|
3 | 3 | // license that can be found in the LICENSE file.
|
4 | 4 |
|
5 |
| -// Package typerefs extracts from Go syntax a graph of symbol-level |
6 |
| -// dependencies, for the purpose of precise invalidation of package data. |
| 5 | +// Package typerefs extracts symbol-level reachability information |
| 6 | +// from the syntax of a Go package. |
7 | 7 | //
|
8 | 8 | // # Background
|
9 | 9 | //
|
|
15 | 15 | // More precisely, for each package P we define the set of "reachable" packages
|
16 | 16 | // from P as the set of packages that may affect the (deep) export data of the
|
17 | 17 | // direct dependencies of P. By this definition, the complement of this set
|
18 |
| -// cannot affect any information derived from type checking P (e.g. |
19 |
| -// diagnostics, cross references, or method sets). Therefore we need not |
| 18 | +// cannot affect any information derived from type checking P, such as |
| 19 | +// diagnostics, cross references, or method sets. Therefore we need not |
20 | 20 | // invalidate any results for P when a package in the complement of this set
|
21 | 21 | // changes.
|
22 | 22 | //
|
|
26 | 26 | // dotted identifiers referenced in the declaration of D, that may affect
|
27 | 27 | // the type of D. However, these references reflect only local knowledge of the
|
28 | 28 | // package and its dependency metadata, and do not depend on any analysis of
|
29 |
| -// the dependencies themselves. |
| 29 | +// the dependencies themselves. This allows the reference information for |
| 30 | +// a package to be cached independent of all others. |
30 | 31 | //
|
31 | 32 | // Specifically, if a referring identifier I appears in the declaration, we
|
32 | 33 | // record an edge from D to each object possibly referenced by I. We search for
|
33 |
| -// references within type syntax, but do not actual type-check, so we can't |
| 34 | +// references within type syntax, but do not actually type-check, so we can't |
34 | 35 | // reliably determine whether an expression is a type or a term, or whether a
|
35 | 36 | // function is a builtin or generic. For example, the type of x in var x =
|
36 | 37 | // p.F(W) only depends on W if p.F is a builtin or generic function, which we
|
|
39 | 40 | //
|
40 | 41 | // - If I is declared in the current package, record a reference to its
|
41 | 42 | // declaration.
|
42 |
| -// - Else, if there are any dot-imported imports in the current file and I is |
43 |
| -// exported, record a (possibly dangling) edge to the corresponding |
44 |
| -// declaration in each dot-imported package. |
| 43 | +// - Otherwise, if there are any dot imports in the current |
| 44 | +// file and I is exported, record a (possibly dangling) edge to |
| 45 | +// the corresponding declaration in each dot-imported package. |
45 | 46 | //
|
46 | 47 | // If a dotted identifier q.I appears in the declaration, we
|
47 | 48 | // perform a similar operation:
|
| 49 | +// |
48 | 50 | // - If q is declared in the current package, we record a reference to that
|
49 | 51 | // object. It may be a var or const that has a field or method I.
|
50 |
| -// - Else, if q is a valid import name based on imports in the current file |
| 52 | +// - Otherwise, if q is a valid import name based on imports in the current file |
51 | 53 | // and the provided metadata for dependency package names, record a
|
52 | 54 | // reference to the object I in that package.
|
53 | 55 | // - Additionally, handle the case where Q is exported, and Q.I may refer to
|
|
62 | 64 | // # Graph optimizations
|
63 | 65 | //
|
64 | 66 | // The references extracted from the syntax are used to construct
|
65 |
| -// edges between declNodes. Edges are of two kinds: internal |
66 |
| -// references, from one package-level declaration to another; and |
67 |
| -// external references, from a symbol in this package to a symbol |
68 |
| -// imported from a direct dependency. |
| 67 | +// edges between nodes representing declarations. Edges are of two |
| 68 | +// kinds: internal references, from one package-level declaration to |
| 69 | +// another; and external references, from a symbol in this package to |
| 70 | +// a symbol imported from a direct dependency. |
69 | 71 | //
|
70 | 72 | // Once the symbol reference graph is constructed, we find its
|
71 |
| -// strongly connected components (SCCs) using Tarjan's algorithm. A |
72 |
| -// node from each SCC is chosen arbitrarily to be its representative, |
73 |
| -// and all the edges (internal and external) of the SCC are |
74 |
| -// accumulated into the representative, thus forming the strong |
75 |
| -// component graph, which is acyclic. This property simplifies the |
76 |
| -// logic and improves the efficiency of the reachability query. |
77 |
| -// |
78 |
| -// TODO(adonovan): opt: subsequent planned optimizations include: |
79 |
| -// |
80 |
| -// - The Hash-Value Numbering optimization described in |
81 |
| -// Hardekopf and Lin; see golang.org/x/go/pointer/hvn.go for an |
82 |
| -// implementation. (Like pointer analysis, our problem is |
83 |
| -// fundamentally one of graph reachability.) |
84 |
| -// |
85 |
| -// The "pointer equivalence" (PE) portion of this algorithm uses a |
86 |
| -// hash table to create a mapping from unique sets of external |
87 |
| -// references to small integers. Each of the n external symbols |
88 |
| -// referenced by the package is assigned a integer from 1 to n; |
89 |
| -// this number stands for a singleton set. Higher numbers refer to |
90 |
| -// unions of strictly smaller sets. The PE algorithm allows us to |
91 |
| -// coalesce redundant graph nodes. For example, all functions that |
92 |
| -// ultimately reference only {fmt.Println,fmt.Sprintf} would be |
93 |
| -// marked as equivalent to each other, and to the union of |
94 |
| -// the sets of {fmt.Sprint} and {fmt.Println}. |
95 |
| -// |
96 |
| -// This reduces the worst-case size of the Refs() result. Consider |
97 |
| -// M decls that each reference type t, which references N imported |
98 |
| -// types. The source code has O(M + N) lines but the Refs result |
99 |
| -// is current O(M*N). Preserving the essential structure of the |
100 |
| -// reference graph (as a DAG of union operations) will reduce the |
101 |
| -// asymptote. |
102 |
| -// |
103 |
| -// - Serializing the SC graph obtained each package and saving it in |
104 |
| -// the file cache. Once we have a DAG of unions, we can serialize |
105 |
| -// it easily and amortize the cost of the local preprocessing. |
| 73 | +// strongly connected components (SCCs) using Tarjan's algorithm. |
| 74 | +// As we coalesce the nodes of each SCC we compute the union of |
| 75 | +// external references reached by each package-level declaration. |
| 76 | +// The final result is the mapping from each exported package-level |
| 77 | +// declaration to the set of external (imported) declarations that it |
| 78 | +// reaches. |
| 79 | +// |
| 80 | +// Because it is common for many package members to have the same |
| 81 | +// reachability, the result takes the form of a set of equivalence |
| 82 | +// classes, each mapping a set of package-level declarations to a set |
| 83 | +// of external symbols. We use a hash table to canonicalize sets so that |
| 84 | +// repeated occurrences of the same set (which are common) are only |
| 85 | +// represented once in memory or in the file system. |
| 86 | +// For example, all declarations that ultimately reference only |
| 87 | +// {fmt.Println,strings.Join} would be classed as equivalent. |
| 88 | +// |
| 89 | +// This approach was inspired by the Hash-Value Numbering (HVN) |
| 90 | +// optimization described by Hardekopf and Lin. See |
| 91 | +// golang.org/x/tools/go/pointer/hvn.go for an implementation. (Like |
| 92 | +// pointer analysis, this problem is fundamentally one of graph |
| 93 | +// reachability.) The HVN algorithm takes the compression a step |
| 94 | +// further by preserving the topology of the SCC DAG, in which edges |
| 95 | +// represent "is a superset of" constraints. Redundant edges that |
| 96 | +// don't increase the solution can be deleted. We could apply the same |
| 97 | +// technique here to further reduce the worst-case size of the result, |
| 98 | +// but the current implementation seems adequate. |
106 | 99 | //
|
107 | 100 | // # API
|
108 | 101 | //
|
109 |
| -// The main entry point for this analysis is the [Refs] function, which |
110 |
| -// implements the aforementioned syntactic analysis for a set of files |
111 |
| -// constituting a package. |
| 102 | +// The main entry point for this analysis is the [Encode] function, |
| 103 | +// which implements the analysis described above for one package, and |
| 104 | +// encodes the result as a binary message. |
112 | 105 | //
|
113 |
| -// These references use shared state to efficiently represent references, by |
114 |
| -// way of the [PackageIndex] and [PackageSet] types. |
| 106 | +// The [Decode] function decodes the message into a usable form: a set |
| 107 | +// of equivalence classes. The decoder uses a shared [PackageIndex] to |
| 108 | +// enable more compact representations of sets of packages |
| 109 | +// ([PackageSet]) during the global reachability computation. |
115 | 110 | //
|
116 | 111 | // The [BuildPackageGraph] constructor implements a whole-graph analysis similar
|
117 | 112 | // to that which will be implemented by gopls, but for various reasons the
|
|
120 | 115 | // BuildPackageGraph and its test serve to verify the syntactic analysis, and
|
121 | 116 | // may serve as a proving ground for new optimizations of the whole-graph analysis.
|
122 | 117 | //
|
123 |
| -// # Comparison with export data |
| 118 | +// # Export data is insufficient |
124 | 119 | //
|
125 | 120 | // At first it may seem that the simplest way to implement this analysis would
|
126 | 121 | // be to consider the types.Packages of the dependencies of P, for example
|
|
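The new "Graph optimizations" text describes canonicalizing reachability sets with a hash table so that declarations with identical sets share one equivalence class. The following is a minimal sketch of that idea only, not the package's actual implementation: the names `class`, `classify`, and `canonical` are hypothetical, symbols are plain strings, and the input map is assumed to already hold the per-declaration union of external references (that is, the result after SCC coalescing).

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// class is one equivalence class: the declarations that reach exactly
// the same set of external symbols. (Hypothetical type for illustration.)
type class struct {
	Decls []string // package-level declarations in this class
	Refs  []string // sorted, deduplicated external symbols they all reach
}

// classify groups declarations by their set of reachable external
// symbols. A hash table keyed by a canonical encoding of each set
// ensures that every distinct set is represented only once.
func classify(reach map[string][]string) []class {
	index := make(map[string]int) // canonical key -> index into classes
	var classes []class

	// Visit declarations in sorted order so the result is deterministic.
	decls := make([]string, 0, len(reach))
	for d := range reach {
		decls = append(decls, d)
	}
	sort.Strings(decls)

	for _, d := range decls {
		refs := canonical(reach[d])
		key := strings.Join(refs, "\x00")
		i, ok := index[key]
		if !ok {
			// First occurrence of this set: create a new class for it.
			i = len(classes)
			index[key] = i
			classes = append(classes, class{Refs: refs})
		}
		classes[i].Decls = append(classes[i].Decls, d)
	}
	return classes
}

// canonical returns a sorted copy of refs with duplicates removed.
func canonical(refs []string) []string {
	out := append([]string(nil), refs...)
	sort.Strings(out)
	n := 0
	for i, r := range out {
		if i == 0 || r != out[i-1] {
			out[n] = r
			n++
		}
	}
	return out[:n]
}

func main() {
	classes := classify(map[string][]string{
		"A": {"fmt.Println", "strings.Join"},
		"B": {"strings.Join", "fmt.Println"}, // same set as A, different order
		"C": {"fmt.Println"},
	})
	for _, c := range classes {
		fmt.Println(c.Decls, "->", c.Refs)
	}
}
```

Here A and B fall into the same class because their sets are equal once sorted, while C gets a class of its own. Joining the sorted symbols with a NUL separator is just one convenient canonical key; a real encoder would more likely hash integer symbol IDs.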