Incremental gopls
This is an umbrella issue to track the work we've been doing to change the way gopls scales (and significantly reduce memory usage and CPU on large codebases). We've been working off an internal design, but I wanted to share some of that here and have a public issue to track the work.
The main goal of this project is to use on-disk export data and indexing to allow gopls to persist data across sessions, reloading on-demand, thereby reducing its overall memory footprint and startup time.
As a result of this change, gopls memory usage (at least related to package information) will be O(open packages), rather than O(workspace). Furthermore, we will break global relationships between elements of gopls' cache, simplifying gopls' execution model and eliminating multiple categories of bugs.
Background
Gopls, as it very recently existed, was a monolithic build system. Certain algorithms relied on a global package graph containing the full set of workspace packages. For example, to find references to a given identifier, gopls simply walked all packages in the reverse transitive cone of the declaring package(s) and checked types.Info.Uses for references to the declared object. In order for this algorithm to be correct and sufficiently fast, gopls must hold all type-checked packages in memory (including many redundant intermediate test variants!).
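As an illustration of the style of algorithm described above, the following sketch type-checks a single package and scans types.Info.Uses for one object. It is not gopls code: `referencesTo`, the sample source, and the package path are made up for the example, and it covers one package rather than a reverse transitive cone.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"go/types"
	"sort"
)

const src = `package p

var Answer = 42

func Double() int { return Answer * 2 }

func Triple() int { return Answer * 3 }
`

// referencesTo type-checks src and returns the position of every identifier
// that types.Info.Uses resolves to the package-level object called name.
func referencesTo(name string) ([]token.Position, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "p.go", src, 0)
	if err != nil {
		return nil, err
	}
	info := &types.Info{Uses: make(map[*ast.Ident]types.Object)}
	conf := types.Config{} // src has no imports, so no Importer is needed
	pkg, err := conf.Check("example.com/p", fset, []*ast.File{file}, info)
	if err != nil {
		return nil, err
	}
	target := pkg.Scope().Lookup(name)
	if target == nil {
		return nil, fmt.Errorf("no package-level object %q", name)
	}
	var refs []token.Position
	for id, obj := range info.Uses {
		// Pointer comparison: this only works while the very same
		// types.Object values are held in memory.
		if obj == target {
			refs = append(refs, fset.Position(id.Pos()))
		}
	}
	sort.Slice(refs, func(i, j int) bool { return refs[i].Offset < refs[j].Offset })
	return refs, nil
}

func main() {
	refs, err := referencesTo("Answer")
	if err != nil {
		panic(err)
	}
	for _, r := range refs {
		fmt.Println(r)
	}
}
```

The `obj == target` comparison is the crux: it depends on object identity within one type-checking session, which is why this approach requires every relevant package to stay type-checked in memory.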
We can't solve gopls' scaling problems until we rewrite these algorithms and fix all the places where gopls makes assumptions of global identity. This is what we've been working on.
High level plan
- Design a shallow export data format that does not bundle its transitive closure.
- Add a mechanism for on-disk caching.
- Implement package analysis using export data.
- Rewrite workspace-wide queries to use indexes that are independent of types.Package or types.Object identity.
  - workspace symbols
  - references
  - call hierarchy
  - implementations
  - rename (may require re-implementing types.Identical and types.AssignableTo with a new notion of object identity, cf. proposal: go/types: define identity of package-level Named types in terms of Package.Path equality #57497)
  - Unimported completion
- Separate the concept of a "syntax package" (something containing AST and types.Info) from an "export package" (a types.Package with type information for exported symbols). Syntax packages are used to fully understand the syntax a user is working on. Export packages are used for type checking. Currently, gopls has a concept of "exported parse mode", which produces a syntax package on a truncated AST. This exists to reduce memory, but means that syntax packages may be partial, a source of many historical and current bugs. All current uses of partial packages in the package graph can (and must) be eliminated or replaced with a judicious use of parsing or type-checking on-demand.
- Create a control plane to manage package information that must be preserved in memory vs re-computed on demand or transiently cached.
- When importing during type-checking, use export packages for packages outside the workspace. For now, continue to produce syntax packages for all packages inside the workspace.
- Persist and load export packages from disk.
- Load xrefs and methodset indexes from disk, rather than hanging them off of syntax packages.
- Drop all syntax packages, except those with open files.
- Implement precise pruning
- Investigate holding on to packages imported by open packages, to reduce re-type-checking latency.
- Revisit diagnostic storage and retrieval: diagnostics are re-accessed in multiple places, assuming they will be free to retrieve: fix TestBadlyVersionedModule
- Improve the UX around indexing (better progress notifications, partial results, etc.).
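One way to picture an index that is independent of types.Object identity is to key references by strings (package path plus object name) and serialize the result, so that a lookup needs no type-checked package at all. The sketch below is an assumption-laden illustration, not gopls' index format: `Ref`, `Index`, `buildIndex`, the `"pkgpath.Name"` key scheme, and the use of encoding/gob are all choices made for the example, and only package-level objects are indexed.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"go/types"
	"sort"
)

const src = `package lib

func Hello() string { return "hi" }

var A = Hello()

var B = Hello()
`

// Ref is one reference location, stored as plain data.
type Ref struct {
	File string
	Line int
}

// Index maps a hypothetical "pkgpath.Name" key to the references found.
type Index map[string][]Ref

// buildIndex type-checks one file and records uses of package-level objects
// under string keys, so the result outlives the types.Object values.
func buildIndex(pkgPath, filename, source string) (Index, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, filename, source, 0)
	if err != nil {
		return nil, err
	}
	info := &types.Info{Uses: make(map[*ast.Ident]types.Object)}
	conf := types.Config{} // the sample source has no imports
	if _, err := conf.Check(pkgPath, fset, []*ast.File{file}, info); err != nil {
		return nil, err
	}
	idx := make(Index)
	for id, obj := range info.Uses {
		pkg := obj.Pkg()
		if pkg == nil || obj.Parent() != pkg.Scope() {
			continue // skip builtins, locals, fields, and methods
		}
		pos := fset.Position(id.Pos())
		key := pkg.Path() + "." + obj.Name()
		idx[key] = append(idx[key], Ref{File: pos.Filename, Line: pos.Line})
	}
	for _, refs := range idx {
		sort.Slice(refs, func(i, j int) bool { return refs[i].Line < refs[j].Line })
	}
	return idx, nil
}

// encode and decode stand in for writing the index to disk and reading it back.
func encode(idx Index) ([]byte, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(idx); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func decode(data []byte) (Index, error) {
	var idx Index
	err := gob.NewDecoder(bytes.NewReader(data)).Decode(&idx)
	return idx, err
}

func main() {
	idx, err := buildIndex("example.com/lib", "lib.go", src)
	if err != nil {
		panic(err)
	}
	data, err := encode(idx)
	if err != nil {
		panic(err)
	}
	// From here on, no go/types values are needed to answer the query.
	loaded, err := decode(data)
	if err != nil {
		panic(err)
	}
	fmt.Println(loaded["example.com/lib.Hello"])
}
```

Since the decoded index contains only strings and integers, the type-checked package that produced it can be dropped as soon as the index has been written.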