From 42db6c4072e3a9172e8e39c21fcf13cab170667a Mon Sep 17 00:00:00 2001 From: Francesco Dainese Date: Fri, 26 Jul 2019 23:24:06 +0200 Subject: [PATCH 1/3] Compiler Lecture: How Salsa works --- src/SUMMARY.md | 1 + src/compiler_lectures/how_salsa_works.md | 169 +++++++++++++++++++++++ 2 files changed, 170 insertions(+) create mode 100644 src/compiler_lectures/how_salsa_works.md diff --git a/src/SUMMARY.md b/src/SUMMARY.md index 75e1fcdc2..8c95cd3e4 100644 --- a/src/SUMMARY.md +++ b/src/SUMMARY.md @@ -31,6 +31,7 @@ - [The Query Evaluation Model in Detail](./queries/query-evaluation-model-in-detail.md) - [Incremental compilation](./queries/incremental-compilation.md) - [Incremental compilation In Detail](./queries/incremental-compilation-in-detail.md) + - [How Salsa works](./compiler_lectures/how_salsa_works.md) - [Debugging and Testing](./incrcomp-debugging.md) - [The parser](./the-parser.md) - [`#[test]` Implementation](./test-implementation.md) diff --git a/src/compiler_lectures/how_salsa_works.md b/src/compiler_lectures/how_salsa_works.md new file mode 100644 index 000000000..b1dd96279 --- /dev/null +++ b/src/compiler_lectures/how_salsa_works.md @@ -0,0 +1,169 @@ +# How Salsa works + +This chapter is based on the explanation given by Niko Matsakis in this [video](https://www.youtube.com/watch?v=_muY4HjSqVw) about [Salsa](https://github.com/salsa-rs/salsa). + +## What is Salsa? + +Salsa is a library for incremental recomputation, this means reusing computation that has already been done in the past to increase the efficiency of future computations. + +The objectives of Salsa are: + * Provide that functionality in an automatic way, so reusing old computations is done automatically by the library + * Doing so in a "sound", or "correct", way, therefore leading the same results as if it had been done from scratch + +Salsa actual model is much richer, allowing many kinds of inputs and many different outputs. 
+For example, integrating Salsa with an IDE could mean that the inputs could be the manifest (`Cargo.toml`), entire source files (`foo.rs`), snippets and so on; the outputs of such integration could range from a binary executable, to lints, types (for example, if a user selects a certain variable and wishes to see it's type), completitions etcetera. + +## How does it work? + +The first thing that Salsa has to do is identify the "base inputs" [^EN1]. + +Then Salsa has to also identify intermediate, "derived" values, which are something that the library produces, but, for each derived value there's a "pure" function that computes the derived value. + +For example, there might be a function `ast(x: Path) -> AST`. The produced `AST` isn't a final value, it's an intermediate value that the library would use for the computation. + +This means that when you try to compute with the library, Salsa is going to compute various derived values, and eventually read the input and produce the result for the asked computation. + +Salsa is going to track, in the course of computing, which inputs were accessed, which derived values, and this is going to be used later to determine what's going to happen when the inputs change: are the derived values still valid? + +This doesn't mean necessarily that each computation downstream from the input is going to be checked, since that is going to be costly. Salsa only needs to check each downstream computation until it finds one that isn't changed, therefore it won't need to check the other derived computations, since they wouldn't need to change. + +It's is helpful to think about this as a graph with nodes. Each derived value has a dependency on some other values, that could be base or derived. Base values don't have a dependency. + +```ignore +I <- A <- C ... 
+ | +J <- B <--+ +``` + +When an input `I` changes, the derived value `A` could have changed, but another derived value `B` , which doesn't depend neither on `I`, nor does it depends on `A`, nor does it depends on any other derived value from `A` or `I`, is not subject to change. +Therefore, Salsa can reuse the computation done for `B` in the past, without having to compute it again. + +The computation could also terminate early. Keeping the same graph as before, say that input `I` has changed in some way (and input `J` hasn't) but, when computing `A` again, it's found that `A` hasn't changed from the previous computation. This leads to an "early termination", because there's no need to check if `C` needs to change, since both `C` direct inputs, `A` and `B`, haven't changed. + +## Key Salsa concepts +### Query +A query is some value that Salsa can access in the course of computation. +Each query can have a number of keys (from 0 to many), and all queries have a result, akin to functions. +0-key queries are called "input" queries. +### Database +The database is basically the context for the entire computation, it's gonna store all Salsa's internal state, all intermidiate values for each query, and anything else that the computation might need. +The database to know all the queries that the library is going to do before it can be build, but they don't need to be specified in the same place. + +After the database is formed, it can be accessed with queries that are basically going to work like functions. +Since each query is going to be stored in the database, when a query is invoked N times, it's going to return N **cloned** results, without having to recompute the query (unless the input has changed in such a way that it warrants recomputation). + +For each input query (0-key), there's going to be a "set" method, that allows to change the output of such query, and trigger previous memoized values to be potentially invalidated. 
+ +### Query Groups +A query group is a set of queries which have been defined together as a unit. The dabase is formed by combining query groups. +Query groups are akin to "Salsa modules" [^EN2]. + +A set of queries in a query group are just a set of methods in a trait. + +To create a query group a trait annotated with a specific attribute (`#[salsa::query_group(...)]`) has to be created. + +An argument must also be provided to said attribute as it will be used by Salsa to create a struct to be used later when the database is created. + +Example input query group: + +```rust,ignore +///This attribute will process this tree, +///produce this tree as output, +///and produce a bunch of intermediate stuff +///that Salsa also uses. +///One of these things is a "StorageStruct", whose name we have specified in the attribute. +#[salsa::query_group(InputsStorage)] +pub trait Inputs { + //! This query group is a bunch of **input** queries, that do not rely on any derived input + + ///This attribute (`#[salsa::input]`) indicates that this query is a base input, + ///therfore `set_manifest` is going to be auto-generated + #[salsa::input] + fn manifest(&self) -> Manifest; + + #[salsa::input] + fn source_text(&self, name: String) -> String; +} +``` + +To create a **derived** query group, one must specify which other query group does this one depends on, by specifying it as a supertrait, as seen in the following example: + +```rust,ignore +///This query group is going to contain queries that depend on derived values +///a query group can access another query group's queries +///by specifying the dependency as a super trait +///query groups can be stacked as much as needed +///using that pattern +#[salsa::query_group(ParserStorage)] +pub trait Parser: Inputs { + + ///This query `ast` is not an input query, it's a derived query + ///this means that a definition is necessary + fn ast(&self, name: String) -> String; + +} +``` + +When creating a derived query the implementation of
said query must be defined outside the trait. +The definition must take a database parameter as an `impl Trait` (or `dyn Trait`), where `Trait` is the query group that the definition belongs to, in addition to the other keys. + +```rust,ignore +///This is going to be the definition of the `ast` query in the `Parser` trait. +///So, when the query `ast` is invoked, and it needs to be recomputed, Salsa is going to call this function +///and it is going to give it the database as `impl Parser`. +///The function doesn't need to be aware of all the queries of all the query groups +fn ast(db: &impl Parser, name: String) -> String { + //! Note, `impl Parser` is used here but `dyn Parser` works just as well + + /* code */ + + ///By passing an `impl Parser`, this is allowed, + ///because `Parser` has `Inputs` (which declares `source_text`) as a supertrait + let source_text = db.source_text(name); + + /* do the actual parsing */ + + return ast; +} +``` + +Eventually, after all the query groups have been defined, the database can be created by declaring a struct. + +To specify which query groups are going to be part of the database an attribute +(`#[salsa::database(...)]`) must be added. The argument of said attribute is a list of identifiers, specifying the query groups' **storages**. + +```rust,ignore +///This attribute specifies which query groups are going to be in the database +#[salsa::database(InputsStorage, ParserStorage)] +#[derive(Default)] //optional! 
+struct MyDatabase { + ///You also need this one field + runtime : salsa::Runtime, +} + +///And this trait has to be implemented +impl salsa::Database for MyDatabase { + fn salsa_runtime(&self) -> &salsa::Runtime { + &self.runtime + } +} +``` + +Example usage: + +```rust,ignore +fn main() { + let db = MyDatabase::default(); + + db.set_manifest(...); + db.set_source_text(...); + + loop { + db.ast(...); //will reuse results + db.set_source_text(...); + } +} +``` + +[^EN1]: "They are not something that you **inaubible** but something that you kinda get **inaudible** from the outside [3:23](https://youtu.be/_muY4HjSqVw?t=203). + +[^EN2]: What is a Salsa module? \ No newline at end of file From 2e4ef77e68efec2ae3eed6490684d8c9a9e482bc Mon Sep 17 00:00:00 2001 From: Karrq Date: Sun, 28 Jul 2019 19:10:57 +0200 Subject: [PATCH 2/3] Fixed typos and some phrasing Thanks @shepmaster! Co-Authored-By: Jake Goulding --- src/compiler_lectures/how_salsa_works.md | 26 ++++++++++++------------ 1 file changed, 13 insertions(+), 13 deletions(-) diff --git a/src/compiler_lectures/how_salsa_works.md b/src/compiler_lectures/how_salsa_works.md index b1dd96279..31b081fe7 100644 --- a/src/compiler_lectures/how_salsa_works.md +++ b/src/compiler_lectures/how_salsa_works.md @@ -8,10 +8,10 @@ Salsa is a library for incremental recomputation, this means reusing computation The objectives of Salsa are: * Provide that functionality in an automatic way, so reusing old computations is done automatically by the library - * Doing so in a "sound", or "correct", way, therefore leading the same results as if it had been done from scratch + * Doing so in a "sound", or "correct", way, therefore leading to the same results as if it had been done from scratch -Salsa actual model is much richer, allowing many kinds of inputs and many different outputs. 
-For example, integrating Salsa with an IDE could mean that the inputs could be the manifest (`Cargo.toml`), entire source files (`foo.rs`), snippets and so on; the outputs of such integration could range from a binary executable, to lints, types (for example, if a user selects a certain variable and wishes to see it's type), completitions etcetera. +Salsa's actual model is much richer, allowing many kinds of inputs and many different outputs. +For example, integrating Salsa with an IDE could mean that the inputs could be the manifest (`Cargo.toml`), entire source files (`foo.rs`), snippets and so on; the outputs of such an integration could range from a binary executable, to lints, types (for example, if a user selects a certain variable and wishes to see its type), completions, etc. ## How does it work? @@ -23,11 +23,11 @@ For example, there might be a function `ast(x: Path) -> AST`. The produced `AST` This means that when you try to compute with the library, Salsa is going to compute various derived values, and eventually read the input and produce the result for the asked computation. -Salsa is going to track, in the course of computing, which inputs were accessed, which derived values, and this is going to be used later to determine what's going to happen when the inputs change: are the derived values still valid? +In the course of computing, Salsa tracks which inputs were accessed and which values are derived. This information is used to determine what's going to happen when the inputs change: are the derived values still valid? -This doesn't mean necessarily that each computation downstream from the input is going to be checked, since that is going to be costly. Salsa only needs to check each downstream computation until it finds one that isn't changed, therefore it won't need to check the other derived computations, since they wouldn't need to change. 
+This doesn't necessarily mean that each computation downstream from the input is going to be checked, which could be costly. Salsa only needs to check each downstream computation until it finds one that isn't changed. At that point, it won't check other derived computations since they wouldn't need to change. -It's is helpful to think about this as a graph with nodes. Each derived value has a dependency on some other values, that could be base or derived. Base values don't have a dependency. +It is helpful to think about this as a graph with nodes. Each derived value has a dependency on other values, which could themselves be either base or derived. Base values don't have a dependency. ```ignore I <- A <- C ... @@ -35,7 +35,7 @@ I <- A <- C ... J <- B <--+ ``` -When an input `I` changes, the derived value `A` could have changed, but another derived value `B` , which doesn't depend neither on `I`, nor does it depends on `A`, nor does it depends on any other derived value from `A` or `I`, is not subject to change. +When an input `I` changes, the derived value `A` could change. The derived value `B`, which does not depend on `I`, `A`, or any value derived from `A` or `I`, is not subject to change. Therefore, Salsa can reuse the computation done for `B` in the past, without having to compute it again. The computation could also terminate early. Keeping the same graph as before, say that input `I` has changed in some way (and input `J` hasn't) but, when computing `A` again, it's found that `A` hasn't changed from the previous computation. This leads to an "early termination", because there's no need to check if `C` needs to change, since both `C` direct inputs, `A` and `B`, haven't changed. @@ -46,8 +46,8 @@ A query is some value that Salsa can access in the course of computation. Each query can have a number of keys (from 0 to many), and all queries have a result, akin to functions. 0-key queries are called "input" queries. 
### Database -The database is basically the context for the entire computation, it's gonna store all Salsa's internal state, all intermidiate values for each query, and anything else that the computation might need. -The database to know all the queries that the library is going to do before it can be build, but they don't need to be specified in the same place. +The database is basically the context for the entire computation, it's gonna store all Salsa's internal state, all intermediate values for each query, and anything else that the computation might need. +The database must know all the queries that the library is going to do before it can be built, but they don't need to be specified in the same place. After the database is formed, it can be accessed with queries that are basically going to work like functions. Since each query is going to be stored in the database, when a query is invoked N times, it's going to return N **cloned** results, without having to recompute the query (unless the input has changed in such a way that it warrants recomputation). @@ -55,7 +55,7 @@ Since each query is going to be stored in the database, when a query is invoked For each input query (0-key), there's going to be a "set" method, that allows to change the output of such query, and trigger previous memoized values to be potentially invalidated. ### Query Groups -A query group is a set of queries which have been defined together as a unit. The dabase is formed by combining query groups. +A query group is a set of queries which have been defined together as a unit. The database is formed by combining query groups. Query groups are akin to "Salsa modules" [^EN2]. A set of queries in a query group are just a set of methods in a trait. @@ -77,7 +77,7 @@ pub trait Inputs { //! 
This query group is a bunch of **input** queries, that do not rely on any derived input ///This attribute (`#[salsa::input]`) indicates that this query is a base input, - ///therfore `set_manifest` is going to be auto-generated + ///therefore `set_manifest` is going to be auto-generated #[salsa::input] fn manifest(&self) -> Manifest; @@ -86,7 +86,7 @@ pub trait Inputs { } ``` -To create a **derived** query group, one must specify which other query group does this one depends on, by specifying it as a supertrait, as seen in the following example: +To create a **derived** query group, one must specify which other query groups this one depends on by specifying them as supertraits, as seen in the following example: ```rust,ignore ///This query group is going to contain queries that depend on derived values @@ -166,4 +166,4 @@ fn main() { [^EN1]: "They are not something that you **inaubible** but something that you kinda get **inaudible** from the outside [3:23](https://youtu.be/_muY4HjSqVw?t=203). -[^EN2]: What is a Salsa module? \ No newline at end of file +[^EN2]: What is a Salsa module? From 956e6fea4ee532abc77b463e6f7efb5204509cd3 Mon Sep 17 00:00:00 2001 From: Karrq Date: Sun, 28 Jul 2019 19:39:37 +0200 Subject: [PATCH 3/3] Rewrote "Database" concept to use present tense --- src/compiler_lectures/how_salsa_works.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/src/compiler_lectures/how_salsa_works.md b/src/compiler_lectures/how_salsa_works.md index 31b081fe7..e706e9c27 100644 --- a/src/compiler_lectures/how_salsa_works.md +++ b/src/compiler_lectures/how_salsa_works.md @@ -46,13 +46,13 @@ A query is some value that Salsa can access in the course of computation. Each query can have a number of keys (from 0 to many), and all queries have a result, akin to functions. 0-key queries are called "input" queries. 
### Database -The database is basically the context for the entire computation, it's gonna store all Salsa's internal state, all intermediate values for each query, and anything else that the computation might need. +The database is basically the context for the entire computation; it is meant to store Salsa's internal state, all intermediate values for each query, and anything else that the computation might need. The database must know all the queries that the library is going to do before it can be built, but they don't need to be specified in the same place. -After the database is formed, it can be accessed with queries that are basically going to work like functions. -Since each query is going to be stored in the database, when a query is invoked N times, it's going to return N **cloned** results, without having to recompute the query (unless the input has changed in such a way that it warrants recomputation). +After the database is formed, it can be accessed with queries that are very similar to functions. +Since each query's result is stored in the database, when a query is invoked N times, it will return N **cloned** results, without having to recompute the query (unless the input has changed in such a way that it warrants recomputation). -For each input query (0-key), there's going to be a "set" method, that allows to change the output of such query, and trigger previous memoized values to be potentially invalidated. +For each input query (0-key), a "set" method is generated, allowing the user to change the output of that query and potentially invalidate previously memoized values. ### Query Groups A query group is a set of queries which have been defined together as a unit. The database is formed by combining query groups.
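
Reviewer note: the reuse and "early termination" behaviour described in the chapter's "How does it work?" section can be sketched with a small hand-rolled model of the `I <- A <- C`, `J <- B <- C` graph. This is only an illustration under stated assumptions, not Salsa's implementation or API: `Memo`, `Db`, `refresh`, the revision counter, the toy definitions `A = I / 2`, `B = J + 1`, `C = A + B`, and the `*_runs` counters are all hypothetical names invented for this example.

```rust
/// A memoized derived value plus the bookkeeping needed to revalidate it.
/// (Hypothetical type for illustration; not part of Salsa.)
#[derive(Clone, Copy)]
struct Memo {
    value: u32,
    changed_at: u64,  // revision in which the value last actually changed
    verified_at: u64, // revision in which the value was last known to be valid
}

/// Returns the up-to-date memo and whether `compute` actually ran.
fn refresh(
    old: Option<Memo>,
    now: u64,
    deps_changed_at: u64,
    compute: impl FnOnce() -> u32,
) -> (Memo, bool) {
    match old {
        // No dependency changed since this memo was last verified: reuse it.
        Some(m) if deps_changed_at <= m.verified_at => (Memo { verified_at: now, ..m }, false),
        // A dependency changed: recompute, but keep the old `changed_at` when
        // the result is equal, so that dependents can terminate early.
        Some(m) => {
            let value = compute();
            let changed_at = if value == m.value { m.changed_at } else { now };
            (Memo { value, changed_at, verified_at: now }, true)
        }
        // Never computed before.
        None => (Memo { value: compute(), changed_at: now, verified_at: now }, true),
    }
}

/// Toy "database": two inputs (value, revision of last set) and three memos.
struct Db {
    revision: u64,
    i: (u32, u64),
    j: (u32, u64),
    a: Option<Memo>,
    b: Option<Memo>,
    c: Option<Memo>,
    a_runs: u32, // how many times A's function actually executed
    c_runs: u32, // how many times C's function actually executed
}

impl Db {
    fn new(i: u32, j: u32) -> Self {
        Db { revision: 1, i: (i, 1), j: (j, 1), a: None, b: None, c: None, a_runs: 0, c_runs: 0 }
    }

    /// Setting an input starts a new revision.
    fn set_i(&mut self, value: u32) {
        self.revision += 1;
        self.i = (value, self.revision);
    }

    /// A = I / 2
    fn a(&mut self) -> Memo {
        let (i, i_changed_at) = self.i;
        let (memo, ran) = refresh(self.a, self.revision, i_changed_at, || i / 2);
        self.a_runs += ran as u32;
        self.a = Some(memo);
        memo
    }

    /// B = J + 1
    fn b(&mut self) -> Memo {
        let (j, j_changed_at) = self.j;
        let (memo, _) = refresh(self.b, self.revision, j_changed_at, || j + 1);
        self.b = Some(memo);
        memo
    }

    /// C = A + B
    fn c(&mut self) -> u32 {
        let (a, b) = (self.a(), self.b());
        let deps_changed_at = a.changed_at.max(b.changed_at);
        let (memo, ran) = refresh(self.c, self.revision, deps_changed_at, || a.value + b.value);
        self.c_runs += ran as u32;
        self.c = Some(memo);
        memo.value
    }
}

/// Walks through the scenario from the chapter; call this from `main` to try it.
fn early_termination_demo() {
    let mut db = Db::new(4, 10);
    assert_eq!(db.c(), 13); // A = 2, B = 11, everything computed once
    db.set_i(5); // I changed, but A = 5 / 2 is still 2
    assert_eq!(db.c(), 13);
    assert_eq!((db.a_runs, db.c_runs), (2, 1)); // A re-ran, C was reused
    db.set_i(8); // now A really changes, so C must re-run
    assert_eq!(db.c(), 15);
    assert_eq!((db.a_runs, db.c_runs), (3, 2));
}
```

In this sketch `B` is never recomputed after its first evaluation because `J` never changes, and `C` is skipped whenever neither `A` nor `B` reports a newer `changed_at`. One simplification worth flagging: the sketch refreshes `A` and `B` eagerly before deciding about `C`, which is a choice made for brevity here and is not a claim about the order in which Salsa validates dependencies.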