diff --git a/src/SUMMARY.md b/src/SUMMARY.md
index b264f8df4..4a63b9282 100644
--- a/src/SUMMARY.md
+++ b/src/SUMMARY.md
@@ -38,7 +38,7 @@
         - [Incremental compilation](./queries/incremental-compilation.md)
         - [Incremental compilation In Detail](./queries/incremental-compilation-in-detail.md)
         - [Debugging and Testing](./incrcomp-debugging.md)
-    - [The parser](./the-parser.md)
+    - [Lexing and Parsing](./the-parser.md)
     - [`#[test]` Implementation](./test-implementation.md)
     - [Macro expansion](./macro-expansion.md)
     - [Name resolution](./name-resolution.md)
diff --git a/src/appendix/code-index.md b/src/appendix/code-index.md
index d58c4ea1b..0f423c6a4 100644
--- a/src/appendix/code-index.md
+++ b/src/appendix/code-index.md
@@ -24,7 +24,7 @@ Item | Kind | Short description | Chapter |
 `SourceFile` | struct | Part of the `SourceMap`. Maps AST nodes to their source code for a single source file. Was previously called FileMap | [The parser] | [src/libsyntax_pos/lib.rs](https://doc.rust-lang.org/nightly/nightly-rustc/syntax/source_map/struct.SourceFile.html)
 `SourceMap` | struct | Maps AST nodes to their source code. It is composed of `SourceFile`s. Was previously called CodeMap | [The parser] | [src/libsyntax/source_map.rs](https://doc.rust-lang.org/nightly/nightly-rustc/syntax/source_map/struct.SourceMap.html)
 `Span` | struct | A location in the user's source code, used for error reporting primarily | [Emitting Diagnostics] | [src/libsyntax_pos/span_encoding.rs](https://doc.rust-lang.org/nightly/nightly-rustc/syntax_pos/struct.Span.html)
-`StringReader` | struct | This is the lexer used during parsing. It consumes characters from the raw source code being compiled and produces a series of tokens for use by the rest of the parser | [The parser] | [src/libsyntax/parse/lexer/mod.rs](https://doc.rust-lang.org/nightly/nightly-rustc/syntax/parse/lexer/struct.StringReader.html)
+`StringReader` | struct | This is the lexer used during parsing. It consumes characters from the raw source code being compiled and produces a series of tokens for use by the rest of the parser | [The parser] | [src/librustc_parse/lexer/mod.rs](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/lexer/struct.StringReader.html)
 `syntax::token_stream::TokenStream` | struct | An abstract sequence of tokens, organized into `TokenTree`s | [The parser], [Macro expansion] | [src/libsyntax/tokenstream.rs](https://doc.rust-lang.org/nightly/nightly-rustc/syntax/tokenstream/struct.TokenStream.html)
 `TraitDef` | struct | This struct contains a trait's definition with type information | [The `ty` modules] | [src/librustc/ty/trait_def.rs](https://doc.rust-lang.org/nightly/nightly-rustc/rustc/ty/trait_def/struct.TraitDef.html)
 `TraitRef` | struct | The combination of a trait and its input types (e.g. `P0: Trait`) | [Trait Solving: Goals and Clauses], [Trait Solving: Lowering impls] | [src/librustc/ty/sty.rs](https://doc.rust-lang.org/nightly/nightly-rustc/rustc/ty/struct.TraitRef.html)
diff --git a/src/macro-expansion.md b/src/macro-expansion.md
index d1f74ed8b..a3a3ae762 100644
--- a/src/macro-expansion.md
+++ b/src/macro-expansion.md
@@ -1,5 +1,8 @@
 # Macro expansion
 
+> `libsyntax`, `librustc_expand`, and `libsyntax_ext` are all undergoing
+> refactoring, so some of the links in this chapter may be broken.
+
 Macro expansion happens during parsing. `rustc` has two parsers, in fact: the
 normal Rust parser, and the macro parser. During the parsing phase, the normal
 Rust parser will set aside the contents of macros and their invocations. Later,
diff --git a/src/the-parser.md b/src/the-parser.md
index 5796ae40e..d089a8418 100644
--- a/src/the-parser.md
+++ b/src/the-parser.md
@@ -1,29 +1,35 @@
-# The Parser
+# Lexing and Parsing
 
-The parser is responsible for converting raw Rust source code into a structured
-form which is easier for the compiler to work with, usually called an [*Abstract
-Syntax Tree*][ast]. An AST mirrors the structure of a Rust program in memory,
-using a `Span` to link a particular AST node back to its source text.
+> The parser and lexer are currently undergoing a lot of refactoring, so parts
+> of this chapter may be out of date.
+
+The very first thing the compiler does is take the program (in Unicode
+characters) and turn it into something the compiler can work with more
+conveniently than strings. This happens in two stages: Lexing and Parsing.
 
-The bulk of the parser lives in the [libsyntax] crate.
+Lexing takes strings and turns them into streams of tokens. For example,
+`a.b + c` would be turned into the tokens `a`, `.`, `b`, `+`, and `c`.
+The lexer lives in [`librustc_lexer`][lexer].
 
-Like most parsers, the parsing process is composed of two main steps,
+[lexer]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_lexer/index.html
 
-- lexical analysis – turn a stream of characters into a stream of token trees
-- parsing – turn the token trees into an AST
+Parsing then takes streams of tokens and turns them into a structured
+form which is easier for the compiler to work with, usually called an [*Abstract
+Syntax Tree*][ast] (AST). An AST mirrors the structure of a Rust program in memory,
+using a `Span` to link a particular AST node back to its source text.
 
-The `syntax` crate contains several main players,
+The AST is defined in [`libsyntax`][libsyntax], along with some definitions for
+tokens and token streams, data structures/traits for mutating ASTs, and shared
+definitions for other AST-related parts of the compiler (like the lexer and
+macro-expansion).
 
-- a [`SourceMap`] for mapping AST nodes to their source code
-- the [ast module] contains types corresponding to each AST node
-- a [`StringReader`] for lexing source code into tokens
-- the [parser module] and [`Parser`] struct are in charge of actually parsing
-  tokens into AST nodes,
-- and a [visit module] for walking the AST and inspecting or mutating the AST
-  nodes.
+The parser is defined in [`librustc_parse`][librustc_parse], along with a
+high-level interface to the lexer and some validation routines that run after
+macro expansion. In particular, the [`rustc_parse::parser`][parser] contains
+the parser implementation.
 
 The main entrypoint to the parser is via the various `parse_*` functions in the
-[parser module]. They let you do things like turn a [`SourceFile`][sourcefile]
+[parser][parser]. They let you do things like turn a [`SourceFile`][sourcefile]
 (e.g. the source in a single file) into a token stream, create a parser from
 the token stream, and then execute the parser to get a `Crate` (the root AST
 node).
@@ -44,14 +50,14 @@ Code for lexical analysis is split between two crates:
   specific data structures. Specifically, it adds `Span` information to tokens
   returned by `rustc_lexer` and interns identifiers.
 
-
 [libsyntax]: https://doc.rust-lang.org/nightly/nightly-rustc/syntax/index.html
 [rustc_errors]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_errors/index.html
 [ast]: https://en.wikipedia.org/wiki/Abstract_syntax_tree
 [`SourceMap`]: https://doc.rust-lang.org/nightly/nightly-rustc/syntax/source_map/struct.SourceMap.html
 [ast module]: https://doc.rust-lang.org/nightly/nightly-rustc/syntax/ast/index.html
-[parser module]: https://doc.rust-lang.org/nightly/nightly-rustc/syntax/parse/index.html
+[librustc_parse]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/index.html
+[parser]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/parser/index.html
 [`Parser`]: https://doc.rust-lang.org/nightly/nightly-rustc/syntax/parse/parser/struct.Parser.html
-[`StringReader`]: https://doc.rust-lang.org/nightly/nightly-rustc/syntax/parse/lexer/struct.StringReader.html
+[`StringReader`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/lexer/struct.StringReader.html
 [visit module]: https://doc.rust-lang.org/nightly/nightly-rustc/syntax/visit/index.html
 [sourcefile]: https://doc.rust-lang.org/nightly/nightly-rustc/syntax/source_map/struct.SourceFile.html