Skip to content

Design: syntax trees vs identity #3129

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
matklad opened this issue Feb 13, 2020 · 3 comments
Closed

Design: syntax trees vs identity #3129

matklad opened this issue Feb 13, 2020 · 3 comments
Labels
C-Architecture Big architectural things which we need to figure up-front (or suggestions for rewrites :0) ) E-unknown It's unclear if the issue is E-hard or E-easy without digging in fun A technically challenging issue with high impact

Comments

@matklad
Copy link
Member

matklad commented Feb 13, 2020

In #3120 (comment), I've once again noticed that working with syntax trees when you want to get semantic information is annoying. This issue is just a brain dump of the things we might do here.

The problem we have is that it's difficult to "index" semantic info by current syntax trees.

For example, here's how one gets a type of expression in Roslyn (with some details ommited):

            SyntaxTree tree = CSharpSyntaxTree.ParseText("some code");

            var model =  getSemanticModel(tree);

            var helloWorldString = root.DescendantNodes()
                                       .OfType<LiteralExpressionSyntax>()
                                       .First();

            var type = model.GetTypeInfo(helloWorldString).Type; // *

Note how at * we only pass the tree to the model to get the type.

In contrast, with the current API in Rust syntax trees are value types which don't remember the context they originated from. So, to get a type of an expression syntax, you also need to specify the identity of the file this syntax originates from.

The current way to specify identity is mostly the InFile<T> type, which is a pair of T and a (real or macro) file id. There are two problems with this approach:

  • It does not actually yield a true global identity, because the same file can be included twice into a module tree of a crate (Note how this makes syntax<->semantics not a bijection, which is painful to deal with).
  • The API is awkward because, if you have s: InFile<ast::StructDef>, you can't get an Option<InFile<ast::Name>> by calling an s.name(), instead you need to s.map(|it| it.name()).transpose(), which is a lot.

It also has it's benefits:

  • It is composable, you can have InFile<ast::SomeNod>, InFile<TextRange>, InFile<Token>.
  • It removes identity from the tree itself, making the boundaries between the components clearer.

What can we do here:

  • Maintain the status quo, requiring all the IDE code to explicitly mange, map and transpose InFile<T>'s.
  • Explicitely mirror AST API for InFile<T>. Ie, generate something like impl InFile<ast::StructDef> { fn name(&self) -> Option<InFile<ast::StructDef>> }
  • Add FIleId(u32) field to each syntax node, such that nodes always come with a strong identity.
@matklad matklad added E-unknown It's unclear if the issue is E-hard or E-easy without digging in fun A technically challenging issue with high impact labels Feb 13, 2020
@matklad
Copy link
Member Author

matklad commented Feb 13, 2020

@matklad
Copy link
Member Author

matklad commented Feb 13, 2020

Found a seemingly cute solution: https://rust-lang.zulipchat.com/#narrow/stream/185405-t-compiler.2Fwg-rls-2.2E0/topic/Design.20question.3A.20identity.20in.20syntax.20trees/near/188124603

We add an identity to the tree, but this identity doesn't have any semantics attached.

@matklad matklad added the C-Architecture Big architectural things which we need to figure up-front (or suggestions for rewrites :0) ) label Feb 17, 2020
@matklad
Copy link
Member Author

matklad commented Mar 2, 2020

For the time being this is closed by #3222.

@matklad matklad closed this as completed Mar 2, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-Architecture Big architectural things which we need to figure up-front (or suggestions for rewrites :0) ) E-unknown It's unclear if the issue is E-hard or E-easy without digging in fun A technically challenging issue with high impact
Projects
None yet
Development

No branches or pull requests

1 participant