Skip to content

Add WithEnvironment mixin for visitors #157

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

vinistock
Copy link
Contributor

@vinistock vinistock commented Sep 14, 2022

Add a mixin to automatically keep track of local variables and arguments defined in the current environment for visitors. This module is very useful when trying to determine all occurrences of a particular local or its type, given a specific scope in the code.

Implementation

The implementation has two parts: the Environment and the WithEnvironment module.

  • The Environment class keeps track of what locals exist in a given environment. It also keeps a reference to an optional parent environment, since we need to go up the chain to determine whether a local is defined or not
  • The WithEnvironment module can be included in classes inheriting from Visitor to override some visits methods and automatically keep track of variables

Example usage

class MyVisitor < SyntaxTree::Visitor
  include WithEnvironment

  def visit_var_ref(node)
    local = current_environment.find_local(node.value)
    
    if local.type == :argument
      # handle a local argument
    else
      # handle a local variable
    end
  end
end

Questions

Did we cover all possible local variable occurrences? Maybe we're missing pattern matching with variable pinning.

Notes

We explored populating the environment during parsing, which is definitely possible. However, it makes accessing the current environment quite tricky, as parsing is completely decoupled from visits.

@vinistock vinistock force-pushed the vs/add_with_environment_mixin branch from a1d5501 to dca699a Compare September 14, 2022 16:59
Copy link
Member

@kddnewton kddnewton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll also need to handle regex captures. Pattern matching breaks down to var_field so we already have that.

On top of that, we'll need tons of testing.

@vinistock vinistock force-pushed the vs/add_with_environment_mixin branch 3 times, most recently from f6e0c2e to c273f33 Compare September 15, 2022 21:04
@vinistock
Copy link
Contributor Author

I added a few test scenarios. Also, the previous style using alias couldn't work because then super is not able to find the parent class method, so I switched to just being explicit.

I tried adding support for the named captures in a regex, but we only get a regexp_literal node with a string node inside. Do you think we should try to extract the names and build a location for the captures?

@vinistock vinistock force-pushed the vs/add_with_environment_mixin branch from c273f33 to c9dca3a Compare September 15, 2022 21:11
attr_reader :type

# [Array[Location]] The locations of all occurrences of this local,
# including its definition
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to track the definition of a local separately, so that tools can use that information? For example, that would allow Ruby LSP to implement Go to Definition for local variables.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense. It would always be the first location in the array. Do you think we should store it separately or just provide a method that returns the first location? Something like

def definition_location
  @locations.first
end

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately it wouldn't always be the first necessarily.

a = 1 if a

In this case the var_ref is found before the var_field.

I think I'd definitely like to track the definition and usages separately. They're different types usually anyway, since all of the definitions will be *Field nodes, and the usages will be VarRef nodes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fair. What would you expect for locals that are assigned multiple times or conditionally? E.g.:

# Which one of these are stored as the definition location? Both?
if something
  a = 1
else
  a = 2
end
# ...

# How about this one? Do we also include it as a definition location?
a = 5

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oof that's hard. I would say it would be an array of the first and second, but that's going to get into some pretty in-depth flow analysis that maybe we shouldn't do. How about just separating the assignments from the references?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that makes sense, especially as a first iteration. Maybe in the future we can take a stab at more complex analysis.

The one thing that can be slightly odd is for arguments. Should we include their definitions together with re-assignments? It might be fine as long as we document it and name the array something like definitions and not assignments.

def foo(a) # This would be a definition
  something(a) # This would be a usage
  a = 123 # This would be another "definition"
end

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah definitions makes a lot of sense.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I split usages and definitions in the latest commit.

I also refactored the tests a little bit to check what is saved in the environment, rather than saving the nodes. This uncovered a couple of bugs that I fixed.

The one thing that it uncovered and I'm not sure what we could do is that rest variables in pinning are considered VCall, not a VarRef.

I think we had spoken about this a while back and you mentioned this comes from Ripper itself, so we may just have to leave it as is for now.

def foo
  case [1, 2]
  in ^a, *rest # => this is a var_field and gets put in definitions
    puts rest # => this is considered a vcall and is ignored
  end
end

@vinistock vinistock force-pushed the vs/add_with_environment_mixin branch from c9dca3a to 2eccaec Compare September 16, 2022 17:15
@vinistock vinistock marked this pull request as ready for review September 21, 2022 15:46
Copy link
Contributor

@paracycle paracycle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This implementation looks good to me, but at the expense of coming in last minute with a design decision, I would like to ask: Is this best exposed as an optional visitor level feature?

Here is my thinking: If this is a feature that will collect locals per visitor, then for an LSP implementation, each visitor for each request would collect the locals all over again separately. This means wasted work for visitors that keep working on the same tree.

Instead if it was possible to decorate the syntax tree returned from the parse operation (either via calling a parse_with_locals method or by passing the parse result through another operation), then each visitor can operate over the collected locals without doing extra work. The LSP server could even cache that tree with locals between changes.

@vinistock and @kddnewton: Do think this this makes sense and is feasible?

@vinistock
Copy link
Contributor Author

We explored subclassing the parser to create a ParserWithEnvironment. The main issues we identified were

  • What structure do you return for the environment? Something that keeps every existing environment around?
  • And how to you find the specific one you're interesting when visiting something in a request? That is, which structure would allow us to do something like this
class MyLspRequest < BaseRequest
  def visit_var_ref(node)
    # With the current implementation, the current_environment is always the relevant one
    local = current_environment.find_local(node.value)

    # If the parser builds the environment structure, how do you access it from here?
    # We thought about keeping the information inside the node classes. Something like `node.occurrences`
    # Alex pointed out that this may be brittle, especially if the AST is transformed in any way
  end
end

It would be sweet to do this in the parser, but we couldn't figure out a nice way of accessing the right environment.

@kddnewton
Copy link
Member

It would be sweet to do this in the parser, but we couldn't figure out a nice way of accessing the right environment.

Yeah it would definitely be hard. Since the SAX-style ripper is bottom-up, you'd probably have to keep all assignments and references around while the nodes are being built until you get to various breakpoints. So for example, if you had:

class Foo
  def foo
    a = 1
    a
  end
end

you'd need to:

init --> create an empty array of assignments and references
kw "class" --> push onto stack
const "Foo"
kw "def" --> push onto stack
ident "foo"
ident "a", op "=", int "1", var_field, assign, stmts_new, stmts_add --> add assignment node to assignments
ident "a", var_ref, stmts_add --> add var_ref node to references
kw "end" --> push onto stack
def --> pop kw "def" and kw "end" off stack, use bounds to determine which assignments and references are contained within the bounds of the def node, use those to extract them from the list and create a new scope
class --> pop kw "class" and kw "end"` off stack, use bounds to do the same thing (in this example nothing would be returned)

Note that the pushing and popping from the stack is already done, since that's how location is determined. Also, referencing a list of things is also done, since that's how non-inline comments are attached to blocks of statements. So it's not impossible, but it's also not necessarily easy.

Let me know what you'd like to do. I'd be okay keeping it in its current form, but if you'd like to attempt doing it with the parser, I do think that'd be nice.

@kddnewton
Copy link
Member

Also, regardless of the way we go, can you include some documentation for this in the README?

@vinistock vinistock force-pushed the vs/add_with_environment_mixin branch 2 times, most recently from f6d02c6 to 7030af0 Compare October 17, 2022 17:11
Co-authored-by: Alexandre Terrasa <[email protected]>
Co-authored-by: Stan Lo <[email protected]>
Co-authored-by: Emily Samp <[email protected]>
Co-authored-by: Kaan Ozkan <[email protected]>
Co-authored-by: Adison Lampert <[email protected]>
Co-authored-by: Dirceu Tiegs <[email protected]>
@vinistock vinistock force-pushed the vs/add_with_environment_mixin branch from 7030af0 to 7003178 Compare October 17, 2022 17:44
@vinistock
Copy link
Contributor Author

Added some documentation to the README. We attempted to get the implementation of this working in the parser, but it was a bit messier. I'd say, we go with this implementation for now and if we manage to get it working nicely with the parser, we can switch.

@vinistock vinistock requested a review from kddnewton October 17, 2022 17:45
@kddnewton kddnewton merged commit 002cb71 into ruby-syntax-tree:main Oct 17, 2022
@vinistock vinistock deleted the vs/add_with_environment_mixin branch October 17, 2022 18:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants