Comments parsing is controlled by the scanner #12

@postsolar

Currently, comments are handled by the external C scanner rather than directly by the grammar. In the PureScript context there is hardly any reason for this, because

  1. The tree-sitter documentation does not say that external scanners are more efficient than grammar rules, or anything to that effect
  2. Comment parsing does not involve layout tracking

In the past this has led to issues, because the scanner applies assumptions about Haskell syntax that do not hold in PureScript, e.g. Maskhjarna/tree-sitter-purescript#7. There is a similar open issue related to multi-line comments.
A document like this:

{-# ... -}

fails to parse because of the # that follows {-, even though this is not a problem in PureScript. There are 3 strategies for resolving this specific issue:

  1. Make the scanner ignore any character after {-
    Pro: simple, quick, atomic
    Con: impact scope is really limited
    Con: I wanted to reduce the dependency on the scanner at some point anyways

  2. Rename the scanner's tokens and introduce comment tokens in the JS grammar
    Pro: simple, quick, atomic
    Pro: relatively greater scope of impact
    Pro: big modifications to the scanner are delayed
    Con: doesn't work (I already tried), because the scanner is the first to consume the characters and silently succeeds, so the comment leaf never appears in the tree

  3. Remove comment handling from the scanner and assign it to the JS grammar
    Pro: nice start to reducing the dependency on the scanner
    Pro: opens up opportunities like analyzing inlining directives and documentation comments
    Pro: allows distinguishing between single-line comments and multi-line comments
    Pro: more direct, explicit and atomic control of the parsing process, easier for potential contributors
    Con: potentially a hard problem, as I have not done much to the scanner before, only a minor tweak twice
    Con: JS grammars are less flexible than external scanners (that's the whole point of scanners), don't have lookahead, expose less control, etc
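
As a rough illustration of what strategy 3 might involve, here is a minimal sketch of comment patterns written as plain JavaScript regexes, so they can be tried out without tree-sitter. The names `LINE_COMMENT`, `BLOCK_COMMENT` and `matchComment` are hypothetical and not taken from the repository. The sketch assumes block comments do not nest and that `--` always starts a line comment, both of which would need checking against the PureScript lexer. In `grammar.js` such patterns would presumably be wrapped in `token(...)` and listed under `extras`.

```javascript
// Hypothetical patterns for illustration only; not the repository's actual rules.

// Line comment: "--" up to (not including) the end of the line.
const LINE_COMMENT = /--[^\n]*/;

// Block comment, assumed non-nested: "{-", then everything up to the first "-}".
// Any character may follow "{-", including "#".
// Written without lazy quantifiers so it stays a plain regular pattern.
const BLOCK_COMMENT = /\{-[^-]*-+([^}-][^-]*-+)*\}/;

// Try to match a comment at the very start of `src`.
// Returns { kind, text } on success, or null if `src` does not start with a comment.
function matchComment(src) {
  const candidates = [
    ["block_comment", BLOCK_COMMENT],
    ["line_comment", LINE_COMMENT],
  ];
  for (const [kind, pattern] of candidates) {
    // Anchor the pattern to the start of the input.
    const anchored = new RegExp("^(?:" + pattern.source + ")");
    const m = anchored.exec(src);
    if (m) return { kind, text: m[0] };
  }
  return null;
}

console.log(matchComment("{-# ... -}\nmodule Main where"));
console.log(matchComment("-- a line comment\nx = 1"));
console.log(matchComment("x = 1"));
```

With patterns along these lines, the `{-# ... -}` document from above is matched as an ordinary block comment, and keeping line and block comments as separate tokens leaves room for later distinguishing documentation comments as well.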

Labels: bug (Something isn't working)