Description
Currently, comments are handled by the external C scanner rather than directly by the grammar. In the PureScript context there is hardly any reason for this, because:
- The Tree-sitter documentation does not claim that external scanners are more efficient, or anything of the sort
- Comment parsing does not involve layout tracking
In the past this has led to issues, because the scanner applies assumptions about Haskell syntax that do not hold in PureScript, e.g. Maskhjarna/tree-sitter-purescript#7. There's a similar open issue related to multi-line comments:
A document like this
{-# ... -}
fails to be parsed because of the # coming after {-, even though that is not a problem in PureScript. For this specific issue there are 3 strategies for resolving it:
- Make the scanner ignore any character after {-
  - Pro: simple, quick, atomic
  - Con: impact scope is really limited
  - Con: I wanted to reduce the dependency on the scanner at some point anyway
- Rename the scanner's tokens and introduce comment tokens to the JS grammar
  - Pro: simple, quick, atomic
  - Pro: relatively greater scope of impact
  - Pro: big modifications to the scanner are delayed
  - Con: doesn't work (I already tried), because the scanner is the first to consume the characters and silently succeeds, so the comment leaf doesn't even appear in the tree
- Remove comment handling from the scanner and assign it to the JS grammar
  - Pro: nice start to reducing the dependency on the scanner
  - Pro: opens up opportunities like analyzing inlining directives and documentation comments
  - Pro: distinction between single-line and multi-line comments
  - Pro: more direct, explicit and atomic control of the parsing process, easier for potential contributors
  - Con: potentially a hard problem, as I have not done much to the scanner before, only a minor tweak twice
  - Con: JS grammars are less flexible than external scanners (that's the whole point of scanners), don't have lookahead, expose less control, etc.
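As a rough illustration of the third strategy (moving comment handling into the JS grammar), a multi-line comment could probably be described by a single regular expression that needs no lookahead. The sketch below is plain JavaScript, not the actual grammar.js: the names `BLOCK_COMMENT` and `matchBlockComment` are hypothetical, and it assumes that block comments do not nest. In a real grammar the pattern would presumably be wrapped in `token(...)` and listed under `extras`.

```javascript
// Hypothetical sketch: a block-comment pattern that treats every character
// after "{-" as ordinary comment text, so "{-#" gets no special handling.
// Assumption: block comments do not nest.
//   \{-                opening delimiter
//   ([^-]|-+[^-}])*    any non-dash, or dashes not followed by "}"
//   -+\}               closing delimiter (one or more dashes, then "}")
const BLOCK_COMMENT = /\{-([^-]|-+[^-}])*-+\}/;

// Return the block comment at the very start of `source`, or null if
// `source` does not begin with one.
function matchBlockComment(source) {
  const m = BLOCK_COMMENT.exec(source);
  return m !== null && m.index === 0 ? m[0] : null;
}

console.log(matchBlockComment('{-# ... -}'));                    // {-# ... -}
console.log(matchBlockComment('{- a - b -} module Main where')); // {- a - b -}
console.log(matchBlockComment('module Main where'));             // null
```

The pattern is written without lazy quantifiers or lookahead on purpose, since the grammar side is the less flexible one. Whether it coexists cleanly with the remaining scanner tokens is exactly the open question of this strategy.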