
Commit f127991

Rewrite the grammar once again.
* Parses the GHC codebase! I'm using a trimmed set of the source directories of the compiler and most core libraries in [this repo](https://github.com/tek/tsh-test-ghc). This used to break horribly in many files because explicit brace layouts weren't supported very well.
* Faster in most cases! Here are a few simple benchmarks to illustrate the difference, not to be taken _too_ seriously, using the test codebases in `test/libs`:

  Old:
  ```
  effects: 32ms
  postgrest: 91ms
  ivory: 224ms
  polysemy: 84ms
  semantic: 1336ms
  haskell-language-server: 532ms
  flatparse: 45ms
  ```

  New:
  ```
  effects: 29ms
  postgrest: 64ms
  ivory: 178ms
  polysemy: 70ms
  semantic: 692ms
  haskell-language-server: 390ms
  flatparse: 36ms
  ```

  GHC's `compiler` directory takes 3000ms, but is among the fastest repos in per-line and per-character time!

  To get more detailed info (including the new codebases I added, which consist mostly of core libraries), run `test/parse-libs`.
  I also added an interface for running `hyperfine`, exposed as a Nix app: execute `nix run .#bench-libs -- stm mtl transformers` with the desired set of libraries from `test/libs` or `test/libs/tsh-test-ghc/libraries`.
* Smaller shared object. `tree-sitter generate` produces a `haskell.so` of 4.4MB for the old grammar and 3.0MB for the new one.
* Significantly faster generation and a slightly faster build. On my machine, generation takes 9.34s vs. 2.85s, and compiling takes 3.75s vs. 3.33s.
* All terminals now have proper text nodes when possible, like the `.` in modules. Fixes #102, #107, #115 (partially?).
* Semicolons are now forced after newlines even if the current parse state doesn't allow them. This makes alternative interpretations in GLR conflicts fail; those interpretations sometimes produced top-level expression splices for valid (and invalid) code. Fixes #89, #105, #111.
* Comments aren't pulled into preceding layouts anymore. Fixes #82, #109. (This can probably still be improved with a few heuristics, e.g. for postfix haddock.)
* Similarly, whitespace is kept out of layout-related nodes as much as possible. Fixes #74.
* Hashes can now be operators in all situations, without sacrificing unboxed tuples. Fixes #108.
* Expression quotes are now handled separately from quasiquotes, and their contents are parsed properly. Fixes #116.
* Explicit brace layouts are now handled correctly. Fixes #92.
* Function application with multiple block arguments is handled correctly.
* Unicode categories for identifiers now match GHC, and the full Unicode character set is supported for things like prefix operator detection.
* Haddock comments have dedicated nodes now.
* Use named precedences instead of closely replicating the GHC parser's productions.
* Different layouts are tracked and closed with their special cases considered. In particular, multi-way if now has layout.
* Fixed a CPP bug where a mid-line `#endif` would be a false positive.
* CPP only matches legal directives now.
* Generally more lenient parsing than GHC, and in the presence of errors:
  * Missing closing tokens at EOF are tolerated for:
    * CPP
    * Comments
    * TH quotations
  * Multiple semicolons in some positions like `if/then`
  * Unboxed tuples and sums are allowed to have arbitrary numbers of filled positions
* List comprehensions can have multiple sets of qualifiers (`ParallelListComp`).
* Deriving clauses after GADTs don't require layout anymore.
* Newtype instance heads are working properly now.
* Escaping newlines in comments and CPP works now. Escaping newlines on regular lines won't be implemented.
* One remaining issue is that qualified left sections that contain infix ops are broken: `(a + a A.+)`.
  I haven't managed to figure out a good strategy for this; my suspicion is that it's impossible to correctly parse application, infix and negation without lexing all qualified names in the scanner.
  I will try that out at some point, but for now I'm planning to just accept that this one thing doesn't work.
  For what it's worth, none of the codebases I use for testing contain this construct in a way that breaks parsing.
* The repo now includes a Haskell program that generates C code for classifying characters as belonging to certain sets of Unicode categories, using bitmaps. I might need to change this to write them all to a shared file, so that the set of source files stays the same.
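Several items above (semicolons forced after newlines, explicit brace layouts, layouts closed with their special cases) revolve around Haskell's layout rule. The following is a rough Rust sketch of a simplified version of the layout algorithm from the Haskell 2010 report, not the grammar's actual C scanner. It assumes whitespace-separated tokens, treats only `where`, `let`, `do` and `of` as block openers, and replaces the report's parse-error rule with one special case: `in` closes a block opened by `let`. The names `Context`, `layout` and `words_with_columns` are hypothetical.

```rust
/// A layout context: an implicit block whose items start at `column` (remembering
/// whether `let` opened it, so `in` can close it), or an explicit `{ ... }` block.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Context {
    Implicit { column: usize, is_let: bool },
    Explicit,
}

/// Keywords that open a layout block in this simplified sketch.
fn opens_layout(word: &str) -> bool {
    matches!(word, "where" | "let" | "do" | "of")
}

/// Split a line into whitespace-separated words with 1-based columns
/// (every character, including a tab, counts as one column here).
fn words_with_columns(line: &str) -> Vec<(usize, &str)> {
    let mut words = Vec::new();
    let mut rest = line;
    while let Some(start) = rest.find(|c: char| !c.is_whitespace()) {
        let end = rest[start..].find(char::is_whitespace).map_or(rest.len(), |n| start + n);
        let offset = line.len() - rest.len() + start; // byte offset within `line`
        words.push((line[..offset].chars().count() + 1, &rest[start..end]));
        rest = &rest[end..];
    }
    words
}

/// Insert virtual `{`, `;` and `}` into a whitespace-separated token stream.
fn layout(src: &str) -> Vec<String> {
    let mut out: Vec<String> = Vec::new();
    let mut stack: Vec<Context> = Vec::new();
    let mut opener: Option<&str> = None; // layout keyword still waiting for its block

    for line in src.lines() {
        for (index, (column, word)) in words_with_columns(line).into_iter().enumerate() {
            if let Some(keyword) = opener.take() {
                if word == "{" {
                    // Explicit braces: no virtual tokens until the matching `}`.
                    stack.push(Context::Explicit);
                    out.push(word.to_string());
                    continue;
                }
                let enclosing = match stack.last() {
                    Some(Context::Implicit { column: m, .. }) => *m,
                    _ => 0,
                };
                if column > enclosing {
                    // New implicit block: its items start at this token's column.
                    stack.push(Context::Implicit { column, is_let: keyword == "let" });
                    out.push("{".to_string());
                    out.push(word.to_string());
                    if opens_layout(word) {
                        opener = Some(word);
                    }
                    continue;
                }
                // The token is not indented further, so the block is empty.
                out.push("{".to_string());
                out.push("}".to_string());
            }
            if index == 0 {
                // First token on a line: close deeper blocks, then emit `;`
                // if the line starts a new item of the current block.
                while let Some(&Context::Implicit { column: n, .. }) = stack.last() {
                    if column < n {
                        stack.pop();
                        out.push("}".to_string());
                    } else {
                        if column == n {
                            out.push(";".to_string());
                        }
                        break;
                    }
                }
            }
            if word == "in" && matches!(stack.last(), Some(Context::Implicit { is_let: true, .. })) {
                // Special case: `in` closes the block opened by `let`, even mid-line.
                stack.pop();
                out.push("}".to_string());
            } else if word == "}" && stack.last() == Some(&Context::Explicit) {
                stack.pop();
            }
            out.push(word.to_string());
            if opens_layout(word) {
                opener = Some(word);
            }
        }
    }
    // End of input closes every implicit block that is still open.
    while let Some(Context::Implicit { .. }) = stack.last() {
        stack.pop();
        out.push("}".to_string());
    }
    out
}
```

For example, `layout("f = do\n  a\n  b")` and `layout("f = do { a ; b }")` both yield `f = do { a ; b }`: in the implicit version, the `;` comes from the second line starting at the block's column and the closing `}` from the end of input.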
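For the CPP items (legal directives only, no mid-line `#endif`, escaped newlines), here is a minimal Rust sketch of one way to recognize directives line by line. It is an illustration under stated assumptions, not the scanner's logic: it only accepts `#` in the first column, checks the name against the standard C preprocessor directives, and joins backslash-continued lines only inside directives, because a trailing `\` on an ordinary Haskell line may begin a lambda. `cpp_directive`, `logical_lines` and `DIRECTIVES` are hypothetical names.

```rust
/// Standard C preprocessor directive names. Which directives the grammar
/// actually treats as legal is an assumption of this sketch.
const DIRECTIVES: [&str; 13] = [
    "if", "ifdef", "ifndef", "elif", "else", "endif", "define", "undef",
    "include", "line", "error", "warning", "pragma",
];

/// Return the directive name if `line` is a CPP directive: `#` must be in the
/// first column (so a mid-line `#endif` never matches), optionally followed by
/// blanks, and the name after it must be a known directive (so a shebang like
/// `#!/usr/bin/env runghc` stays ordinary text).
fn cpp_directive(line: &str) -> Option<&str> {
    let rest = line.strip_prefix('#')?.trim_start();
    let end = rest
        .find(|c: char| !c.is_ascii_alphabetic())
        .unwrap_or(rest.len());
    let name = &rest[..end];
    DIRECTIVES.iter().any(|d| *d == name).then_some(name)
}

/// Split source into logical lines. A trailing backslash continues the line
/// only inside a CPP directive; on ordinary Haskell lines it is left alone.
fn logical_lines(src: &str) -> Vec<String> {
    let mut out: Vec<String> = Vec::new();
    let mut continued = false; // previous line was a directive ending in `\`
    for line in src.lines() {
        let is_directive = continued || cpp_directive(line).is_some();
        let (body, continues) = match line.strip_suffix('\\') {
            Some(prefix) if is_directive => (prefix, true),
            _ => (line, false),
        };
        if continued {
            // Append the continuation to the directive started on an earlier line.
            if let Some(last) = out.last_mut() {
                last.push_str(body);
            }
        } else {
            out.push(body.to_string());
        }
        continued = continues;
    }
    out
}
```

Under these assumptions, `cpp_directive("x = y #endif")` returns `None`, while `cpp_directive("# if defined(FOO)")` returns `Some("if")`, and the lambda in `f = \` followed by `  x -> x` keeps its two physical lines.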
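The last item mentions a generator that emits C bitmaps for sets of Unicode categories. The Rust sketch below shows the general idea with a flat bitmap (one bit per code point) rendered as a C array. The layout is an assumption, since the actual generated tables may be structured differently. Rust's standard library also exposes no general-category API, so a `char` predicate such as `char::is_alphabetic` stands in for the category sets. `CharSet`, `from_predicate` and `to_c_array` are hypothetical names.

```rust
/// A bitmap over Unicode code points: bit `cp % 64` of word `cp / 64` is set
/// when code point `cp` belongs to the set.
struct CharSet {
    words: Vec<u64>,
}

impl CharSet {
    /// Build the bitmap by testing every Unicode scalar value against `pred`,
    /// then drop trailing all-zero words to keep the table short.
    fn from_predicate(pred: impl Fn(char) -> bool) -> Self {
        let mut words = vec![0u64; 0x110000 / 64];
        for cp in 0..0x110000u32 {
            // `from_u32` returns `None` for surrogates, which are not `char`s.
            if let Some(c) = char::from_u32(cp) {
                if pred(c) {
                    words[(cp / 64) as usize] |= 1u64 << (cp % 64);
                }
            }
        }
        while words.last() == Some(&0) {
            words.pop();
        }
        CharSet { words }
    }

    /// Constant-time membership test; code points past the table are not members.
    fn contains(&self, c: char) -> bool {
        let cp = c as u32;
        self.words
            .get((cp / 64) as usize)
            .map_or(false, |&w| (w >> (cp % 64)) & 1 == 1)
    }

    /// Render the table as a C array definition, as a code generator might.
    fn to_c_array(&self, name: &str) -> String {
        let entries: Vec<String> = self.words.iter().map(|&w| format!("0x{:016x}", w)).collect();
        format!(
            "static const uint64_t {}[{}] = {{{}}};\n",
            name,
            self.words.len(),
            entries.join(", ")
        )
    }
}
```

Trimming trailing zero words keeps tables for low code points tiny: the set of ASCII digits fits in a single `u64`. A full flat bitmap would take 0x110000 / 8 = 139,264 bytes, which is why real Unicode tables often use a multi-level lookup instead.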
1 parent 95a4f00 commit f127991


168 files changed (+104087, -945998 lines)


.gitattributes

Lines changed: 3 additions & 0 deletions
```diff
@@ -1,2 +1,5 @@
 /src/** linguist-vendored
 /examples/* linguist-vendored
+/src/parser.c -diff
+/src/grammar.json -diff
+/src/node-types.json -diff
```

.github/workflows/assets.yml

Lines changed: 33 additions & 0 deletions
```diff
@@ -0,0 +1,33 @@
+name: Publish assets
+
+on:
+  workflow_run:
+    workflows: [CI]
+    types: [completed]
+    branches: [main]
+
+jobs:
+  build:
+    runs-on: ubuntu-latest
+    if: github.event.workflow_run.conclusion == 'success'
+    permissions:
+      contents: read
+      id-token: write
+    steps:
+      - uses: actions/checkout@v4
+      - uses: DeterminateSystems/nix-installer-action@main
+      - uses: DeterminateSystems/magic-nix-cache-action@main
+
+      - run: nix -L build .#parser-src
+      - name: Upload parser sources
+        uses: actions/upload-artifact@v4
+        with:
+          name: tree-sitter-haskell-src
+          path: result/src
+
+      - run: nix -L build .#parser-wasm
+      - name: Upload wasm binary
+        uses: actions/upload-artifact@v4
+        with:
+          name: tree-sitter-haskell-wasm
+          path: result/tree-sitter-haskell.wasm
```

.github/workflows/ci.yml

Lines changed: 34 additions & 16 deletions
```diff
@@ -2,17 +2,16 @@ name: CI
 
 on:
   push:
-    branches:
-      - "**"
+    branches: [main]
+    tags: ['**']
   pull_request:
-    types:
-      - opened
-      - synchronize
+    types: [opened, synchronize]
 
 jobs:
   test:
-    name: test / ${{ matrix.os }}
-    runs-on: ${{ matrix.os }}
+    name: test / ${{matrix.os}}
+    runs-on: ${{matrix.os}}
+    if: github.event.pull_request.merged == true || github.event.action != 'closed'
     strategy:
       fail-fast: false
       matrix:
@@ -27,20 +26,39 @@ jobs:
         with:
           node-version: '18'
 
-      # - name: Install emscripten
-      #   uses: mymindstorm/setup-emsdk@v10
-      #   with:
-      #     version: '2.0.24'
+      - name: Install emscripten
+        uses: mymindstorm/setup-emsdk@v14
+        with:
+          version: '3.1.47'
 
-      - name: Build tree-sitter-haskell
+      - name: Build dependencies
         run: npm install
 
       - name: Run tests
         run: npm test
 
-      - name: Parse examples
-        run: npm run examples
+      - name: Parse libraries
+        run: npm run libs
+
+      - name: Parse libraries with wasm
+        run: npm run libs-wasm
+
+      - name: Run fuzzer
+        if: ${{matrix.os == 'ubuntu-latest'}}
+        uses: tree-sitter/fuzz-action@v4
 
-      # - name: Parse examples with web binding
-      #   run: npm run examples-wasm
+  legacy:
+    permissions:
+      contents: write
+      id-token: write
+    needs: test
+    if: github.ref_type == 'tag'
+    uses: ./.github/workflows/legacy.yml
 
+  release:
+    permissions:
+      contents: read
+      id-token: write
+    needs: test
+    if: github.ref_type == 'tag'
+    uses: ./.github/workflows/release.yml
```

.github/workflows/legacy.yml

Lines changed: 36 additions & 0 deletions
```diff
@@ -0,0 +1,36 @@
+name: Update legacy branch
+
+on:
+  workflow_call:
+
+jobs:
+  commit:
+    runs-on: ubuntu-latest
+    permissions:
+      contents: write
+      id-token: write
+    steps:
+      - uses: actions/checkout@v4
+        with:
+          ref: ${{github.ref}}
+      - uses: actions/checkout@v4
+        with:
+          ref: master
+
+      - name: Reset worktree to ${{github.ref_name}}
+        run: |
+          git restore --source=${{github.ref}} .
+          git restore .gitignore
+
+      - uses: DeterminateSystems/nix-installer-action@main
+      - uses: DeterminateSystems/magic-nix-cache-action@main
+
+      - name: Generate parser
+        run: nix -L run .#gen-parser
+
+      - name: Commit and push to legacy branch
+        uses: actions-js/[email protected]
+        with:
+          github_token: ${{secrets.GITHUB_TOKEN}}
+          message: "Legacy release ${{github.ref_name}}"
+          branch: master
```

.github/workflows/release.yml

Lines changed: 1 addition & 2 deletions
```diff
@@ -1,8 +1,7 @@
 name: Publish package
 
 on:
-  push:
-    tags: ["*"]
+  workflow_call:
 
 concurrency:
   group: ${{github.workflow}}-${{github.ref}}
```

.gitignore

Lines changed: 11 additions & 7 deletions
```diff
@@ -1,11 +1,15 @@
-node_modules
-build
+/src/grammar.json
+/src/node-types.json
+/src/parser.c
+/dist-newstyle
+/result
+/test/libs/*
+!/test/libs/.gitkeep
+/build/
+/target/
+/.lib/
+/node_modules/
 *.log
-package-lock.json
-repos
-examples/*
-!examples/.gitkeep
 .gdb_history
 *.o
 *.so
-/.build/
```
/.build/

.npmignore

Lines changed: 0 additions & 6 deletions
This file was deleted.

Cargo.lock

Lines changed: 13 additions & 25 deletions
Some generated files are not rendered by default.

Cargo.toml

Lines changed: 20 additions & 3 deletions
```diff
@@ -1,12 +1,11 @@
 [package]
 name = "tree-sitter-haskell"
 description = "haskell grammar for the tree-sitter parsing library"
-version = "0.15.0"
+version = "0.20.6"
 keywords = ["incremental", "parsing", "haskell"]
 categories = ["parsing", "text-editors"]
 repository = "https://github.com/tree-sitter/tree-sitter-haskell"
-edition = "2018"
-license = "MIT"
+edition = "2021"
 
 build = "bindings/rust/build.rs"
 include = [
@@ -19,6 +18,24 @@ include = [
 [lib]
 path = "bindings/rust/lib.rs"
 
+[[test]]
+name = "parse-test"
+path = "test/rust/parse-test.rs"
+
+[[bin]]
+name = "parse"
+path = "test/rust/parse.rs"
+test = false
+bench = false
+doc = false
+
+[[bin]]
+name = "show"
+path = "test/rust/show.rs"
+test = false
+bench = false
+doc = false
+
 [dependencies]
 tree-sitter = "0.20"
 
```

Makefile

Lines changed: 0 additions & 47 deletions
This file was deleted.
