Documentation and new parsers rest,take,eof #140

Merged 3 commits on Jan 13, 2022
9 changes: 9 additions & 0 deletions CHANGELOG.md
@@ -8,10 +8,19 @@ Breaking changes:

New features:

- `Parser.String.rest` (#140 by @jamesdbrock)
- `Parser.String.takeN` (#140 by @jamesdbrock)
- `Parser.Token.eof` (#140 by @jamesdbrock)

Bugfixes:

- `Parser.String.eof` Set consumed on success so that this parser combines
correctly with `notFollowedBy eof`. Added a test for this. (#140 by @jamesdbrock)

Other improvements:

- Documentation. (#140 by @jamesdbrock)

## [v8.1.0](https://github.com/purescript-contrib/purescript-parsing/releases/tag/v8.1.0) - 2022-01-10

Other improvements: README Quick start monadic parsing tutorial
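
The new parsers and the `eof` bugfix listed above can be illustrated with a minimal sketch. The module name and the parsers `tagAndBody` and `notLast` are hypothetical names made up for this example, and it assumes `takeN` takes a character count and `rest` returns all remaining input, as the changelog entries suggest.

```purescript
module Example.NewParsers where

import Prelude

import Data.Either (Either)
import Data.Tuple (Tuple(..))
import Text.Parsing.Parser (ParseError, Parser, runParser)
import Text.Parsing.Parser.Combinators (notFollowedBy)
import Text.Parsing.Parser.String (char, eof, rest, takeN)

-- Hypothetical parser: take a fixed-width 3-character tag,
-- then the remainder of the input, then check for end of input.
tagAndBody :: Parser String (Tuple String String)
tagAndBody = do
  tag <- takeN 3
  body <- rest
  eof
  pure (Tuple tag body)

-- Hypothetical parser: an 'a' which must not be the last character.
-- This relies on `eof` setting the consumed flag on success so that
-- `notFollowedBy eof` fails at the end of the input.
notLast :: Parser String Char
notLast = char 'a' <* notFollowedBy eof

example1 :: Either ParseError (Tuple String String)
example1 = runParser "abcdef" tagAndBody

example2 :: Either ParseError Char
example2 = runParser "a" notLast
```

With the bugfix applied, `example2` should be a `Left`, because the `'a'` is followed by the end of the input.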
57 changes: 41 additions & 16 deletions README.md
@@ -6,7 +6,8 @@
[![Maintainer: jamesdbrock](https://img.shields.io/badge/maintainer-jamesdbrock-teal.svg)](https://github.com/jamesdbrock)
[![Maintainer: robertdp](https://img.shields.io/badge/maintainer-robertdp-teal.svg)](https://github.com/robertdp)

A monadic parser combinator library based on Haskell's [Parsec](https://hackage.haskell.org/package/parsec).
A monadic parser combinator library based on Haskell’s
[Parsec](https://hackage.haskell.org/package/parsec).

## Installation

@@ -22,26 +23,41 @@ Here is a basic tutorial introduction to monadic parsing with this package.

### Parsers

A parser turns a string into a data structure. Parsers in this library have the type `Parser s a`, where `s` is the type of the input string, and `a` is the type of the data which the parser will produce on success. `Parser s a` is a monad. It’s defined in the module `Text.Parsing.Parser`.
A parser turns a string into a data structure. Parsers in this library have the type `Parser s a`, where `s` is the type of the input string, and `a` is the type of the data which the parser will produce on success. `Parser s` is a monad. It’s defined in the module `Text.Parsing.Parser`.

Monads can be used to provide context for a computation, and that’s how we use them in monadic parsing. The context provided by the `Parser` monad is *the parser’s current location in the input string*. Parsing starts at the beginning of the input string.
Monads can be used to provide context for a computation, and that’s how we use them in monadic parsing.
The context provided by the `Parser s` monad is __the parser’s current location in the input string__.
Parsing starts at the beginning of the input string.

Parsing requires two more capabilities: *choice* and *failure*.
Parsing requires two more capabilities: __alternative__ and __failure__.
Review comment (Collaborator): The "choice" -> "alternative" change makes the language fairly awkward. What was the reason for changing it?

Reply (Member Author): Because “alternative” is the term used by the Alt typeclass. For clarity I like to try to always use the same word when I'm talking about the same thing.

Reply (Contributor): I think "choice" is a more familiar word to use here because you clarify what you mean by "choice" in the next few paragraphs: Alt.


We need *choice* to be able to make decisions about what kind of thing we’re parsing depending on the input which we encouter. This is provided by the `Alt` typeclass instance of the `Parser` monad, particularly the `<|>` operator. That operator will first try the left parser and if that fails, then it will backtrack the input string and try the right parser.
We need __alternative__ to be able to choose what kind of thing we’re parsing depending
on the input which we encounter. This is provided by the `<|>` “alt”
operator of the `Alt` typeclass instance of the `Parser s` monad.
The expression `p_left <|> p_right` will first try the `p_left` parser and if that fails
__and consumes no input__ then it will try the `p_right` parser.
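
As a minimal sketch of the consumed-input rule, the two hypothetical parsers below differ only in whether the left branch is wrapped in the `try` combinator from `Text.Parsing.Parser.Combinators`. The module name and the names `committed` and `backtracking` are made up for this illustration.

```purescript
module Example.Alt where

import Prelude

import Control.Alt ((<|>))
import Data.Either (Either)
import Text.Parsing.Parser (ParseError, Parser, runParser)
import Text.Parsing.Parser.Combinators (try)
import Text.Parsing.Parser.String (char)

-- Both branches begin with 'a'. On input "ac" the left branch consumes
-- the 'a' and then fails on the 'c'. Because input was consumed, the
-- right branch is not tried and the whole parser fails.
committed :: Parser String Char
committed = (char 'a' *> char 'b') <|> (char 'a' *> char 'c')

-- Wrapping the left branch in `try` makes its failure count as
-- consuming no input, so the right branch is tried from the start.
backtracking :: Parser String Char
backtracking = try (char 'a' *> char 'b') <|> (char 'a' *> char 'c')

committedResult :: Either ParseError Char
committedResult = runParser "ac" committed

backtrackingResult :: Either ParseError Char
backtrackingResult = runParser "ac" backtracking
```

Here `committedResult` is a `Left` and `backtrackingResult` is `Right 'c'`.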

We need *failure* in case the input stream is not parseable. This is provided by the `fail` function, which calls the `throwError` function of the `MonadThrow` typeclass instance of the `Parser` monad. The result of running a parser has type `Either ParseError a`, so if the parse succeeds then the result is `Right a` and if the parse fails then the result is `Left ParseError`.
We need __failure__ in case the input stream is not parseable. This is provided by the `fail`
function, which calls the `throwError` function of the `MonadThrow` typeclass instance of
the `Parser s` monad.
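
A minimal sketch of calling `fail` with a custom message follows. The module name, the parser `vowel`, and the helper `isVowel` are hypothetical names for this illustration; `anyChar` is the primitive from `Text.Parsing.Parser.String`.

```purescript
module Example.Fail where

import Prelude

import Data.Either (Either)
import Data.Foldable (elem)
import Text.Parsing.Parser (ParseError, Parser, fail, runParser)
import Text.Parsing.Parser.String (anyChar)

-- Hypothetical helper: is this character a lowercase vowel?
isVowel :: Char -> Boolean
isVowel c = c `elem` [ 'a', 'e', 'i', 'o', 'u' ]

-- Hypothetical parser: accept one character, but only if it is a vowel.
-- Otherwise fail the parse with a descriptive message.
vowel :: Parser String Char
vowel = do
  c <- anyChar
  if isVowel c then pure c else fail ("Expected a vowel but found " <> show c)

vowelResult :: Either ParseError Char
vowelResult = runParser "e" vowel

consonantResult :: Either ParseError Char
consonantResult = runParser "x" vowel
```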


### Running a parser

To run a parser, call the function `runParser :: s -> Parser s a -> Either ParseError a` in the `Text.Parsing.Parser` module, and supply it with an input string and a parser.
To run a parser, call the function `runParser :: s -> Parser s a -> Either ParseError a` in
the `Text.Parsing.Parser` module, and supply it with an input string and a parser.
If the parse succeeds then the result is `Right a` and if the parse fails then the
Review comment (Collaborator): How come there are all these extra line breaks in this file?

Reply (Member Author): Just for editing convenience.

Reply (Contributor): Yeah, I'm curious why this was done too because it messes up the git blame. What does "editing convenience" mean?

result is `Left ParseError`.
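
One way to handle both outcomes of `runParser` is to pattern match on the `Either` result. In this sketch the module name and the helper `describe` are hypothetical; `parseErrorMessage` is the accessor exported by `Text.Parsing.Parser`.

```purescript
module Example.Run where

import Prelude

import Data.Either (Either(..))
import Text.Parsing.Parser (parseErrorMessage, runParser)
import Text.Parsing.Parser.String (string)

-- Hypothetical helper: run the parser `string "hello"` on the input
-- and describe the outcome as a String.
describe :: String -> String
describe input =
  case runParser input (string "hello") of
    Right matched -> "Parsed: " <> matched
    Left err -> "Failed: " <> parseErrorMessage err
```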

### Primitive parsers

Each type of input string needs primitive parsers. Primitive parsers for input string type `String` are in the `Text.Parsing.Parser.String` module. We can use these primitive parsers to write other `String` parsers.
Each type of input string needs primitive parsers.
Primitive parsers for input string type `String` are in the `Text.Parsing.Parser.String` module.
We can use these primitive parsers to write other `String` parsers.

Here is a parser `ayebee :: Parser String Boolean` which will accept only two input strings: `"ab"` or `"aB"`. It will return `true` if the `b` character is uppercase. It will return `false` if the `b` character is lowercase. It will fail with a `ParseError` if the input string is anything else. This parser is written in terms of the primitive parser `char :: Parser String Char`.
Here is a parser `ayebee :: Parser String Boolean` which will accept only two input
strings: `"ab"` or `"aB"`.
It will return `true` if the `b` character is uppercase.
It will return `false` if the `b` character is lowercase.
It will fail with a `ParseError` if the input string is anything else.
This parser is written in terms of the primitive parser `char :: Parser String Char`.

```purescript
ayebee :: Parser String Boolean
@@ -61,24 +77,33 @@ and then the parser will succeed and return `Right true`.

#### [✨ Run the `ayebee` parser in your browser on *Try PureScript!*](https://try.purescript.org/?github=/purescript-contrib/purescript-parsing/main/docs/examples/QuickStart.purs)

When you write a real parser you will usually want to return a more complicated data structure than a single `Boolean`. See [*Parse, don't validate*](https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-validate/).

### More parsers

There are other `String` parsers in the module `Text.Parsing.Parser.Token`, for example the parser `letter :: Parser String Char` which will accept any single alphabetic letter.

### Parser combinators

A parser combinator is a function which takes a parser as an argument and returns a new parser. The `many` combinator, for example, will repeat a parser as many times as it can. So the parser `many letter` will have type `Parser String (Array Char)`. Parser combinators are in this package in the module `Text.Parsing.Parser.Combinators`.
A parser combinator is a function which takes a parser as an argument and returns a new parser. The `many` combinator, for example, will repeat a parser as many times as it can. So the parser `many ayebee` will have type `Parser String (Array Boolean)`. Running that parser

```purescript
runParser "aBabaB" (many ayebee)
```

will return `Right [true, false, true]`.

Parser combinators are in this package in the module `Text.Parsing.Parser.Combinators`.
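
As a further sketch, combinators can be nested to build larger parsers. The module name and the parsers `word` and `parenWord` are hypothetical names for this illustration. It assumes `many` is imported from `Data.Array`, and it hides `between` from `Prelude` to avoid a clash with the combinator of the same name.

```purescript
module Example.Combinators where

import Prelude hiding (between)

import Data.Array (many)
import Data.Either (Either)
import Data.String.CodeUnits (fromCharArray)
import Text.Parsing.Parser (ParseError, Parser, runParser)
import Text.Parsing.Parser.Combinators (between)
import Text.Parsing.Parser.String (char)
import Text.Parsing.Parser.Token (letter)

-- Hypothetical parser: zero or more letters, collected into a String.
word :: Parser String String
word = fromCharArray <$> many letter

-- Hypothetical parser: a word enclosed in parentheses.
parenWord :: Parser String String
parenWord = between (char '(') (char ')') word

parenResult :: Either ParseError String
parenResult = runParser "(abc)" parenWord
```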

## Further reading

Here is the original short classic [FUNCTIONAL PEARLS *Monadic Parsing in Haskell*](https://www.cs.nott.ac.uk/~pszgmh/pearl.pdf) by Graham Hutton and Erik Meijer.
Here is the original short classic [FUNCTIONAL PEARLS *Monadic Parsing in Haskell*](https://www.cs.nott.ac.uk/~pszgmh/pearl.pdf) by Graham Hutton and Erik Meijer.

[*Revisiting Monadic Parsing in Haskell*](https://vaibhavsagar.com/blog/2018/02/04/revisiting-monadic-parsing-haskell/) by Vaibhav Sagar is a reflection on the Hutton, Meijer FUNCTIONAL PEARL.

[*Parse, don't validate*](https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-validate/) by Alexis King is about what it means to “parse” something, without any mention of monads.

[*Parsec: “try a <|> b” considered harmful*](http://blog.ezyang.com/2014/05/parsec-try-a-or-b-considered-harmful/) by Edward Z. Yang is about how to decide when to backtrack
from a failed alternative.

There are lots of other great monadic parsing tutorials on the internet.

## Related Packages
30 changes: 27 additions & 3 deletions src/Text/Parsing/Parser.purs
@@ -51,6 +51,18 @@ derive instance ordParseError :: Ord ParseError

-- | Contains the remaining input and current position.
data ParseState s = ParseState s Position Boolean
-- ParseState constructor has three parameters,
-- s: the remaining input
-- Position: the current position
-- Boolean: the consumed flag.
--
-- The consumed flag is used to implement the rule for `alt` that
-- * If the left parser fails *without consuming any input*, then backtrack and try the right parser.
-- * If the left parser fails and consumes input, then fail immediately.
--
-- https://hackage.haskell.org/package/parsec/docs/Text-Parsec.html#v:try
--
-- http://blog.ezyang.com/2014/05/parsec-try-a-or-b-considered-harmful/

-- | The Parser monad transformer.
-- |
@@ -105,12 +117,25 @@ derive newtype instance monadStateParserT :: Monad m => MonadState (ParseState s
derive newtype instance monadThrowParserT :: Monad m => MonadThrow ParseError (ParserT s m)
derive newtype instance monadErrorParserT :: Monad m => MonadError ParseError (ParserT s m)

-- | The alternative `Alt` instance provides the `alt` combinator `<|>`.
-- |
-- | The expression `p_left <|> p_right` will first try the `p_left` parser and if that fails
-- | __and consumes no input__ then it will try the `p_right` parser.
-- |
-- | While we are parsing down the `p_left` branch we may reach a point where
-- | we know this is the correct branch, but we cannot parse further. At
-- | that point we want to fail the entire parse instead of trying the `p_right`
-- | branch. To control the point at which we commit to the `p_left` branch
-- | use the `try` combinator.
-- |
-- | The `alt` combinator works this way because it gives us good localized
-- | error messages while also allowing an efficient implementation.
instance altParserT :: Monad m => Alt (ParserT s m) where
  alt p1 p2 = (ParserT <<< ExceptT <<< StateT) \(s@(ParseState i p _)) -> do
    Tuple e s'@(ParseState _ _ c') <- runStateT (runExceptT (unwrap p1)) (ParseState i p false)
    Tuple e s'@(ParseState _ _ consumed) <- runStateT (runExceptT (unwrap p1)) (ParseState i p false)
    case e of
      Left _
        | not c' -> runStateT (runExceptT (unwrap p2)) s
        | not consumed -> runStateT (runExceptT (unwrap p2)) s
      _ -> pure (Tuple e s')

instance plusParserT :: Monad m => Plus (ParserT s m) where
@@ -147,4 +172,3 @@ failWithPosition message pos = throwError (ParseError message pos)
-- | `region` as the parser backs out the call stack.
region :: forall m s a. Monad m => (ParseError -> ParseError) -> ParserT s m a -> ParserT s m a
region context p = catchError p $ \err -> throwError $ context err
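
A possible use of `region`, sketched under the assumption that the `ParseError` constructor is exported from `Text.Parsing.Parser`: the hypothetical helper `labelled` prefixes any error raised inside a parser with a label. The module name and the names `labelled` and `greeting` are made up for this illustration.

```purescript
module Example.Region where

import Prelude

import Data.Either (Either)
import Text.Parsing.Parser (ParseError(..), Parser, region, runParser)
import Text.Parsing.Parser.String (string)

-- Hypothetical helper: prefix the message of any error from `p`
-- with a label, keeping the original error position.
labelled :: forall a. String -> Parser String a -> Parser String a
labelled label p = region addLabel p
  where
  addLabel (ParseError message pos) = ParseError (label <> ": " <> message) pos

-- Hypothetical parser: errors from it mention the "greeting" label.
greeting :: Parser String String
greeting = labelled "greeting" (string "hello")

greetingResult :: Either ParseError String
greetingResult = runParser "hello" greeting
```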
