Skip to content

Conversation

stephencelis
Copy link
Member

This PR significantly changes how backtracking is done in the library, hopefully for the better!

  • All backtracking work has been removed from individual operators (with the exception of Peek and Not)
  • Backtracking has been inserted in between each step of OneOf

Up till now it's been the responsibility of the parser to "be a good citizen" and clean up after itself on failure. This means if a parser failed to do so, you could get into a bad state or make a bad assumption.

By centralizing this work in OneOf, most parsers are freed of this responsibility and no longer have to worry about undoing changes to input. That is now simply be the responsibility of OneOf instead.

@stephencelis stephencelis merged commit 819f9d2 into main Feb 8, 2022
@stephencelis stephencelis deleted the one-of-backtracking branch February 8, 2022 19:19
@tgrapperon
Copy link
Contributor

If I understand correctly, this is a huge change of behavior for end-users of the library. Until now, the input was freely available to reuse somewhere else if parsing did fail, but it can now be partially consumed. Issues can be caught with appropriate tests, but many may trip on this otherwise. Aside a prominent documentation/communication about the change, maybe there's a safer way to handle the transition? For example by adding a .parse() overload or new variant that won't consume the input?

@stephencelis
Copy link
Member Author

@tgrapperon Thanks for the feedback!

If I understand correctly, this is a huge change of behavior for end-users of the library.

We do realize the behavior change is significant, but believe the impact to be small and we will call it out in the release notes.

Of all the parsers we've written we only found one to have a problem with this change, and that's the XCTest log parser in our benchmarks. The problem was that a parser was written in a very strange way that assumed a compactMap would undo its work. The fix itself was an improvement in readability:

https://github.com/pointfreeco/swift-parsing/pull/108/files#diff-27f107d6fe057661c029f42de5d29456e46036fff3acee94ee831d337ec1ec31

Until now, the input was freely available to reuse somewhere else if parsing did fail, but it can now be partially consumed.

While this was true of many of the parsers in the library, there has never been any guarantee that a parser defined outside of the library would undo its work, nor have we documented that parsers undo their work (all documentation around backtracking is additive in this PR). In the end, expecting a parser to undo upon failure isn't something we could ever truly trust it to do. Moving this logic to only OneOf naturally falls out of that.

Aside a prominent documentation/communication about the change, maybe there's a safer way to handle the transition? For example by adding a .parse() overload or new variant that won't consume the input?

We could keep the other backtracking behavior in for now with the plan to remove it in the future, but that feels like we're just kicking the can down the road and will have the same problem whenever we do change the behavior. We'd like to rip the band-aid off early.

Would you like to run any parsers you've written on main to see if anything unexpected happens?

@tgrapperon
Copy link
Contributor

tgrapperon commented Feb 8, 2022

Thanks for the response @stephencelis! I agree. I was just worried by the transition, not the behavior. Fortunately, the library often works on derived types like Substring's or their views, and it is very probable that the user created the input's value specifically for the parser.
Maybe adding a - Note: or - Warning: in the final parser() documentation would help the information to stand out.

I have a case where I'm trying to parse a String with a parser, and if fails, I send it in another function that is using ML and a more fuzzy approach. But I just checked and the "Substring protection" did kick-in here, as the ML function accepts a String argument (and the parser even works with utf8 views).

I wrote very few parsers with an explicit parse function myself, and I would always backtrack completely if some internal parser did failed in this case, so I guess there won't be behavioral issues. My concern was really about top-level uses, but it turns out I was probably over-worried!

This is a good change overall, and I can't wait for what's coming in the next weeks/months!

@mackoj
Copy link

mackoj commented Feb 8, 2022

I did test it on our project and it did perform as expected 👍🏽

@randomeizer
Copy link
Contributor

randomeizer commented Feb 8, 2022

In general I think this is fine. One additional "good citizen" might be Many, particularly if it fails to match the terminator, but I think it is already built that way.

@randomeizer
Copy link
Contributor

randomeizer commented Feb 8, 2022

Might be worth adding relevant documentation about Many (and Not/Peek?) also backtracking? Particularly if it affects performance.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants