
Get position of first non-matching character #941


Closed
casey opened this issue Jan 5, 2023 · 2 comments

Comments

@casey

casey commented Jan 5, 2023

I have a project with a bunch of integration tests that make requests to an HTTP server we're writing and then assert that the response matches a regex.

If the response matches, the test continues. If it doesn't, the test fails with an error message that prints the regex and then the text, and it's often quite hard to figure out where in the text the regex failed to match.

If we could get the index of the first non-matching character, or the index of the last matching character, then we could point out that position in the response body, which would improve our error messages a great deal.

Would it be possible to provide something like this?

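impl Regex {
    /// Proposed: the byte offset of the first character the regex could not match,
    /// or `None` if the whole haystack matched.
    fn first_mismatch_position(&self, haystack: &str) -> Option<usize> {
        todo!()
    }
}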
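To illustrate how an offset like the one `first_mismatch_position` would return could improve the failure message, here is a minimal std-only sketch. `point_at` is a hypothetical helper, not part of the regex crate; it assumes the offset is a byte offset into the body (as regex match offsets are) and that the terminal renders each character one column wide.

```rust
/// Hypothetical helper: renders the line of `body` containing byte offset `pos`
/// with a caret under that position, for use in a test failure message.
fn point_at(body: &str, pos: usize) -> String {
    // Clamp to the end of the body, then back up to a char boundary so slicing can't panic.
    let mut pos = pos.min(body.len());
    while !body.is_char_boundary(pos) {
        pos -= 1;
    }
    // Locate the line containing `pos`.
    let line_start = body[..pos].rfind('\n').map_or(0, |i| i + 1);
    let line_end = body[pos..].find('\n').map_or(body.len(), |i| pos + i);
    let line = &body[line_start..line_end];
    let line_no = body[..line_start].matches('\n').count() + 1;
    // Column counted in chars (assumes each char is one terminal column wide).
    let column = body[line_start..pos].chars().count();
    format!(
        "mismatch at line {}, column {}:\n{}\n{}^",
        line_no,
        column + 1,
        line,
        " ".repeat(column)
    )
}

fn main() {
    let body = "HTTP/1.1 200 OK\ncontent-type: text/plain\n\nhello";
    // Suppose the search reported byte offset 25 as the first non-matching position.
    println!("{}", point_at(body, 25));
}
```

For the example body, offset 25 falls on the `y` in `content-type`, so the message points at line 2, column 10.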
@BurntSushi
Member

This is basically a duplicate of #678 as far as I can tell. Worded a little differently, but seems to be requesting the same kind of information.

The answer to your question "is it possible" is indeed "yes." But I expect the real question is not just "is it possible," but also "and if so, could it be added to the public API?" The answer to that second question is no. Something like this has to come directly from the underlying search implementation, and that information is simply not available in all of the different regex engines, or at least might differ between them and be inconsistent. (Perhaps that's not a showstopper.) More to the point, this would require huge changes to all of the internal APIs and would likely add costs as well.

For something super niche like this, I would recommend using regex-automata once #656 lands. In that case, you'll be able to directly control the automaton and write your own search routines and report whatever information you like.
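A rough sketch of the kind of custom search routine described above: walk a DFA one byte at a time and report where it can no longer continue. To stay self-contained, the DFA here is hand-written for the anchored pattern `^[a-z]+=[0-9]+$`; `next_state`, `is_match_state`, `DEAD`, and this `first_mismatch_position` are hypothetical stand-ins, not regex-automata's API, and a real DFA from that crate would have to be driven through its own documented methods.

```rust
/// Sentinel for the dead state: once entered, no match is possible.
const DEAD: usize = usize::MAX;

/// Hypothetical hand-built DFA for the anchored pattern `^[a-z]+=[0-9]+$`.
/// States: 0 = start, 1 = in key, 2 = just saw '=', 3 = in value (accepting).
fn next_state(state: usize, byte: u8) -> usize {
    match (state, byte) {
        (0 | 1, b'a'..=b'z') => 1,
        (1, b'=') => 2,
        (2 | 3, b'0'..=b'9') => 3,
        _ => DEAD,
    }
}

fn is_match_state(state: usize) -> bool {
    state == 3
}

/// Returns `None` if the whole haystack matches. Otherwise returns the byte
/// offset the automaton could not get past: the byte that sent it to the dead
/// state, or `haystack.len()` if the input ended before an accepting state.
fn first_mismatch_position(haystack: &str) -> Option<usize> {
    let mut state = 0;
    for (offset, &byte) in haystack.as_bytes().iter().enumerate() {
        state = next_state(state, byte);
        if state == DEAD {
            return Some(offset);
        }
    }
    if is_match_state(state) {
        None
    } else {
        Some(haystack.len())
    }
}

fn main() {
    for body in ["key=42", "key=4x2", "key="] {
        match first_mismatch_position(body) {
            None => println!("{body:?}: matched"),
            Some(pos) => println!("{body:?}: first mismatch at byte {pos}"),
        }
    }
}
```

With the inputs in `main`, this reports a match for `key=42`, offset 5 (the `x`) for `key=4x2`, and offset 4 (end of input) for `key=`. The sketch picks one specific meaning of "first non-matching character": the first byte at which no match starting at offset 0 can continue.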

Issue #678 has more discussion, including more API design details. If you have follow-up questions or whatnot, I'd prefer discussion be focused there. Thanks.

@BurntSushi closed this as not planned on Jan 5, 2023
@casey
Author

casey commented Jan 5, 2023

Thanks for the explanation, that makes sense! I'll follow the other issues.
