Description
This issue is intended to brainstorm some ideas on how to report regex parse errors to Rust programmers.
One of my long-running projects has been a rewrite of the `regex-syntax` crate. There are several goals I'm achieving via this rewrite, and one of them in particular is better error messages for regex parse errors.
Here's an example:
Here's another example:
The above error messages correspond to the `fmt::Display` implementation on the error returned by the parser. What I'd like to have happen is for these error messages to appear to programmers when they create malformed regexes in their source code. Specifically, today, the code pattern for building a regex that is known at compile time is:
let re = Regex::new(...).unwrap();
The issue here is that if the regex contains a parse error, then the `unwrap` implementation will show the `Debug` output for the error instead of its `Display` output. That's definitely the right behavior in most cases, but in this case, I'd really like to show a nicer error message. My question is: how can I achieve that?
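To make the difference concrete, here is a minimal sketch using only the standard library. `ParseError` and `parse` are hypothetical stand-ins, not the real `regex-syntax` types, and the message layout is only an assumption about what a nicer error could look like. The point is that `unwrap` formats the error with `{:?}`, so the multi-line `Display` text is never shown.

```rust
use std::fmt;

// Hypothetical stand-in for a regex parse error (not the real regex-syntax type).
#[derive(Debug)]
struct ParseError {
    pattern: String,
    offset: usize,
    message: String,
}

impl fmt::Display for ParseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Multi-line, human-oriented rendering with a caret under the error.
        writeln!(f, "regex parse error:")?;
        writeln!(f, "    {}", self.pattern)?;
        writeln!(f, "    {}^", " ".repeat(self.offset))?;
        write!(f, "error: {}", self.message)
    }
}

// Toy "parser": rejects a pattern that opens a group and never closes it.
fn parse(pattern: &str) -> Result<(), ParseError> {
    match (pattern.find('('), pattern.contains(')')) {
        (Some(offset), false) => Err(ParseError {
            pattern: pattern.to_string(),
            offset,
            message: "unclosed group".to_string(),
        }),
        _ => Ok(()),
    }
}

// What `unwrap` embeds in its panic message: the Debug rendering.
fn shown_by_unwrap(err: &ParseError) -> String {
    format!("{:?}", err)
}

// What I would like programmers to see instead: the Display rendering.
fn shown_by_display(err: &ParseError) -> String {
    format!("{}", err)
}
```

Calling `parse("a(b").unwrap()` panics with the single-line struct dump from `shown_by_unwrap`, while `shown_by_display` yields the multi-line message.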
I can think of two ways:
1. Make the `Debug` implementation defer to the `Display` implementation. This causes `unwrap` to show the right error message, but now we don't have a "normal" `Debug` implementation, which feels kind of wrong to me.
2. Create a new constructor on `Regex` called `must` (name subject to bikeshedding) that is basically the same as `Regex::new(...).unwrap()`, except it will emit the `Display` impl for the error instead of the `Debug` impl. The downside here is that users of the `regex` crate need to use `must`, and this particular style of handling panics isn't really idiomatic in the Rust ecosystem, where we instead prefer an explicit `unwrap`. The key difference here is that regexes in the source code with parse errors are inherently programmer errors, and programmers should get nice regex error messages, because the `Debug` impl isn't going to be particularly helpful.
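Both options can be sketched roughly as follows. This is an illustration under stated assumptions, not the `regex` crate's code: `Pattern` is a hypothetical stand-in for `Regex`, `ParseError` is a toy error type, and the "parser" only detects an unclosed group. The two options are shown together for brevity, though in practice only one would be needed.

```rust
use std::fmt;

// Hypothetical error type standing in for the real parse error.
struct ParseError {
    message: String,
}

impl fmt::Display for ParseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "regex parse error: {}", self.message)
    }
}

// Option 1: Debug defers to Display, so `unwrap` prints the nice message,
// at the cost of no longer having a "normal" structural Debug output.
impl fmt::Debug for ParseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        fmt::Display::fmt(self, f)
    }
}

// Hypothetical stand-in for `Regex`.
struct Pattern {
    source: String,
}

impl Pattern {
    // Fallible constructor, analogous to `Regex::new`.
    fn new(source: &str) -> Result<Pattern, ParseError> {
        if source.contains('(') && !source.contains(')') {
            Err(ParseError { message: "unclosed group".to_string() })
        } else {
            Ok(Pattern { source: source.to_string() })
        }
    }

    // Option 2: a panicking constructor that reports the error via Display.
    // The name `must` is subject to bikeshedding.
    fn must(source: &str) -> Pattern {
        match Pattern::new(source) {
            Ok(pattern) => pattern,
            Err(err) => panic!("{}", err),
        }
    }

    fn as_str(&self) -> &str {
        &self.source
    }
}
```

With option 2 alone, `ParseError` could keep a derived `Debug` impl, and only call sites that switch from `new(...).unwrap()` to `must(...)` would see the nicer message.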
Are there other ways of approaching this problem? What do other people think? My inclination right now is hovering around (2), particularly since `Regex::new(...).unwrap()` is not only extraordinarily common, but is also correct in the vast majority of cases (with it being incorrect only when the regex isn't known until runtime). That is, normalizing that code pattern with an explicit constructor feels like a good thing to me on its own, but it's not clear how that balances with established idioms.
cc @rust-lang/libs @ethanpailes @robinst @killercup @steveklabnik @bluss