Remove unreachable #8
f816e0b to 00b6cfd
Gonna take a look later on.
Thanks for splitting this into multiple pieces. I think it's good that @GuillaumeGomez will look at this, it will be good to get another pair of eyes on it after rust-lang/rust#138163.
Changes look good to me. Now comes the not so fun question: can you add benchmarks please?
Are you thinking about numbers (before the crate split off this was looking like this for the macro variant of this change) or code for this crate? I'm happy to come up with some benchmark code.
Mostly code, I can check locally when done. Considering it'll impact performance, better check ahead of time. We just need to check the entry functions.
Benchmarks PR: #9
src/lib.rs
Outdated
/// Takes the contents of a literal (without quotes)
/// and produces a sequence of errors,
/// which are returned by invoking `error_callback`.
pub fn unescape_for_errors(
What is the purpose of this function? It does the conversion but doesn't actually make use of it. It only provides information if one error occurred. Where is it meant to be used?
Its only purpose is to be used in the companion PR using the new API here: https://github.com/rust-lang/rust/pull/140999/files#diff-36d0ff95049fa1b66bdd47ec2c03e1588268303571a9561d1ba664ca29034dacR1019-R1049.
It seemed like a good compromise to remove a stubborn use of unescape_{unicode,mixed} and signals intent well. I suppose it could alternatively live where it is used instead of here, except for its use of unescape_single
...
Oh I see, it only checks if there is an error. It doesn't need the unescaped content. Then let me make a suggestion for its documentation.
Right, it is like the other `unescape_*` functions, but it only gives you the error results and not the `Ok`s. That's why I named it `unescape_for_errors`. Was the name not a good indication of this behavior?
Not really. Always open to interpretation. I needed your comment to understand why this function was working this way. Documentation is here to clarify that.
Also need to update the benchmarks.
The old API exposes `unreachable` in both unescape_unicode and unescape_mixed. These are conceptually one function, but because their return types are incompatible, they could not be unified. The new API takes this insight further to separate unescape_unicode into separate functions, such that byte functions can return bytes instead of chars.
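To make the commit message above concrete, here is a minimal sketch of the two API shapes it contrasts. Everything in it (`ModeSketch`, `NonAscii`, `unescape_unicode_sketch`, `unescape_byte_str_sketch`) is a hypothetical, simplified stand-in and not the crate's real code; escape sequences are not handled at all, only the shape of the signatures matters.

```rust
use std::ops::Range;

/// Hypothetical, simplified stand-ins; not the crate's real definitions.
#[derive(Clone, Copy)]
pub enum ModeSketch {
    Str,
    ByteStr,
    CStr,
}

#[derive(Debug, PartialEq)]
pub struct NonAscii;

/// Old shape: one `char`-returning entry point shared by several modes.
/// Not every mode is supported here, so the body needs a panic path
/// that any caller can reach by passing the wrong mode.
pub fn unescape_unicode_sketch(
    src: &str,
    mode: ModeSketch,
    callback: &mut impl FnMut(Range<usize>, Result<char, NonAscii>),
) {
    let ascii_only = match mode {
        ModeSketch::Str => false,
        ModeSketch::ByteStr => true,
        ModeSketch::CStr => unreachable!("handled by a different function"),
    };
    for (start, c) in src.char_indices() {
        let res = if ascii_only && !c.is_ascii() { Err(NonAscii) } else { Ok(c) };
        callback(start..start + c.len_utf8(), res);
    }
}

/// New shape: one function per literal kind and no mode parameter, so
/// there is no unsupported case left. The byte variant yields `u8`, so
/// callers no longer convert from `char` themselves.
pub fn unescape_byte_str_sketch(
    src: &str,
    callback: &mut impl FnMut(Range<usize>, Result<u8, NonAscii>),
) {
    for (start, c) in src.char_indices() {
        let res = if c.is_ascii() { Ok(c as u8) } else { Err(NonAscii) };
        callback(start..start + c.len_utf8(), res);
    }
}
```

In the sketch, the only way to hit the panic is to call the old-shape function with `ModeSketch::CStr`; the new-shape function has no such input.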
…remove unused Mode methods
00b6cfd to 45a5bf4
src/lib.rs
Outdated
/// Takes the contents of a literal (without quotes)
/// and produces a sequence of errors,
/// which are returned by invoking `error_callback`.
/// Takes the contents of a literal (without quotes)
/// and produces a sequence of errors,
/// which are returned by invoking `error_callback`.
/// Takes the contents of a literal (without quotes) and calls `error_callback` if any error is encountered
/// while unescaping it. Please note that the unescaped content is not provided, this function is only meant
/// to be used to confirm whether or not the literal content is (in)valid.
I've renamed this to `check_for_errors` and improved its docs. Also took the chance to polish up some of the other doc comments. Let me know what you think.
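As a rough illustration of the behaviour discussed in this thread (errors are reported, unescaped content is dropped), an errors-only wrapper can be sketched as below. `SketchError`, `unescape_sketch`, `check_for_errors_sketch` and `is_valid_sketch` are hypothetical names for this example, and the bare-CR rule is only a placeholder check; the crate's actual signature may differ.

```rust
use std::ops::Range;

#[derive(Debug, PartialEq)]
pub enum SketchError {
    BareCarriageReturn,
}

/// Hypothetical stand-in for one of the `unescape_*` entry points:
/// reports every unit of the literal, either as `Ok` or as `Err`.
pub fn unescape_sketch(
    src: &str,
    callback: &mut impl FnMut(Range<usize>, Result<char, SketchError>),
) {
    for (start, c) in src.char_indices() {
        let res = if c == '\r' { Err(SketchError::BareCarriageReturn) } else { Ok(c) };
        callback(start..start + c.len_utf8(), res);
    }
}

/// Errors-only variant: runs the same scan but forwards only the `Err`
/// results, so the caller never sees the unescaped content.
pub fn check_for_errors_sketch(
    src: &str,
    mut error_callback: impl FnMut(Range<usize>, SketchError),
) {
    unescape_sketch(src, &mut |range, res| {
        if let Err(e) = res {
            error_callback(range, e);
        }
    });
}

/// Typical use: confirm whether the literal content is valid at all.
pub fn is_valid_sketch(src: &str) -> bool {
    let mut ok = true;
    check_for_errors_sketch(src, |_, _| ok = false);
    ok
}
```

A caller that wants diagnostics would push `(range, error)` pairs from the callback instead of flipping a flag.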
API changes look good to me. Let me check benches now and then I think we're ready to go. :)
Here are the bench results:
Overall, the
Interesting! Is there an easy way to create such a nice table?
Sadly no. I ran benches a lot of time in both
Taking the worst offender (
On the other hand, if I make the main branch use the newer more generic
Maybe I should rewrite the benchmarks as macros, to minimize such issues...
diff --git a/benches/benches.rs b/benches/benches.rs
index a028dfd..1100832 100644
--- a/benches/benches.rs
+++ b/benches/benches.rs
@@ -3,7 +3,9 @@
 extern crate test;
 use rustc_literal_escaper::*;
+use std::fmt::Debug;
 use std::iter::repeat_n;
+use std::ops::Range;
 const LEN: usize = 10_000;
@@ -37,6 +39,24 @@ fn bench_skip_ascii_whitespace(b: &mut test::Bencher) {
 // Check raw
 //
+#[allow(clippy::type_complexity)]
+fn new_bench_check_raw<UNIT: Into<char> + PartialEq + Debug + Copy>(
+    b: &mut test::Bencher,
+    c: UNIT,
+    check_raw: fn(&str, &mut dyn FnMut(Range<usize>, Result<UNIT, EscapeError>)),
+) {
+    let input: String = test::black_box(repeat_n(c.into(), LEN).collect());
+    assert_eq!(input.len(), LEN * c.into().len_utf8());
+
+    b.iter(|| {
+        let mut output = vec![];
+
+        check_raw(&input, &mut |range, res| output.push((range, res)));
+        assert_eq!(output.len(), LEN);
+        assert_eq!(output[0], (0..c.into().len_utf8(), Ok(c)));
+    });
+}
+
 fn bench_check_raw(b: &mut test::Bencher, c: char, mode: Mode) {
     let input: String = test::black_box(repeat_n(c, LEN).collect());
     assert_eq!(input.len(), LEN * c.len_utf8());
@@ -64,7 +84,20 @@ fn bench_check_raw_str_unicode(b: &mut test::Bencher) {
 #[bench]
 fn bench_check_raw_byte_str(b: &mut test::Bencher) {
-    bench_check_raw(b, 'a', Mode::RawByteStr);
+    // bench_check_raw(b, 'a', Mode::RawByteStr);
+
+    new_bench_check_raw(b, 'a', |s, cb| unescape_unicode(s, Mode::RawByteStr, cb));
+
+    // let input: String = test::black_box(repeat_n('a', LEN).collect());
+    // assert_eq!(input.len(), LEN * 'a'.len_utf8());
+
+    // b.iter(|| {
+    //     let mut output = vec![];
+
+    //     check_raw_byte_str(&input, &mut |range, res| output.push((range, res)));
+    //     assert_eq!(output.len(), LEN);
+    //     assert_eq!(output[0], (0..1, Ok(b'a')));
+    // });
 }
 // raw C str
diff --git a/src/lib.rs b/src/lib.rs
index d315ed2..c381032 100644
--- a/src/lib.rs
+++ b/src/lib.rs
@@ -87,7 +87,7 @@ impl EscapeError {
 /// the callback will be called exactly once.
 pub fn unescape_unicode<F>(src: &str, mode: Mode, callback: &mut F)
 where
-    F: FnMut(Range<usize>, Result<char, EscapeError>),
+    F: FnMut(Range<usize>, Result<char, EscapeError>) + ?Sized,
 {
     match mode {
         Char | Byte => {
@@ -357,7 +357,7 @@ fn unescape_char_or_byte(chars: &mut Chars<'_>, mode: Mode) -> Result<char, Esca
 /// sequence of escaped characters or errors.
 fn unescape_non_raw_common<F, T: From<char> + From<u8>>(src: &str, mode: Mode, callback: &mut F)
 where
-    F: FnMut(Range<usize>, Result<T, EscapeError>),
+    F: FnMut(Range<usize>, Result<T, EscapeError>) + ?Sized,
 {
     let mut chars = src.chars();
     let allow_unicode_chars = mode.allow_unicode_chars(); // get this outside the loop
@@ -424,7 +424,7 @@ where
 /// only produce errors on bare CR.
 fn check_raw_common<F>(src: &str, mode: Mode, callback: &mut F)
 where
-    F: FnMut(Range<usize>, Result<char, EscapeError>),
+    F: FnMut(Range<usize>, Result<char, EscapeError>) + ?Sized,
 {
     let mut chars = src.chars();
     let allow_unicode_chars = mode.allow_unicode_chars(); // get this outside the loop
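The `+ ?Sized` changes in the diff above are what allow a `&mut dyn FnMut(..)` to be passed where a generic `&mut F` is expected. The following standalone sketch shows that mechanism in isolation; `for_each_char` and `run_via_fn_pointer` are hypothetical names invented for this example and are not part of the crate or its benchmarks.

```rust
use std::ops::Range;

/// Generic callee with the relaxed bound. Type parameters are `Sized`
/// by default, so without `+ ?Sized` the choice `F = dyn FnMut(..)`
/// would be rejected by the compiler.
pub fn for_each_char<F>(src: &str, callback: &mut F)
where
    F: FnMut(Range<usize>, char) + ?Sized,
{
    for (start, c) in src.char_indices() {
        callback(start..start + c.len_utf8(), c);
    }
}

/// Hypothetical bench-style helper: it takes the function under test as
/// a plain `fn` pointer, which forces the callback to be a trait object.
pub fn run_via_fn_pointer(
    input: &str,
    check: fn(&str, &mut dyn FnMut(Range<usize>, char)),
) -> Vec<(Range<usize>, char)> {
    let mut output = Vec::new();
    check(input, &mut |range: Range<usize>, c: char| output.push((range, c)));
    output
}
```

A call such as `run_via_fn_pointer("ab", |s, cb| for_each_char(s, cb))` then compiles, because the non-capturing closure coerces to the `fn` pointer type and `cb` has type `&mut dyn FnMut(..)`. Calls through a trait object are dynamically dispatched, which is one plausible source of the measurement differences discussed in the comment.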
This improves the API of this crate to not use `unreachable` any more and is the continuation of rust-lang/rust#138163.
It also eliminates internal `unreachable` by inlining `Mode` methods into the `*_common` functions, and eliminates the resulting duplication by using traits instead. Using traits is much more verbose than a macro-based variant, because they are very explicit, but hopefully also a bit less unclear.
I've tried to use separate commits to explain the story, but have probably only succeeded at the beginning.
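For readers who want a picture of the "traits instead of macros" idea in the description, here is a minimal sketch under stated assumptions: `UnitCheck`, `StrKind`, `ByteStrKind`, `check_common` and the two wrapper functions are hypothetical names, the only rule modelled is "byte strings must be ASCII", and the PR's actual traits are likely organised differently.

```rust
use std::ops::Range;

#[derive(Debug, PartialEq)]
pub struct NonAscii;

/// Hypothetical trait: each literal kind names its unit type and
/// supplies its own per-character rule.
pub trait UnitCheck {
    type Unit;
    fn check(c: char) -> Result<Self::Unit, NonAscii>;
}

pub struct StrKind;
pub struct ByteStrKind;

impl UnitCheck for StrKind {
    type Unit = char;
    fn check(c: char) -> Result<char, NonAscii> {
        Ok(c)
    }
}

impl UnitCheck for ByteStrKind {
    type Unit = u8;
    fn check(c: char) -> Result<u8, NonAscii> {
        if c.is_ascii() { Ok(c as u8) } else { Err(NonAscii) }
    }
}

/// The shared loop is written once; there is no `match mode` and so no
/// arm that would need `unreachable!()`.
fn check_common<K, F>(src: &str, callback: &mut F)
where
    K: UnitCheck,
    F: FnMut(Range<usize>, Result<K::Unit, NonAscii>),
{
    for (start, c) in src.char_indices() {
        callback(start..start + c.len_utf8(), K::check(c));
    }
}

/// Public entry points pick the kind statically.
pub fn check_str_sketch<F>(src: &str, callback: &mut F)
where
    F: FnMut(Range<usize>, Result<char, NonAscii>),
{
    check_common::<StrKind, F>(src, callback)
}

pub fn check_byte_str_sketch<F>(src: &str, callback: &mut F)
where
    F: FnMut(Range<usize>, Result<u8, NonAscii>),
{
    check_common::<ByteStrKind, F>(src, callback)
}
```

Compared with a macro that stamps out one loop per kind, this is more lines of code, but each kind's rule and unit type are spelled out in an ordinary `impl` block.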
There is a companion PR to use this new API here: rust-lang/rust#140999
r? @nnethercote