Remove unreachable #8

hkBst · 2025-05-14T10:27:47Z

This improves the API of this crate to not use unreachable any more and is the continuation of rust-lang/rust#138163.

It also eliminates internal unreachable by inlining Mode methods into the *_common functions, and eliminates the resulting duplication by using traits instead. Using traits is much more verbose than a macro-based variant, because they are very explicit, but hopefully also a bit less unclear.
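To make the trait idea above concrete, here is a minimal sketch of how a per-literal-kind trait can replace a `Mode` check that would otherwise end in `unreachable!()`. The names `CheckRaw`, `RawError`, `RawStr`, `RawByteStr` and `char2unit` are assumptions made up for this illustration and are not claimed to match the crate's actual items.

```rust
use std::ops::Range;

/// Hypothetical error type for this sketch (the real crate uses `EscapeError`).
#[derive(Debug, PartialEq)]
pub enum RawError {
    BareCarriageReturn,
    NonAsciiInByte,
}

/// Hypothetical trait: each raw literal kind declares its own unit type,
/// so the shared loop never has to handle an impossible mode.
pub trait CheckRaw {
    type Unit;

    /// Convert one source character into this literal kind's unit.
    fn char2unit(c: char) -> Result<Self::Unit, RawError>;

    /// Shared loop: report every character (or error) with its byte range.
    fn check_raw(src: &str, mut callback: impl FnMut(Range<usize>, Result<Self::Unit, RawError>)) {
        for (start, c) in src.char_indices() {
            let res = match c {
                '\r' => Err(RawError::BareCarriageReturn),
                _ => Self::char2unit(c),
            };
            callback(start..start + c.len_utf8(), res);
        }
    }
}

pub struct RawStr;
pub struct RawByteStr;

impl CheckRaw for RawStr {
    type Unit = char;
    fn char2unit(c: char) -> Result<char, RawError> {
        Ok(c)
    }
}

impl CheckRaw for RawByteStr {
    type Unit = u8;
    fn char2unit(c: char) -> Result<u8, RawError> {
        if c.is_ascii() {
            Ok(c as u8)
        } else {
            Err(RawError::NonAsciiInByte)
        }
    }
}
```

Because the unit type is an associated type, `RawByteStr::check_raw` hands the callback `u8` values directly, and neither the library nor the caller needs a fallback arm for a case that cannot happen.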

I've tried to use separate commits to explain the story, but have probably only succeeded at the beginning.

There is a companion PR to use this new API here: rust-lang/rust#140999

r? @nnethercote

GuillaumeGomez · 2025-05-14T15:38:01Z

Gonna take a look later on.

nnethercote · 2025-05-21T02:56:22Z

Thanks for splitting this into multiple pieces. I think it's good that @GuillaumeGomez will look at this, it will be good to get another pair of eyes on it after rust-lang/rust#138163.

GuillaumeGomez · 2025-05-21T15:44:44Z

Changes look good to me. Now comes the not so fun question: can you add benchmarks please?

hkBst · 2025-05-21T15:57:40Z

> Changes look good to me. Now comes the not so fun question: can you add benchmarks please?

Are you asking for numbers (before the crate was split off, the macro variant of this change looked like this) or for benchmark code in this crate? I'm happy to come up with some benchmark code.

GuillaumeGomez · 2025-05-21T16:00:26Z

Mostly code, I can check locally when done. Considering it'll impact performance, better check ahead of time. We just need to check the entry functions.

hkBst · 2025-05-27T10:41:32Z

Benchmarks PR: #9

GuillaumeGomez · 2025-05-27T19:13:22Z

src/lib.rs

+/// Takes the contents of a literal (without quotes)
+/// and produces a sequence of errors,
+/// which are returned by invoking `error_callback`.
+pub fn unescape_for_errors(


What is the purpose of this function? It does the conversion but doesn't actually make use of it. It only provides information if one error occurred. Where is it meant to be used?

hkBst · 2025-05-28

Its only purpose is to be used in the companion PR using the new API here: https://github.com/rust-lang/rust/pull/140999/files#diff-36d0ff95049fa1b66bdd47ec2c03e1588268303571a9561d1ba664ca29034dacR1019-R1049.

It seemed like a good compromise to remove a stubborn use of unescape_{unicode,mixed}, and it signals intent well. I suppose it could alternatively live where it is used instead of here, except for its use of unescape_single...

GuillaumeGomez · 2025-05-28

Oh I see, it only checks if there is an error. It doesn't need the unescaped content. Then let me make a suggestion for its documentation.

hkBst · 2025-05-28

Right, it is like the other unescape_* functions, but it only gives you the error results and not the Oks. That's why I named it unescape_for_errors. Was the name not a good indication of this behavior?

GuillaumeGomez · 2025-05-28

Not really. A name is always open to interpretation. I needed your comment to understand why this function was working this way. Documentation is here to clarify that.

GuillaumeGomez · 2025-05-27T19:30:42Z

Also need to update the benchmarks.

The old API exposes `unreachable` in both unescape_unicode and unescape_mixed. These are conceptually one function, but because their return types are incompatible, they could not be unified. The new API takes this insight further to separate unescape_unicode into separate functions, such that byte functions can return bytes instead of chars.
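The point about exposed `unreachable` can be illustrated from the caller's side. The following is only a sketch under assumptions: `old_style`, `new_style`, `collect_old`, `collect_new` and `NonAscii` are hypothetical names invented here, and the real functions take ranges and an `EscapeError` as well.

```rust
/// Hypothetical error for this sketch; the real crate reports `EscapeError`.
#[derive(Debug, PartialEq)]
pub struct NonAscii;

/// Old shape (sketch): byte-literal content still comes back as `char`.
pub fn old_style(src: &str, mut callback: impl FnMut(Result<char, NonAscii>)) {
    for c in src.chars() {
        callback(if c.is_ascii() { Ok(c) } else { Err(NonAscii) });
    }
}

/// New shape (sketch): the byte function yields `u8` directly.
pub fn new_style(src: &str, mut callback: impl FnMut(Result<u8, NonAscii>)) {
    for c in src.chars() {
        callback(if c.is_ascii() { Ok(c as u8) } else { Err(NonAscii) });
    }
}

/// Caller of the old shape: must narrow `char` to `u8` itself.
pub fn collect_old(src: &str) -> Vec<u8> {
    let mut out = Vec::new();
    old_style(src, |res| {
        if let Ok(c) = res {
            // The caller knows `c` is ASCII, but the type does not say so,
            // hence the `unreachable!()` that leaks out of the API.
            out.push(u8::try_from(c).unwrap_or_else(|_| unreachable!()));
        }
    });
    out
}

/// Caller of the new shape: the type already guarantees a byte.
pub fn collect_new(src: &str) -> Vec<u8> {
    let mut out = Vec::new();
    new_style(src, |res| {
        if let Ok(b) = res {
            out.push(b);
        }
    });
    out
}
```

Both collectors produce the same bytes; the difference is that only the old shape forces the caller to write a branch that is never supposed to run.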

GuillaumeGomez · 2025-05-28T13:48:15Z

src/lib.rs

+/// Takes the contents of a literal (without quotes)
+/// and produces a sequence of errors,
+/// which are returned by invoking `error_callback`.


Suggested change

-/// Takes the contents of a literal (without quotes)
-/// and produces a sequence of errors,
-/// which are returned by invoking `error_callback`.
+/// Takes the contents of a literal (without quotes) and calls `error_callback` if any error is encountered
+/// while unescaping it. Please note that the unescaped content is not provided, this function is only meant
+/// to be used to confirm whether or not the literal content is (in)valid.

hkBst · 2025-05-29

I've renamed this to check_for_errors and improved its docs. Also took the chance to polish up some of the other doc comments. Let me know what you think.
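As an illustration of the behavior discussed in this thread (errors are reported, successfully unescaped values are dropped), here is a minimal sketch. `unescape_sketch`, `check_for_errors_sketch` and `SketchError` are hypothetical stand-ins written for this example; the actual `check_for_errors` signature is not shown in this conversation and may differ.

```rust
use std::ops::Range;

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub enum SketchError {
    BareCarriageReturn,
}

/// Stand-in for an unescape_* function: reports every unit or error.
pub fn unescape_sketch(
    src: &str,
    mut callback: impl FnMut(Range<usize>, Result<char, SketchError>),
) {
    for (start, c) in src.char_indices() {
        let res = if c == '\r' {
            Err(SketchError::BareCarriageReturn)
        } else {
            Ok(c)
        };
        callback(start..start + c.len_utf8(), res);
    }
}

/// Errors-only wrapper: forwards `Err` results and ignores every `Ok`.
pub fn check_for_errors_sketch(
    src: &str,
    mut error_callback: impl FnMut(Range<usize>, SketchError),
) {
    unescape_sketch(src, |range, res| {
        if let Err(e) = res {
            error_callback(range, e);
        }
    });
}
```

A caller that only wants to validate a literal can pass a closure that records or reports the errors, and never has to deal with the unescaped content.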

GuillaumeGomez · 2025-05-29T14:30:59Z

API changes look good to me. Let me check benches now and then I think we're ready to go. :)

GuillaumeGomez · 2025-05-29T21:29:43Z

Here are the bench results:

| test name | main branch | new changes | diff |
| --- | --- | --- | --- |
| bench_check_raw_byte_str | 25,941.68 ns/iter (+/- 57.51) | 48,097.89 ns/iter (+/- 104.59) | +85% |
| bench_check_raw_c_str_ascii | 24,387.77 ns/iter (+/- 83.40) | 39,709.25 ns/iter (+/- 116.80) | +62% |
| bench_check_raw_c_str_unicode | 40,485.60 ns/iter (+/- 181.36) | 50,116.99 ns/iter (+/- 115.94) | +23.7% |
| bench_check_raw_str_ascii | 25,679.40 ns/iter (+/- 68.78) | 39,727.86 ns/iter (+/- 137.46) | +54.7% |
| bench_check_raw_str_unicode | 42,031.95 ns/iter (+/- 199.86) | 47,429.04 ns/iter (+/- 121.53) | +12.8% |
| bench_skip_ascii_whitespace | 6,321.42 ns/iter (+/- 70.54) | 6,322.61 ns/iter (+/- 20.25) | 0% |
| bench_unescape_byte_str_ascii | 64,825.30 ns/iter (+/- 68.90) | 57,407.75 ns/iter (+/- 272.16) | -11.4% |
| bench_unescape_byte_str_hex | 95,602.13 ns/iter (+/- 214.39) | 85,577.19 ns/iter (+/- 188.78) | -10.4% |
| bench_unescape_byte_str_trivial | 34,165.35 ns/iter (+/- 114.53) | 52,296.49 ns/iter (+/- 260.71) | +53% |
| bench_unescape_c_str_ascii | 54,627.06 ns/iter (+/- 76.32) | 48,477.48 ns/iter (+/- 214.53) | -10.7% |
| bench_unescape_c_str_hex_ascii | 94,049.71 ns/iter (+/- 192.47) | 86,601.92 ns/iter (+/- 226.26) | -7.9% |
| bench_unescape_c_str_hex_byte | 94,030.54 ns/iter (+/- 170.87) | 86,285.94 ns/iter (+/- 129.85) | -7.9% |
| bench_unescape_c_str_trivial | 44,069.69 ns/iter (+/- 63.67) | 37,109.07 ns/iter (+/- 89.56) | -15.8% |
| bench_unescape_c_str_unicode | 183,698.16 ns/iter (+/- 236.18) | 165,312.95 ns/iter (+/- 167.56) | -10% |
| bench_unescape_str_ascii | 64,803.76 ns/iter (+/- 88.18) | 66,926.66 ns/iter (+/- 228.30) | +3.3% |
| bench_unescape_str_hex | 95,642.76 ns/iter (+/- 145.34) | 102,020.68 ns/iter (+/- 140.15) | +6.7% |
| bench_unescape_str_trivial | 34,071.43 ns/iter (+/- 54.39) | 46,578.22 ns/iter (+/- 91.29) | +36.7% |
| bench_unescape_str_unicode | 185,521.24 ns/iter (+/- 228.33) | 180,438.70 ns/iter (+/- 428.62) | -2.7% |

Overall, the check* functions are much slower. The unescape* results are mixed: some small gains, but also some big regressions. More work is required to improve everything, we cannot merge as is.

hkBst · 2025-05-30T07:43:33Z

Interesting! Is there an easy way to create such a nice table?

GuillaumeGomez · 2025-05-30T07:54:39Z

Sadly no. I ran the benches many times on both main and your branch, kept the runs with the lowest +/- values, and finally computed the diff for each of them. So I recommend you do the same for main, and then you can compare with your branch.
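The manual procedure described here can be sketched in a few lines. This reads "lowest +/-" as "the run with the smallest reported deviation", which is an interpretation rather than something stated in the thread; `Sample`, `most_stable` and `percent_change` are hypothetical helpers written for this illustration.

```rust
/// One benchmark run: mean time and the reported +/- deviation, both in ns.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Sample {
    pub ns_per_iter: f64,
    pub deviation: f64,
}

/// Pick the run with the smallest deviation (assumed meaning of "lowest +/-").
pub fn most_stable(runs: &[Sample]) -> Option<Sample> {
    runs.iter()
        .copied()
        .min_by(|a, b| a.deviation.total_cmp(&b.deviation))
}

/// Relative change from `before` to `after`, in percent.
/// Positive means the new code is slower.
pub fn percent_change(before: f64, after: f64) -> f64 {
    (after - before) / before * 100.0
}
```

For the first table row, `percent_change(25_941.68, 48_097.89)` comes out at roughly 85.4, which matches the reported +85%.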

hkBst · 2025-05-30T13:02:18Z

Taking the worst offender (bench_check_raw_byte_str), if I manually inline bench_check_raw (and so get rid of &mut dyn FnMut), then a lot (but not all) of the slowdown disappears. Unfortunately just annotating with #[inline(always)] does nothing.

On the other hand, if I make the main branch use the newer more generic bench_check_raw (which includes adding + ?Sized bounds to unescape_unicode and *_common), then it becomes just as slow as the new code.

Maybe I should rewrite the benchmarks as macros, to minimize such issues...

diff --git a/benches/benches.rs b/benches/benches.rs
index a028dfd..1100832 100644
--- a/benches/benches.rs
+++ b/benches/benches.rs
@@ -3,7 +3,9 @@
 extern crate test;
 
 use rustc_literal_escaper::*;
+use std::fmt::Debug;
 use std::iter::repeat_n;
+use std::ops::Range;
 
 const LEN: usize = 10_000;
 
@@ -37,6 +39,24 @@ fn bench_skip_ascii_whitespace(b: &mut test::Bencher) {
 // Check raw
 //
 
+#[allow(clippy::type_complexity)]
+fn new_bench_check_raw<UNIT: Into<char> + PartialEq + Debug + Copy>(
+    b: &mut test::Bencher,
+    c: UNIT,
+    check_raw: fn(&str, &mut dyn FnMut(Range<usize>, Result<UNIT, EscapeError>)),
+) {
+    let input: String = test::black_box(repeat_n(c.into(), LEN).collect());
+    assert_eq!(input.len(), LEN * c.into().len_utf8());
+
+    b.iter(|| {
+        let mut output = vec![];
+
+        check_raw(&input, &mut |range, res| output.push((range, res)));
+        assert_eq!(output.len(), LEN);
+        assert_eq!(output[0], (0..c.into().len_utf8(), Ok(c)));
+    });
+}
+
 fn bench_check_raw(b: &mut test::Bencher, c: char, mode: Mode) {
     let input: String = test::black_box(repeat_n(c, LEN).collect());
     assert_eq!(input.len(), LEN * c.len_utf8());
@@ -64,7 +84,20 @@ fn bench_check_raw_str_unicode(b: &mut test::Bencher) {
 
 #[bench]
 fn bench_check_raw_byte_str(b: &mut test::Bencher) {
-    bench_check_raw(b, 'a', Mode::RawByteStr);
+    //    bench_check_raw(b, 'a', Mode::RawByteStr);
+
+    new_bench_check_raw(b, 'a', |s, cb| unescape_unicode(s, Mode::RawByteStr, cb));
+
+    // let input: String = test::black_box(repeat_n('a', LEN).collect());
+    // assert_eq!(input.len(), LEN * 'a'.len_utf8());
+
+    // b.iter(|| {
+    //     let mut output = vec![];
+
+    //     check_raw_byte_str(&input, &mut |range, res| output.push((range, res)));
+    //     assert_eq!(output.len(), LEN);
+    //     assert_eq!(output[0], (0..1, Ok(b'a')));
+    // });
 }
 
 // raw C str
diff --git a/src/lib.rs b/src/lib.rs
index d315ed2..c381032 100644
--- a/src/lib.rs
+++ b/src/lib.rs
@@ -87,7 +87,7 @@ impl EscapeError {
 /// the callback will be called exactly once.
 pub fn unescape_unicode<F>(src: &str, mode: Mode, callback: &mut F)
 where
-    F: FnMut(Range<usize>, Result<char, EscapeError>),
+    F: FnMut(Range<usize>, Result<char, EscapeError>) + ?Sized,
 {
     match mode {
         Char | Byte => {
@@ -357,7 +357,7 @@ fn unescape_char_or_byte(chars: &mut Chars<'_>, mode: Mode) -> Result<char, Esca
 /// sequence of escaped characters or errors.
 fn unescape_non_raw_common<F, T: From<char> + From<u8>>(src: &str, mode: Mode, callback: &mut F)
 where
-    F: FnMut(Range<usize>, Result<T, EscapeError>),
+    F: FnMut(Range<usize>, Result<T, EscapeError>) + ?Sized,
 {
     let mut chars = src.chars();
     let allow_unicode_chars = mode.allow_unicode_chars(); // get this outside the loop
@@ -424,7 +424,7 @@ where
 /// only produce errors on bare CR.
 fn check_raw_common<F>(src: &str, mode: Mode, callback: &mut F)
 where
-    F: FnMut(Range<usize>, Result<char, EscapeError>),
+    F: FnMut(Range<usize>, Result<char, EscapeError>) + ?Sized,
 {
     let mut chars = src.chars();
     let allow_unicode_chars = mode.allow_unicode_chars(); // get this outside the loop
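The `+ ?Sized` bounds in the diff above are what allow a `&mut dyn FnMut(..)` to be passed where a generic `&mut F` is expected. The sketch below shows both call styles against one generic function; `for_each_char`, `count_static` and `count_dynamic` are hypothetical names for this illustration only. With the trait object, each callback invocation goes through dynamic dispatch, which may prevent inlining and is one plausible contributor to the slowdown described in this comment.

```rust
/// Generic over the callback; `?Sized` additionally admits `dyn FnMut`.
pub fn for_each_char<F>(src: &str, callback: &mut F)
where
    F: FnMut(usize, char) + ?Sized,
{
    for (i, c) in src.char_indices() {
        callback(i, c);
    }
}

/// Static dispatch: `F` is the concrete closure type, so the call can be inlined.
pub fn count_static(src: &str) -> usize {
    let mut n = 0;
    for_each_char(src, &mut |_, _| n += 1);
    n
}

/// Dynamic dispatch: `F` is `dyn FnMut(usize, char)`, called through a vtable.
pub fn count_dynamic(src: &str) -> usize {
    let mut n = 0;
    let cb: &mut dyn FnMut(usize, char) = &mut |_, _| n += 1;
    for_each_char(src, cb);
    n
}
```

Both functions return the same count; only the way the callback is invoked differs, which is why a benchmark helper that takes a function pointer with a `&mut dyn FnMut` parameter can measure something different from a direct generic call.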

hkBst mentioned this pull request May 14, 2025

literal-escaper v0.0.2 => v0.0.3 for better API without unreachable rust-lang/rust#140999

Draft

rust-cloud-vms bot force-pushed the remove_unreachable branch from f816e0b to 00b6cfd Compare May 14, 2025 11:23

nnethercote assigned nnethercote and GuillaumeGomez and unassigned nnethercote May 21, 2025

GuillaumeGomez reviewed May 27, 2025

hkBst added 5 commits May 28, 2025 09:23

inline unescape_{unicode,mixed} and move docs

9fdada0

replace check_raw_common with trait

3babd8e

replace unescape_{char,byte} and check_non_raw_common with trait and remove unused Mode methods

14fb77e

do not use Mode::* and move stuff around for better organisation

45a5bf4

rust-cloud-vms bot force-pushed the remove_unreachable branch from 00b6cfd to 45a5bf4 Compare May 28, 2025 09:27

GuillaumeGomez reviewed May 28, 2025

rename unescape_for_errors -> check_for_errors, and improve docs

6faef8e
