Overhaul char_lit() #36485

nnethercote · 2016-09-15T03:24:03Z

This commit does the following.

Removes parsing support for '\X12', '\u123456' and '\U12345678' char
literals. These are no longer valid Rust and rejected by the lexer.
(This strange-sounding situation occurs because the parser rescans
char literals to compute their value.)
Rearranges the function so that all the escaped values are handled in
a single match. The error-handling strategy is based on the one used
by byte_lit().

nagisa · 2016-09-15T13:10:19Z

src/libsyntax/parse/mod.rs

    }

-    let unicode_escape = || -> Option<(char, isize)> {
-        if lit.as_bytes()[2] == b'{' {
+    match lit.as_bytes()[1] as char {


In general I find this code a bit harder to follow than the original one, because handling of of non-escapes and escapes have been split up. Previously it was pretty obvious from (Some('\\'), Some(c)) that something of the \_ form is being handled in that branch.

That's interesting. The old code had two matches, a local function, and a local closure, with the handling of the non-escaped case combined with the handling of some (but not all) of the escaped cases in one of the matches. I found it very messy and I had to come back to it multiple times to understand it fully. (Indeed, I had a first go at improving this function in #36414.)

In contrast, the new code handles the common non-escaped case first, and then handles all the escaped cases in a single match, with no need for local functions or closures.

BTW, do these comments constitute a proper review, or are they just "drive-by" comments? Will the commit receive attention from anybody else? It's not clear to me what I need to do now for this PR to receive r+.

BTW, do these comments constitute a proper review, or are they just "drive-by" comments? Will the commit receive attention from anybody else?

They do constitute a proper review. Usually a bot would have commented and assigned a reviewer for you, but seems like it has bugged out for some reason.

It's not clear to me what I need to do now for this PR to receive r+.

Now that I read it again, it sort of makes sense to handle no-escape case outside of escape case.

Perhaps something like following could be better:

if first_byte == b'\\' { /* handle escape */ } else { /* handle non-escape case */ }

so the escape handling is still tied to the \\ that is supposed to go before the escape sequence, rather than to the no-escape case like it is now.

Once you adjust the PR for error handling and making \\ thing more tied to parsing of escape sequences (in any way), I feel this PR would be in a great shape to r+.

nagisa · 2016-09-15T16:20:32Z

src/libsyntax/parse/mod.rs

-        u32::from_str_radix(&lit[2..len], 16).ok()
-        .and_then(char::from_u32)
-        .map(|x| (x, len as isize))
+    let err = |i| format!("lexer accepted invalid char literal {} step {}", lit, i);


I see no reason to do this as opposed to plain unwrap/expect where necessary, its just noise.

I did it this more verbose way to match existing functions like byte_lit. I agree that it's unnecessary and I'd be happy to change it to use vanilla assert! and unwrap.

This commit does the following. - Removes parsing support for '\X12', '\u123456' and '\U12345678' char literals. These are no longer valid Rust and rejected by the lexer. (This strange-sounding situation occurs because the parser rescans char literals to compute their value.) - Rearranges the function so that all the escaped values are handled in a single `match`, and changes the error-handling to use vanilla assert!() and unwrap().

nnethercote · 2016-09-15T23:18:47Z

@nagisa: I changed the error-handling to use vanilla assert! and unwrap.

I didn't change the structure w.r.t. the '' handling, though I did add a couple of brief comments. I left it alone because I think dealing with the common and simple case (i.e. an unescaped char) first is good, and I didn't want to introduce unnecessary rightward drift for the escaped case. If you still think this form is bad, I will change it again.

sfackler · 2016-09-15T23:25:28Z

@bors r=nagisa

bors · 2016-09-15T23:25:30Z

📌 Commit 63ded05 has been approved by nagisa

bors · 2016-09-17T09:51:14Z

⌛ Testing commit 63ded05 with merge cde61ba...

Overhaul char_lit() This commit does the following. - Removes parsing support for '\X12', '\u123456' and '\U12345678' char literals. These are no longer valid Rust and rejected by the lexer. (This strange-sounding situation occurs because the parser rescans char literals to compute their value.) - Rearranges the function so that all the escaped values are handled in a single `match`. The error-handling strategy is based on the one used by byte_lit().

bors · 2016-09-17T13:17:53Z

nagisa reviewed Sep 15, 2016

View reviewed changes

nnethercote force-pushed the char_lit-2 branch from 607d591 to 63ded05 Compare September 15, 2016 23:14

bors merged commit 63ded05 into rust-lang:master Sep 17, 2016

nnethercote deleted the char_lit-2 branch October 7, 2016 05:03

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Overhaul char_lit() #36485

Overhaul char_lit() #36485

Uh oh!

nnethercote commented Sep 15, 2016

Uh oh!

nagisa Sep 15, 2016 •

edited

Loading

Uh oh!

nnethercote Sep 15, 2016

Uh oh!

nagisa Sep 15, 2016 •

edited

Loading

Uh oh!

nagisa Sep 15, 2016

Uh oh!

nnethercote Sep 15, 2016

Uh oh!

nnethercote commented Sep 15, 2016

Uh oh!

sfackler commented Sep 15, 2016

Uh oh!

bors commented Sep 15, 2016

Uh oh!

bors commented Sep 17, 2016

Uh oh!

bors commented Sep 17, 2016

Uh oh!

Uh oh!

Overhaul char_lit() #36485

Overhaul char_lit() #36485

Uh oh!

Conversation

nnethercote commented Sep 15, 2016

Uh oh!

nagisa Sep 15, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

nnethercote Sep 15, 2016

Choose a reason for hiding this comment

Uh oh!

nagisa Sep 15, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

nagisa Sep 15, 2016

Choose a reason for hiding this comment

Uh oh!

nnethercote Sep 15, 2016

Choose a reason for hiding this comment

Uh oh!

nnethercote commented Sep 15, 2016

Uh oh!

sfackler commented Sep 15, 2016

Uh oh!

bors commented Sep 15, 2016

Uh oh!

bors commented Sep 17, 2016

Uh oh!

bors commented Sep 17, 2016

Uh oh!

Uh oh!

nagisa Sep 15, 2016 •

edited

Loading

nagisa Sep 15, 2016 •

edited

Loading