Skip to content

Overhaul char_lit() #36485

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Sep 17, 2016
Merged

Overhaul char_lit() #36485

merged 1 commit into from
Sep 17, 2016

Conversation

nnethercote
Copy link
Contributor

This commit does the following.

  • Removes parsing support for '\X12', '\u123456' and '\U12345678' char
    literals. These are no longer valid Rust and rejected by the lexer.
    (This strange-sounding situation occurs because the parser rescans
    char literals to compute their value.)
  • Rearranges the function so that all the escaped values are handled in
    a single match. The error-handling strategy is based on the one used
    by byte_lit().

}

let unicode_escape = || -> Option<(char, isize)> {
if lit.as_bytes()[2] == b'{' {
match lit.as_bytes()[1] as char {
Copy link
Member

@nagisa nagisa Sep 15, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general I find this code a bit harder to follow than the original one, because handling of of non-escapes and escapes have been split up. Previously it was pretty obvious from (Some('\\'), Some(c)) that something of the \_ form is being handled in that branch.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's interesting. The old code had two matches, a local function, and a local closure, with the handling of the non-escaped case combined with the handling of some (but not all) of the escaped cases in one of the matches. I found it very messy and I had to come back to it multiple times to understand it fully. (Indeed, I had a first go at improving this function in #36414.)

In contrast, the new code handles the common non-escaped case first, and then handles all the escaped cases in a single match, with no need for local functions or closures.

BTW, do these comments constitute a proper review, or are they just "drive-by" comments? Will the commit receive attention from anybody else? It's not clear to me what I need to do now for this PR to receive r+.

Copy link
Member

@nagisa nagisa Sep 15, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, do these comments constitute a proper review, or are they just "drive-by" comments? Will the commit receive attention from anybody else?

They do constitute a proper review. Usually a bot would have commented and assigned a reviewer for you, but seems like it has bugged out for some reason.

It's not clear to me what I need to do now for this PR to receive r+.

Now that I read it again, it sort of makes sense to handle no-escape case outside of escape case.

Perhaps something like following could be better:

if first_byte == b'\\' { 
    /* handle escape */ 
} else { 
    /* handle non-escape case */ 
}

so the escape handling is still tied to the \\ that is supposed to go before the escape sequence, rather than to the no-escape case like it is now.

Once you adjust the PR for error handling and making \\ thing more tied to parsing of escape sequences (in any way), I feel this PR would be in a great shape to r+.

u32::from_str_radix(&lit[2..len], 16).ok()
.and_then(char::from_u32)
.map(|x| (x, len as isize))
let err = |i| format!("lexer accepted invalid char literal {} step {}", lit, i);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see no reason to do this as opposed to plain unwrap/expect where necessary, its just noise.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did it this more verbose way to match existing functions like byte_lit. I agree that it's unnecessary and I'd be happy to change it to use vanilla assert! and unwrap.

This commit does the following.

- Removes parsing support for '\X12', '\u123456' and '\U12345678' char
  literals. These are no longer valid Rust and rejected by the lexer.
  (This strange-sounding situation occurs because the parser rescans
  char literals to compute their value.)

- Rearranges the function so that all the escaped values are handled in
  a single `match`, and changes the error-handling to use vanilla
  assert!() and unwrap().
@nnethercote
Copy link
Contributor Author

@nagisa: I changed the error-handling to use vanilla assert! and unwrap.

I didn't change the structure w.r.t. the '' handling, though I did add a couple of brief comments. I left it alone because I think dealing with the common and simple case (i.e. an unescaped char) first is good, and I didn't want to introduce unnecessary rightward drift for the escaped case. If you still think this form is bad, I will change it again.

@sfackler
Copy link
Member

@bors r=nagisa

@bors
Copy link
Collaborator

bors commented Sep 15, 2016

📌 Commit 63ded05 has been approved by nagisa

@bors
Copy link
Collaborator

bors commented Sep 17, 2016

⌛ Testing commit 63ded05 with merge cde61ba...

@bors bors merged commit 63ded05 into rust-lang:master Sep 17, 2016
@nnethercote nnethercote deleted the char_lit-2 branch October 7, 2016 05:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants