Overhaul char_lit() #36485
}
let unicode_escape = || -> Option<(char, isize)> {
if lit.as_bytes()[2] == b'{' {
match lit.as_bytes()[1] as char {
In general I find this code a bit harder to follow than the original one, because the handling of non-escapes and escapes has been split up. Previously it was pretty obvious from `(Some('\\'), Some(c))` that something of the `\_` form is being handled in that branch.
That's interesting. The old code had two `match`es, a local function, and a local closure, with the handling of the non-escaped case combined with the handling of some (but not all) of the escaped cases in one of the `match`es. I found it very messy and I had to come back to it multiple times to understand it fully. (Indeed, I had a first go at improving this function in #36414.)
In contrast, the new code handles the common non-escaped case first, and then handles all the escaped cases in a single `match`, with no need for local functions or closures.
BTW, do these comments constitute a proper review, or are they just "drive-by" comments? Will the commit receive attention from anybody else? It's not clear to me what I need to do now for this PR to receive r+.
> BTW, do these comments constitute a proper review, or are they just "drive-by" comments? Will the commit receive attention from anybody else?

They do constitute a proper review. Usually a bot would have commented and assigned a reviewer for you, but it seems to have bugged out for some reason.

> It's not clear to me what I need to do now for this PR to receive r+.

Now that I read it again, it sort of makes sense to handle the no-escape case outside of the escape case.
Perhaps something like the following could be better:
if first_byte == b'\\' {
/* handle escape */
} else {
/* handle non-escape case */
}
so the escape handling is still tied to the `\\` that is supposed to go before the escape sequence, rather than to the no-escape case like it is now.
Once you adjust the PR for error handling and tie the `\\` handling more closely to the parsing of escape sequences (in any way), I feel this PR would be in great shape to r+.
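For illustration, the suggested shape could look roughly like the sketch below. This is not the code from the PR: `char_lit_if_else`, its signature, and the small set of escapes it handles are assumptions made to keep the example short.

```rust
/// Hypothetical helper (not the real `char_lit`): decodes the text between
/// the quotes of a char literal and returns the char plus the number of
/// bytes consumed.
fn char_lit_if_else(lit: &str) -> (char, usize) {
    let first_byte = lit.as_bytes()[0];
    if first_byte == b'\\' {
        // Escape case: the byte after the backslash selects the escape.
        match lit.as_bytes()[1] {
            b'n' => ('\n', 2),
            b't' => ('\t', 2),
            b'\\' => ('\\', 2),
            b'\'' => ('\'', 2),
            other => panic!("escape not covered by this sketch: {}", other as char),
        }
    } else {
        // Non-escape case: the literal is simply its first char.
        let c = lit.chars().next().unwrap();
        (c, c.len_utf8())
    }
}
```

Here the backslash test and the escape decoding sit in the same branch, which is the coupling the comment asks for, at the cost of one extra level of indentation for the escape arms.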
u32::from_str_radix(&lit[2..len], 16).ok()
.and_then(char::from_u32)
.map(|x| (x, len as isize))
let err = |i| format!("lexer accepted invalid char literal {} step {}", lit, i);
I see no reason to do this as opposed to plain `unwrap`/`expect` where necessary; it's just noise.
I did it this more verbose way to match existing functions like `byte_lit`. I agree that it's unnecessary and I'd be happy to change it to use vanilla `assert!` and `unwrap`.
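To make the two styles concrete, here is a rough side-by-side sketch. Both functions are hypothetical (`hex_verbose` and `hex_plain` do not appear in the PR), and the way the `err` closure is wired into the failure paths is an assumption; only the message format is taken from the diff.

```rust
// Verbose style: a closure builds a step-numbered message for each failure point.
fn hex_verbose(lit: &str) -> u32 {
    let err = |i: u32| format!("lexer accepted invalid char literal {} step {}", lit, i);
    // Step 1: skip the two-byte prefix such as `\x`.
    let digits = lit.get(2..).unwrap_or_else(|| panic!("{}", err(1)));
    // Step 2: parse the remaining hex digits.
    u32::from_str_radix(digits, 16).unwrap_or_else(|_| panic!("{}", err(2)))
}

// Plain style: the lexer has already validated `lit`, so any failure is a bug
// and a bare `unwrap` is enough.
fn hex_plain(lit: &str) -> u32 {
    u32::from_str_radix(&lit[2..], 16).unwrap()
}
```

Both return the same value for valid input; the difference is only in how much code is spent on a path that should be unreachable once the lexer has accepted the literal.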
This commit does the following.
- Removes parsing support for '\X12', '\u123456' and '\U12345678' char literals. These are no longer valid Rust and rejected by the lexer. (This strange-sounding situation occurs because the parser rescans char literals to compute their value.)
- Rearranges the function so that all the escaped values are handled in a single `match`, and changes the error-handling to use vanilla assert!() and unwrap().
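As a rough picture of the structure this describes, the sketch below handles the unescaped case first and then every escape in one `match`, using plain `assert!` and `unwrap`. It is not the actual function from the commit: `char_lit_early_return`, its signature, and the exact set of escapes are assumptions.

```rust
/// Hypothetical stand-in for the function under discussion. `lit` is the
/// text between the quotes of a char literal that the lexer has already
/// validated; the result is the char and the number of bytes consumed.
fn char_lit_early_return(lit: &str) -> (char, usize) {
    // Common, simple case first: an unescaped char.
    if !lit.starts_with('\\') {
        let c = lit.chars().next().unwrap();
        return (c, c.len_utf8());
    }

    // Every escaped form is handled in this one `match`.
    match lit.as_bytes()[1] {
        b'n' => ('\n', 2),
        b'r' => ('\r', 2),
        b't' => ('\t', 2),
        b'\\' => ('\\', 2),
        b'\'' => ('\'', 2),
        b'"' => ('"', 2),
        b'0' => ('\0', 2),
        b'x' => {
            // `\xNN`: exactly two hex digits.
            let v = u32::from_str_radix(&lit[2..4], 16).unwrap();
            (char::from_u32(v).unwrap(), 4)
        }
        b'u' => {
            // `\u{...}`: hex digits between braces.
            assert!(lit.as_bytes()[2] == b'{');
            let end = lit.find('}').unwrap();
            let v = u32::from_str_radix(&lit[3..end], 16).unwrap();
            (char::from_u32(v).unwrap(), end + 1)
        }
        _ => panic!("lexer should have rejected {}", lit),
    }
}
```

The early return keeps the escape arms at a single indentation level, and the removed forms ('\X12', '\u123456', '\U12345678') simply have no arm, so they fall through to the final panic.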
607d591 to 63ded05
@nagisa: I changed the error-handling to use vanilla `assert!()` and `unwrap()`. I didn't change the structure w.r.t. the `'\\'` handling, though I did add a couple of brief comments. I left it alone because I think dealing with the common and simple case (i.e. an unescaped char) first is good, and I didn't want to introduce unnecessary rightward drift for the escaped case. If you still think this form is bad, I will change it again.
@bors r=nagisa
📌 Commit 63ded05 has been approved by
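This commit does the following.
- Removes parsing support for '\X12', '\u123456' and '\U12345678' char literals. These are no longer valid Rust and rejected by the lexer. (This strange-sounding situation occurs because the parser rescans char literals to compute their value.)
- Rearranges the function so that all the escaped values are handled in a single `match`. The error-handling strategy is based on the one used by byte_lit().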