-
Notifications
You must be signed in to change notification settings - Fork 599
S_scan_ident: Comments, refactor some; fix bugs #23866
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
khwilliamson
wants to merge
17
commits into
Perl:blead
Choose a base branch
from
khwilliamson:scan_ident
base: blead
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
+219
−103
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
The soft hypen is treated specially in toke.c
This is in preparation for passing other options to this function
This function is complicated, without enough documentation for me to understand the subtleties; I only studied it enough to change things I needed to, or which became obvious to me in the process. Other things remain undocumented by this commit. Some of the white space gives improper indentation which will fit a future commit. This commit also remove redundant parentheses in one statement
This makes things clearer.
That this had to be true was not obvious to me without studying closely the code before it. Adding an assertion will result in others deciding they don't have to figure it out.
It's clearer to handle the short case first, and put the much longer case afterwards.
It's clearer to handle the short case first, and put the much longer case afterwards.
These were declared far above, due to C89 that is no longer a constraint.
This check that the code just below won't look beyond the end of the buffer, is rendered redundant by the "_safe" macro which does the check itself.
By setting a variable in advance, we can merge two loops into one.
Save the value from the first time into a variable
I don't know what I was thinking when I recently thought these needed to be in a different order. The conjuctions are all &&, so might as well do the simpler things first
S_scan_ident would like to call this function, already having looked at the first character of an identifier, and deciding it is legal. It wants this function to finish the scan. This commit adds a flag to S_parse_ident to accommodate this.
This fixes a bug in this function, in which it required the second character in an identifier to be IDStart, instead of IDCont. This hasn't been caught because most identifiers are ASCII, and generally for the purposes of this function in the ASCII range, all \w characters can be IDStart.
There is a bug here in which this function is called from S_intuit_more just to see if there is an identifier in the string it is looking at. But that call can have "subtle implications on parsing" (according to the long-standing comments in it). We need a way to call scan_ident without side-effects. This commit adds that capability. The next will use it.
This fixes the bug that examining the parse buffer had side-effects. I don't know what the implications of that were.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This set of commits targets S_scan_ident in toke.c.
I've added a bunch of comments, and done minor refactoring and simplification, pushing all the real parsing into parse_ident(), which this calls in several places. That makes it easier to maintain going forward.
I've noticed for some time what appeared to be a bug, and it turned out I was right; fixed with a test. It could not handle a UTF-8 identifier with the second character not being one that is suitable for being a first character.
The second bug is more subtle, also noticed from code reading. S_intuit_more() calls S_scan_ident() just to see if the string it is looking at might possibly be an identifier. The problem is that scan_ident didn't think anyone would call it without actually expecting it to take various actions. I've added a check-only mode, so that you can call scan_ident without side effects. I do not know what the potential harms were from those side effects.