Support non-ascii identifiers #586

Closed
JukkaL opened this issue Feb 22, 2015 · 5 comments

@JukkaL
Collaborator

JukkaL commented Feb 22, 2015

We should support non-ASCII (Unicode) identifiers as variable names, etc.

@o11c
Contributor

o11c commented Aug 8, 2015

I'm actually not sure how to support this, since it is defined in terms of Unicode classes, which are not supported by Python's regex engine, and there is no efficient way to enumerate the characters in each category (let alone DTRT for the particular version of Unicode supported by any given Python 3 release).

Really seems like more of a misfeature than a feature to me.

@JukkaL
Collaborator Author

JukkaL commented Aug 9, 2015

Hmm. I haven't actually looked into this in much detail. We should still be able to implement this in an efficient manner for the 99% of code that doesn't use Unicode identifiers. It's fine if using Unicode identifiers slows down parsing, but it's not okay if this feature slows down parsing everything else.

Maybe something like this:

Have a separate case in the lexer for all non-ASCII characters outside string literals and comments (character code >= 128). When we hit one, fall back to a possibly slow implementation that can do whatever is needed to get this working, but skip this code path altogether for code that doesn't use any Unicode identifiers.

Also, the unicodedata module could be useful.
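
A rough sketch of that idea, just to make it concrete. This is not mypy's lexer: `lex_identifier` and `_lex_unicode_identifier` are made-up names, and the slow path leans on `str.isidentifier()` plus NFKC normalization from `unicodedata` as an assumed stand-in for whatever the real implementation would do.

```python
import re
import unicodedata
from typing import Optional

# Fast path: plain ASCII identifiers, matched with an ordinary regex.
_ASCII_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def lex_identifier(source: str, pos: int) -> Optional[str]:
    """Return the identifier starting at pos, or None if there is none."""
    match = _ASCII_IDENT.match(source, pos)
    end = match.end() if match else pos
    # If the next character is ASCII (or we hit end of input), the regex
    # result is final and the slow path is never entered.
    if end >= len(source) or ord(source[end]) < 128:
        return match.group() if match else None
    # A character with code >= 128 follows: redo the scan the slow way.
    return _lex_unicode_identifier(source, pos)


def _lex_unicode_identifier(source: str, pos: int) -> Optional[str]:
    """Slow path (hypothetical helper): scan one character at a time."""
    # The first character must be able to start an identifier.
    if not source[pos].isidentifier():
        return None
    end = pos + 1
    # "a" + ch is an identifier exactly when ch may continue one.
    while end < len(source) and ("a" + source[end]).isidentifier():
        end += 1
    # Python compares identifiers in NFKC-normalized form.
    return unicodedata.normalize("NFKC", source[pos:end])
```

Source that contains no non-ASCII characters outside strings and comments only ever pays for the regex match and one extra character check per identifier.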

@gvanrossum
Member

I'm going to close this as "won't fix" because the new parser will solve this, and we're not going to put any work into the old parser/lexer.

@vedgar

vedgar commented Feb 1, 2017

May I ask what the status of that new parser is? Maybe I can even work on it, if it's Python-only. I was extremely surprised to learn that mypy still doesn't understand Python naming rules. PEP 3131 is almost a decade old. :-o

@gvanrossum
Member

gvanrossum commented Feb 1, 2017 via email
