IDLE: define word/id chars in one place. #89855
Comments
IDLE currently defines the same set of chars in 5 places with 5 names. (Listed by Serhiy Storchaka in bpo-45669.) I suspect that either a string or frozenset would work everywhere (check). I will pick a name after checking this. The single definition would go in the proposed utils.py, which is part of another issue and PR. (Note: the utility tk index functions should also go there.) |
This set is mostly outdated. In Python 2 it was a set of characters composing identifiers, but in Python 3 identifiers can contain non-ASCII characters. |
Complete sets of characters which can be used in identifiers are too large:
>>> allchars = ''.join(map(chr, range(0x110000)))
>>> identstartchars = ''.join(c for c in allchars if c.isidentifier())
>>> identcontchars = ''.join(c for c in allchars if ('a' + c).isidentifier())
>>> len(identstartchars), len(identcontchars)
(131975, 135053) |
This is an interesting problem. I am going to work on it over the next few weekends. |
I checked for other possible ASCII-only problems and found only one, in config_key.py, line 14: ALPHANUM_KEYS = tuple(string.ascii_lowercase + string.digits) |
There have been occasional discussions about IDLE not being properly Unicode-aware in some of its functions. Those discussions foundered on these facts and no fix was made.
I would like to do better this time. Possible responses to the blockers:
>>> import sys
>>> fz = frozenset(c for c in map(chr, range(0x110000)) if ('a'+c).isidentifier)
>>> sys.getsizeof(fz)
33554648
Whoops, each 2 or 4 byte slice of the underlying array becomes 76 bytes + 8 bytes * size of hash array. Not practical either.
Any other ideas? I will look at the use cases next. |
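A sketch for re-checking the frozenset measurement. As written, the session above references `isidentifier` without calling it, so the condition is always true and every code point ends up in the set. The variant below builds both sets so the difference can be compared. The function name `cont_char_set` and its parameter are hypothetical, and no sizes are quoted because they depend on the Python version and platform.

```python
import sys

def cont_char_set(call_method=True):
    """Build a frozenset of identifier-continuation chars (hypothetical helper).

    With call_method=False the bound method itself is tested, which is
    always truthy, mimicking the session as written.
    """
    chars = map(chr, range(0x110000))
    if call_method:
        return frozenset(c for c in chars if ('a' + c).isidentifier())
    return frozenset(c for c in chars if ('a' + c).isidentifier)

called = cont_char_set(True)
uncalled = cont_char_set(False)
print(len(called), sys.getsizeof(called))
print(len(uncalled), sys.getsizeof(uncalled))
```

Either way the set costs far more memory than the equivalent string, so the "not practical" conclusion is not obviously changed by the correction.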
autoexpand.py, line 20, wordchars
The name 'wordchars' is correct here, since words beginning with digits can be expanded.
>>> s = '0x4f334'
>>> 0x4f334  # Hit alt-/ after 0 and enter
324404
Used on line 89 in method getprevword:
while i > 0 and line[i-1] in self.wordchars:
    i = i-1
Proposed replacement seems to work.
>>> i = len(s)
>>> while i > 0 and ((c := s[i-1]).isalnum() or c == '_'):
...     i -= 1
...
>>> i, c
(0, '0') |
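The proposed autoexpand replacement can be sketched as a standalone function. This is a minimal sketch, not IDLE's code: `prev_word` is a hypothetical name, and it assumes the intended test is `str.isalnum()` plus an explicit underscore check, which accepts letters and digits from any script rather than only ASCII.

```python
def prev_word(line):
    """Return the run of word chars at the end of line (hypothetical helper)."""
    i = len(line)
    # Walk left while the char is a letter or digit in any script, or '_'.
    while i > 0 and ((c := line[i-1]).isalnum() or c == '_'):
        i -= 1
    return line[i:]

print(prev_word("x = 0x4f334"))  # word starting with a digit
print(prev_word("y = naïve_9"))  # word containing a non-ASCII letter
```

No shared character set is needed with this approach, which sidesteps the size problem discussed earlier in the thread.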
autocomplete.py, line 33, ID_CHARS is only used on line 137 to find the prefix of an identifier when completions have been explicitly requested.
while i and (curline[i-1] in ID_CHARS or ord(curline[i-1]) > 127):
    i -= 1
comp_start = curline[i:j]
The completion is for a name or attribute depending on whether the preceding char, if any, is '.'. Here, the unicode fix was to accept all non-ascii as possible id chars. There is no harm as the completion box only has valid completions, and if the prefix given does not match anything, nothing is highlighted. ID_CHARS could be moved to utils if the same ascii string is used in another module. |
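To illustrate the autocomplete loop quoted above, here is a small self-contained sketch. The wrapper `completion_start` is hypothetical, and the definition of `ID_CHARS` as ASCII letters, digits, and underscore is an assumption based on the description in this thread.

```python
import string

# Assumed definition: ASCII letters, digits, and underscore.
ID_CHARS = string.ascii_letters + string.digits + "_"

def completion_start(curline, j):
    """Return the completion prefix ending at index j (hypothetical wrapper)."""
    i = j
    # Accept ASCII id chars plus every non-ASCII char, as in the quoted loop.
    while i and (curline[i-1] in ID_CHARS or ord(curline[i-1]) > 127):
        i -= 1
    return curline[i:j]

print(completion_start("obj.añ", 6))   # stops at the '.'
print(completion_start("x = →y", 6))   # '→' is accepted although it is not an id char
```

The second call shows the permissive side of the fix: a non-identifier character such as '→' becomes part of the prefix, which then simply matches no completion.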
undo.py, line 254, alphanumeric
Used in the immediately following lines to classify chars as 'alphanumeric', 'newline', or 'punctuation' (the default). I believe I have only ever looked at this module to add the test code at the bottom. In any case, I don't know the effect of calling non-ascii chars punctuation, but I suspect it is not the best thing, so the autoexpand fix would probably be the best choice here too. The classify method is only used on line 248 in the merge method above. To figure out more, I would experiment with identifiers with and without non-ascii chars, undoing and redoing, to see what difference there is. |
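A sketch of what applying the autoexpand-style test to the classification described above might look like. This is an assumption about one possible fix, not the code in undo.py; the three category names come from the comment above, and the standalone function form is for illustration only.

```python
def classify(c):
    """Classify one char as 'alphanumeric', 'newline', or 'punctuation'."""
    # Unicode-aware test: letters and digits in any script, plus '_'.
    if c.isalnum() or c == "_":
        return "alphanumeric"
    if c == "\n":
        return "newline"
    return "punctuation"

print([classify(c) for c in "aé_\n+"])
```

With this version a non-ASCII letter such as 'é' is grouped with 'a' rather than with '+'. How that changes undo and redo merging would still need the experiment described above.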
editor.py, line 809, IDENTCHARS
Used in the immediately following def colorize_syntax_error on line 814. |
hyperparser.py, line 13, _ASCII_ID_CHARS
Only used on lines 18 and 21 to create 128-item lookup tables. The point is to be fast, as hyperparser scans multiple chars when invoked. The expandword fix and c.isidentifier() could replace the lookup. But would they bog down response time? We need to look at hyperparser use cases and do some testing. |
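One way to start on the speed question is a rough timing comparison. The sketch below assumes a 128-item ASCII table with a fallback to `str.isidentifier()` for non-ASCII chars, and compares it with calling `isidentifier()` for every char. All names except `_ASCII_ID_CHARS` are hypothetical, the sample text is made up, and the timings will vary by machine, so no results are quoted.

```python
import string
import timeit

_ASCII_ID_CHARS = frozenset(string.ascii_letters + string.digits + "_")
# 128-item lookup table indexed by code point (assumed layout).
_IS_ASCII_ID_CHAR = [chr(x) in _ASCII_ID_CHARS for x in range(128)]

def is_id_char_table(c):
    """Table lookup for ASCII, isidentifier() fallback for everything else."""
    o = ord(c)
    return _IS_ASCII_ID_CHAR[o] if o < 128 else ("a" + c).isidentifier()

def is_id_char_method(c):
    """Always ask str.isidentifier(), with no lookup table."""
    return ("a" + c).isidentifier()

sample = "spam_1 = eggs.ham(ñandú) + 42\n" * 1000
for func in (is_id_char_table, is_id_char_method):
    seconds = timeit.timeit(lambda: [func(c) for c in sample], number=10)
    print(f"{func.__name__}: {seconds:.3f}s")
```

Both functions give the same answer for every char, so any difference is purely one of speed. Real hyperparser use cases would be a better benchmark than this synthetic sample.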
The PR that proposes creating a new utility.py file is mine, linked to #89610. Would it make things easier if I split it into two PRs: one adding an empty util.py file, and the other making my proposed changes to support syntax highlighting for .pyi files?
EDIT by tjr: issue closed and util.py added. |