-
-
Notifications
You must be signed in to change notification settings - Fork 2.9k
[WIP] bytes/str/unicode #2203
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] bytes/str/unicode #2203
Conversation
This has no effect as long as bytes remains an alias for str, but it enables experiments with differentiating them for better handling of bytes/str/union. Depends on python/typed_ast#17.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's a running commentary on the changes I had to make to make things work, including some open issues.
Note that we're not even touching NativeStr -- we're just introducing a separation between str and bytes.
@@ -219,6 +219,7 @@ def translate_module_id(self, id: str) -> str: | |||
if id == self.custom_typing_module: | |||
return 'typing' | |||
elif id == '__builtin__' and self.pyversion[0] == 2: | |||
assert False # Shouldn't get here |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These asserts are just for my peace of mind -- I think when @Michael0x2a forked fastparse.py into fastparse2.py, he left some version-checking code in that could be reduced because we know it's always Python 2. (For stubs and type comments, the Python 3 syntax is always parsed with fastparse.py.)
@@ -236,7 +236,8 @@ def translate_module_id(self, id: str) -> str: | |||
""" | |||
if id == self.custom_typing_module: | |||
return 'typing' | |||
elif id == '__builtin__' and self.pyversion[0] == 2: | |||
elif id == '__builtin__': | |||
assert self.pyversion[0] == 2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same story as for fastparse2.py.
return BytesExpr(contents) | ||
else: | ||
return StrExpr(contents) | ||
if s.has_b: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe(*) this block is the full extent of the changes needed in mypy to differentiate between b''
and ''
-- all the heavy lifting is done by python/typed_ast#17 (thanks @ilevkivskyi !).
And if typeshed defines bytes as an alias for str the treatment of StrExpr and BytesExpr is identical -- however the intent is for typeshed to make bytes and str separate classes, and then it's important that the inferred type of b''
is bytes while that of ''
is str.
(*) Well there may be a few more places where a hardcoded check for builtins.str is used that will have to also allow builtins.bytes, but in many places builtins.str is actually still the only type to be treated so. E.g. dict(x=1)
always has str keys, never bytes keys.
# u'x' -> UnicodeExpr | ||
# BytesExpr is unused | ||
# However after `from __future__ import unicode_literals` [also new!]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously, with unicode_literals
mypy would assume b''
was a unicode literal too! (Incorrectly, of course.)
@@ -62,6 +62,7 @@ def test_python_evaluation(testcase): | |||
interpreter = python3_path | |||
args = [] | |||
py2 = False | |||
args.append('--fast-parser') # Some tests require this now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bit of a problem, because it means that these tests can't be run on Windows any more (until typed_ast is finally ported there -- I think @ddfisher mentioned he was making progress so hopefully I can stop worrying about this). The checker tests (testcheck.py) allow specifying this flag per testcase, but these eval tests don't have that feature.
@@ -396,11 +403,12 @@ def f(x: unicode) -> int: pass | |||
def f(x: bytearray) -> int: pass | |||
[out] | |||
_program.py:2: error: No overload variant of "f" matches argument types [builtins.int] | |||
_program.py:5: error: No overload variant of "f" matches argument types [builtins.bytes] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because there's no overload for bytes, and there's no equivalency between bytes and bytearray. Not sure I like this.
|
||
[case testByteArrayStrCompatibility_python2] | ||
def f(x): # type: (str) -> None | ||
def f(x): # type: (bytes) -> None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that bytearray() is compatible with bytes, but not with str, hence the change here.
pass | ||
f(bytearray('foo')) | ||
f(bytearray(b'foo')) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The argument to the bytearray() constructor must be bytes. That's reasonable.
@@ -1,4 +1,4 @@ | |||
flake8 | |||
typed-ast | |||
typed-ast>=0.6.1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This version has the 'has_b' flag for Str objects.
@@ -1 +1 @@ | |||
Subproject commit aa549db5e5e57ee2702899d1cc660163b52171ed | |||
Subproject commit 455f8aa834ee7fc6b1527cce0f838d4829a4e1d3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pulling in the latest strbyt branck from typeshed (python/typeshed#580).
I've got a feeling we're not going to make progress with this topic. I'm also not sure that it's important (maybe I'll pick it up again once I start looking at some big project to port some Python 2 code to Python 3, but that looks a ways off). So I'm closing this, to keep our list of open PRs short. |
python/typing#208