Skip to content

[WIP] bytes/str/unicode #2203

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 8 commits into from
Closed

[WIP] bytes/str/unicode #2203

wants to merge 8 commits into from

Conversation

gvanrossum
Copy link
Member

Guido van Rossum added 8 commits September 30, 2016 17:47
Copy link
Member Author

@gvanrossum gvanrossum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's a running commentary on the changes I had to make to make things work, including some open issues.

Note that we're not even touching NativeStr -- we're just introducing a separation between str and bytes.

@@ -219,6 +219,7 @@ def translate_module_id(self, id: str) -> str:
if id == self.custom_typing_module:
return 'typing'
elif id == '__builtin__' and self.pyversion[0] == 2:
assert False # Shouldn't get here
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These asserts are just for my peace of mind -- I think when @Michael0x2a forked fastparse.py into fastparse2.py, he left some version-checking code in that could be reduced because we know it's always Python 2. (For stubs and type comments, the Python 3 syntax is always parsed with fastparse.py.)

@@ -236,7 +236,8 @@ def translate_module_id(self, id: str) -> str:
"""
if id == self.custom_typing_module:
return 'typing'
elif id == '__builtin__' and self.pyversion[0] == 2:
elif id == '__builtin__':
assert self.pyversion[0] == 2
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same story as for fastparse2.py.

return BytesExpr(contents)
else:
return StrExpr(contents)
if s.has_b:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe(*) this block is the full extent of the changes needed in mypy to differentiate between b'' and '' -- all the heavy lifting is done by python/typed_ast#17 (thanks @ilevkivskyi !).

And if typeshed defines bytes as an alias for str the treatment of StrExpr and BytesExpr is identical -- however the intent is for typeshed to make bytes and str separate classes, and then it's important that the inferred type of b'' is bytes while that of '' is str.

(*) Well there may be a few more places where a hardcoded check for builtins.str is used that will have to also allow builtins.bytes, but in many places builtins.str is actually still the only type to be treated so. E.g. dict(x=1) always has str keys, never bytes keys.

# u'x' -> UnicodeExpr
# BytesExpr is unused
# However after `from __future__ import unicode_literals` [also new!]:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously, with unicode_literals mypy would assume b'' was a unicode literal too! (Incorrectly, of course.)

@@ -62,6 +62,7 @@ def test_python_evaluation(testcase):
interpreter = python3_path
args = []
py2 = False
args.append('--fast-parser') # Some tests require this now.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit of a problem, because it means that these tests can't be run on Windows any more (until typed_ast is finally ported there -- I think @ddfisher mentioned he was making progress so hopefully I can stop worrying about this). The checker tests (testcheck.py) allow specifying this flag per testcase, but these eval tests don't have that feature.

@@ -396,11 +403,12 @@ def f(x: unicode) -> int: pass
def f(x: bytearray) -> int: pass
[out]
_program.py:2: error: No overload variant of "f" matches argument types [builtins.int]
_program.py:5: error: No overload variant of "f" matches argument types [builtins.bytes]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because there's no overload for bytes, and there's no equivalency between bytes and bytearray. Not sure I like this.


[case testByteArrayStrCompatibility_python2]
def f(x): # type: (str) -> None
def f(x): # type: (bytes) -> None
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that bytearray() is compatible with bytes, but not with str, hence the change here.

pass
f(bytearray('foo'))
f(bytearray(b'foo'))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The argument to the bytearray() constructor must be bytes. That's reasonable.

@@ -1,4 +1,4 @@
flake8
typed-ast
typed-ast>=0.6.1
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This version has the 'has_b' flag for Str objects.

@@ -1 +1 @@
Subproject commit aa549db5e5e57ee2702899d1cc660163b52171ed
Subproject commit 455f8aa834ee7fc6b1527cce0f838d4829a4e1d3
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pulling in the latest strbyt branck from typeshed (python/typeshed#580).

@gvanrossum
Copy link
Member Author

I've got a feeling we're not going to make progress with this topic. I'm also not sure that it's important (maybe I'll pick it up again once I start looking at some big project to port some Python 2 code to Python 3, but that looks a ways off).

So I'm closing this, to keep our list of open PRs short.

@gvanrossum gvanrossum closed this Nov 9, 2016
@gvanrossum gvanrossum deleted the strbyt branch November 15, 2016 04:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant