Skip to content

[WIP] bytes/str/unicode #2203

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 8 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions mypy/fastparse.py
Original file line number Diff line number Diff line change
Expand Up @@ -219,6 +219,7 @@ def translate_module_id(self, id: str) -> str:
if id == self.custom_typing_module:
return 'typing'
elif id == '__builtin__' and self.pyversion[0] == 2:
assert False # Shouldn't get here
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These asserts are just for my peace of mind -- I think when @Michael0x2a forked fastparse.py into fastparse2.py, he left some version-checking code in that could be reduced because we know it's always Python 2. (For stubs and type comments, the Python 3 syntax is always parsed with fastparse.py.)

# HACK: __builtin__ in Python 2 is aliases to builtins. However, the implementation
# is named __builtin__.py (there is another layer of translation elsewhere).
return 'builtins'
Expand Down Expand Up @@ -744,6 +745,7 @@ def visit_Str(self, n: ast35.Str) -> Node:
# unicode.
return StrExpr(n.s)
else:
assert False # Shouldn't get here with an ast35.Str
return UnicodeExpr(n.s)

# Bytes(bytes s)
Expand All @@ -756,6 +758,7 @@ def visit_Bytes(self, n: ast35.Bytes) -> Node:
if self.pyversion[0] >= 3:
return BytesExpr(contents)
else:
assert False # Shouldn't get here with an ast35.Str
return StrExpr(contents)

# NameConstant(singleton value)
Expand Down
10 changes: 8 additions & 2 deletions mypy/fastparse2.py
Original file line number Diff line number Diff line change
Expand Up @@ -236,7 +236,8 @@ def translate_module_id(self, id: str) -> str:
"""
if id == self.custom_typing_module:
return 'typing'
elif id == '__builtin__' and self.pyversion[0] == 2:
elif id == '__builtin__':
assert self.pyversion[0] == 2
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same story as for fastparse2.py.

# HACK: __builtin__ in Python 2 is aliases to builtins. However, the implementation
# is named __builtin__.py (there is another layer of translation elsewhere).
return 'builtins'
Expand Down Expand Up @@ -819,11 +820,16 @@ def visit_Str(self, s: ast27.Str) -> Node:
contents = str(n)[2:-1]

if self.pyversion[0] >= 3:
assert False # Shouldn't get here with an ast27.Str
return BytesExpr(contents)
else:
return StrExpr(contents)
if s.has_b:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe(*) this block is the full extent of the changes needed in mypy to differentiate between b'' and '' -- all the heavy lifting is done by python/typed_ast#17 (thanks @ilevkivskyi !).

And if typeshed defines bytes as an alias for str the treatment of StrExpr and BytesExpr is identical -- however the intent is for typeshed to make bytes and str separate classes, and then it's important that the inferred type of b'' is bytes while that of '' is str.

(*) Well there may be a few more places where a hardcoded check for builtins.str is used that will have to also allow builtins.bytes, but in many places builtins.str is actually still the only type to be treated so. E.g. dict(x=1) always has str keys, never bytes keys.

return BytesExpr(contents)
else:
return StrExpr(contents)
else:
if self.pyversion[0] >= 3 or self.is_stub:
assert False # Shouldn't get here with an ast27.Str
return StrExpr(s.s)
else:
return UnicodeExpr(s.s)
Expand Down
11 changes: 7 additions & 4 deletions mypy/nodes.py
Original file line number Diff line number Diff line change
Expand Up @@ -1007,9 +1007,12 @@ def accept(self, visitor: NodeVisitor[T]) -> T:

# How mypy uses StrExpr, BytesExpr, and UnicodeExpr:
# In Python 2 mode:
# b'x', 'x' -> StrExpr
# b'x' -> BytesExpr [new!]
# 'x' -> StrExpr
# u'x' -> UnicodeExpr
# BytesExpr is unused
# However after `from __future__ import unicode_literals` [also new!]:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously, with unicode_literals mypy would assume b'' was a unicode literal too! (Incorrectly, of course.)

# b'x' -> BytesExpr
# 'x', u'x' -> UnicodeExpr
#
# In Python 3 mode:
# b'x' -> BytesExpr
Expand All @@ -1033,7 +1036,7 @@ def accept(self, visitor: NodeVisitor[T]) -> T:
class BytesExpr(Expression):
"""Bytes literal"""

value = '' # TODO use bytes
value = ''
literal = LITERAL_YES

def __init__(self, value: str) -> None:
Expand All @@ -1047,7 +1050,7 @@ def accept(self, visitor: NodeVisitor[T]) -> T:
class UnicodeExpr(Expression):
"""Unicode literal (Python 2.x)"""

value = '' # TODO use bytes
value = ''
literal = LITERAL_YES

def __init__(self, value: str) -> None:
Expand Down
2 changes: 1 addition & 1 deletion mypy/semanal.py
Original file line number Diff line number Diff line change
Expand Up @@ -122,7 +122,7 @@
TYPE_PROMOTIONS_PYTHON2 = TYPE_PROMOTIONS.copy()
TYPE_PROMOTIONS_PYTHON2.update({
'builtins.str': 'builtins.unicode',
'builtins.bytearray': 'builtins.str',
'builtins.bytearray': 'builtins.bytes',
})

# When analyzing a function, should we analyze the whole function in one go, or
Expand Down
1 change: 1 addition & 0 deletions mypy/test/testcheck.py
Original file line number Diff line number Diff line change
Expand Up @@ -28,6 +28,7 @@
# List of files that contain test case descriptions.
files = [
'check-basic.test',
'check-bytes-str-unicode-python2.test',
'check-classes.test',
'check-expressions.test',
'check-statements.test',
Expand Down
1 change: 1 addition & 0 deletions mypy/test/testpythoneval.py
Original file line number Diff line number Diff line change
Expand Up @@ -62,6 +62,7 @@ def test_python_evaluation(testcase):
interpreter = python3_path
args = []
py2 = False
args.append('--fast-parser') # Some tests require this now.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit of a problem, because it means that these tests can't be run on Windows any more (until typed_ast is finally ported there -- I think @ddfisher mentioned he was making progress so hopefully I can stop worrying about this). The checker tests (testcheck.py) allow specifying this flag per testcase, but these eval tests don't have that feature.

args.append('--tb') # Show traceback on crash.
# Write the program to a file.
program = '_program.py'
Expand Down
49 changes: 49 additions & 0 deletions test-data/unit/check-bytes-str-unicode-python2.test
Original file line number Diff line number Diff line change
@@ -0,0 +1,49 @@
-- Test cases for bytes/str/unicode in Python 2.x.

[case testBytesStrUnicodeBasics]
# flags: --hide-error-context --fast-parser
def needs_bytes(b):
# type: (bytes) -> None
needs_bytes(b)
needs_str(b) # E: Argument 1 to "needs_str" has incompatible type "bytes"; expected "str"
needs_unicode(b) # E: Argument 1 to "needs_unicode" has incompatible type "bytes"; expected "unicode"

def needs_str(s):
# type: (str) -> None
# TODO: The following line should not be an error
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note this TODO. I want mypy to auto-coerce str to either unicode or bytes depending on context.

needs_bytes(s) # E: Argument 1 to "needs_bytes" has incompatible type "str"; expected "bytes"
needs_str(s)
needs_unicode(s)

def needs_unicode(u):
# type: (unicode) -> None
needs_bytes(u) # E: Argument 1 to "needs_bytes" has incompatible type "unicode"; expected "bytes"
needs_str(u) # E: Argument 1 to "needs_str" has incompatible type "unicode"; expected "str"
needs_unicode(u)
[builtins_py2 fixtures/bytes.pyi]

[case testBytesStrUnicodeLiterals]
# flags: --fast-parser
def needs_bytes(b):
# type: (bytes) -> None
pass
def needs_str(s):
# type: (str) -> None
pass
def needs_unicode(u):
# type: (unicode) -> None
pass

needs_bytes(b'x')
# TODO: The following line should not be an error
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same for this TODO.

needs_bytes('x') # E: Argument 1 to "needs_bytes" has incompatible type "str"; expected "bytes"
needs_bytes(u'x') # E: Argument 1 to "needs_bytes" has incompatible type "unicode"; expected "bytes"

needs_str(b'x') # E: Argument 1 to "needs_str" has incompatible type "bytes"; expected "str"
needs_str('x')
needs_str(u'x') # E: Argument 1 to "needs_str" has incompatible type "unicode"; expected "str"

needs_unicode(b'x') # E: Argument 1 to "needs_unicode" has incompatible type "bytes"; expected "unicode"
needs_unicode('x')
needs_unicode(u'x')
[builtins_py2 fixtures/bytes.pyi]
20 changes: 20 additions & 0 deletions test-data/unit/fixtures/bytes.pyi
Original file line number Diff line number Diff line change
@@ -0,0 +1,20 @@
class Any: pass
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the builtins module used by the tests in check-bytes-str-unicode-python2.test.


class object:
def __init__(self) -> None: pass
class type:
def __init__(self, x: Any) -> None: pass
class bytes:
pass
class str:
pass
class unicode:
pass
Text = unicode
NativeStr = str

class int: pass
class float: pass
class tuple: pass
class function: pass
class ellipsis: pass
25 changes: 17 additions & 8 deletions test-data/unit/python2eval.test
Original file line number Diff line number Diff line change
Expand Up @@ -79,14 +79,18 @@ x = 1.5
[case testAnyStr_python2]
from typing import AnyStr
def f(x): # type: (AnyStr) -> AnyStr
if isinstance(x, str):
if isinstance(x, bytes):
Copy link
Member Author

@gvanrossum gvanrossum Oct 1, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is necessary because AnyStr now has three possible types (bytes, str, unicode). Note that I must return 'foo' for bytes and str, because at runtime they are still indistinguishable! (And these tests actually run the Python code!)

return b'foo'
elif isinstance(x, str):
return 'foo'
else:
return u'zar'
print f('')
print f(b'')
print f(u'')
[out]
foo
foo
zar

[case testGenericPatterns_python2]
Expand Down Expand Up @@ -152,13 +156,16 @@ f(**params)

[case testFromFutureImportUnicodeLiterals2_python2]
from __future__ import unicode_literals
def f(x: str) -> None: pass
def f(x):
# type: (str) -> None
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This (like a few others) was needed because we're now running these tests with --fast-parser, and the fast PY2 parser rejects PY3-style annotations!

pass
f(b'')
f(u'')
f('')
[out]
_program.py:4: error: Argument 1 to "f" has incompatible type "unicode"; expected "str"
_program.py:5: error: Argument 1 to "f" has incompatible type "unicode"; expected "str"
_program.py:5: error: Argument 1 to "f" has incompatible type "bytes"; expected "str"
_program.py:6: error: Argument 1 to "f" has incompatible type "unicode"; expected "str"
_program.py:7: error: Argument 1 to "f" has incompatible type "unicode"; expected "str"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that all three calls are now in error! bytes is not compatible with str, and '' becomes unicode because of the unicode_literals.


[case testStrUnicodeCompatibility_python2]
import typing
Expand Down Expand Up @@ -233,7 +240,7 @@ u'\x89'
import typing
import io
c = io.BytesIO()
c.write('\x89')
c.write(b'\x89')
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one feels reasonable.

print(repr(c.getvalue()))
[out]
'\x89'
Expand Down Expand Up @@ -396,11 +403,12 @@ def f(x: unicode) -> int: pass
def f(x: bytearray) -> int: pass
[out]
_program.py:2: error: No overload variant of "f" matches argument types [builtins.int]
_program.py:5: error: No overload variant of "f" matches argument types [builtins.bytes]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because there's no overload for bytes, and there's no equivalency between bytes and bytearray. Not sure I like this.


[case testByteArrayStrCompatibility_python2]
def f(x): # type: (str) -> None
def f(x): # type: (bytes) -> None
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that bytearray() is compatible with bytes, but not with str, hence the change here.

pass
f(bytearray('foo'))
f(bytearray(b'foo'))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The argument to the bytearray() constructor must be bytes. That's reasonable.


[case testAbstractProperty_python2]
from abc import abstractproperty, ABCMeta
Expand Down Expand Up @@ -465,7 +473,8 @@ re.subn(upat, lambda m: u'', u'')[0] + u''

[case testYieldRegressionTypingAwaitable_python2]
# Make sure we don't reference typing.Awaitable in Python 2 mode.
def g() -> int:
def g():
# type: () -> int
yield
[out]
_program.py: note: In function "g":
Expand Down
2 changes: 1 addition & 1 deletion test-requirements.txt
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
flake8
typed-ast
typed-ast>=0.6.1
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This version has the 'has_b' flag for Str objects.

pytest>=2.8
pytest-xdist>=1.13