-
-
Notifications
You must be signed in to change notification settings - Fork 3k
[WIP] bytes/str/unicode #2203
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] bytes/str/unicode #2203
Changes from all commits
1974c66
bfa0bac
44f5c95
71a9461
d5aea06
86e7da9
8c69076
cd6ca21
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -236,7 +236,8 @@ def translate_module_id(self, id: str) -> str: | |
""" | ||
if id == self.custom_typing_module: | ||
return 'typing' | ||
elif id == '__builtin__' and self.pyversion[0] == 2: | ||
elif id == '__builtin__': | ||
assert self.pyversion[0] == 2 | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Same story as for fastparse2.py. |
||
# HACK: __builtin__ in Python 2 is aliases to builtins. However, the implementation | ||
# is named __builtin__.py (there is another layer of translation elsewhere). | ||
return 'builtins' | ||
|
@@ -819,11 +820,16 @@ def visit_Str(self, s: ast27.Str) -> Node: | |
contents = str(n)[2:-1] | ||
|
||
if self.pyversion[0] >= 3: | ||
assert False # Shouldn't get here with an ast27.Str | ||
return BytesExpr(contents) | ||
else: | ||
return StrExpr(contents) | ||
if s.has_b: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I believe(*) this block is the full extent of the changes needed in mypy to differentiate between And if typeshed defines bytes as an alias for str the treatment of StrExpr and BytesExpr is identical -- however the intent is for typeshed to make bytes and str separate classes, and then it's important that the inferred type of (*) Well there may be a few more places where a hardcoded check for builtins.str is used that will have to also allow builtins.bytes, but in many places builtins.str is actually still the only type to be treated so. E.g. |
||
return BytesExpr(contents) | ||
else: | ||
return StrExpr(contents) | ||
else: | ||
if self.pyversion[0] >= 3 or self.is_stub: | ||
assert False # Shouldn't get here with an ast27.Str | ||
return StrExpr(s.s) | ||
else: | ||
return UnicodeExpr(s.s) | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1007,9 +1007,12 @@ def accept(self, visitor: NodeVisitor[T]) -> T: | |
|
||
# How mypy uses StrExpr, BytesExpr, and UnicodeExpr: | ||
# In Python 2 mode: | ||
# b'x', 'x' -> StrExpr | ||
# b'x' -> BytesExpr [new!] | ||
# 'x' -> StrExpr | ||
# u'x' -> UnicodeExpr | ||
# BytesExpr is unused | ||
# However after `from __future__ import unicode_literals` [also new!]: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Previously, with |
||
# b'x' -> BytesExpr | ||
# 'x', u'x' -> UnicodeExpr | ||
# | ||
# In Python 3 mode: | ||
# b'x' -> BytesExpr | ||
|
@@ -1033,7 +1036,7 @@ def accept(self, visitor: NodeVisitor[T]) -> T: | |
class BytesExpr(Expression): | ||
"""Bytes literal""" | ||
|
||
value = '' # TODO use bytes | ||
value = '' | ||
literal = LITERAL_YES | ||
|
||
def __init__(self, value: str) -> None: | ||
|
@@ -1047,7 +1050,7 @@ def accept(self, visitor: NodeVisitor[T]) -> T: | |
class UnicodeExpr(Expression): | ||
"""Unicode literal (Python 2.x)""" | ||
|
||
value = '' # TODO use bytes | ||
value = '' | ||
literal = LITERAL_YES | ||
|
||
def __init__(self, value: str) -> None: | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -62,6 +62,7 @@ def test_python_evaluation(testcase): | |
interpreter = python3_path | ||
args = [] | ||
py2 = False | ||
args.append('--fast-parser') # Some tests require this now. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is a bit of a problem, because it means that these tests can't be run on Windows any more (until typed_ast is finally ported there -- I think @ddfisher mentioned he was making progress so hopefully I can stop worrying about this). The checker tests (testcheck.py) allow specifying this flag per testcase, but these eval tests don't have that feature. |
||
args.append('--tb') # Show traceback on crash. | ||
# Write the program to a file. | ||
program = '_program.py' | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,49 @@ | ||
-- Test cases for bytes/str/unicode in Python 2.x. | ||
|
||
[case testBytesStrUnicodeBasics] | ||
# flags: --hide-error-context --fast-parser | ||
def needs_bytes(b): | ||
# type: (bytes) -> None | ||
needs_bytes(b) | ||
needs_str(b) # E: Argument 1 to "needs_str" has incompatible type "bytes"; expected "str" | ||
needs_unicode(b) # E: Argument 1 to "needs_unicode" has incompatible type "bytes"; expected "unicode" | ||
|
||
def needs_str(s): | ||
# type: (str) -> None | ||
# TODO: The following line should not be an error | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Note this TODO. I want mypy to auto-coerce str to either unicode or bytes depending on context. |
||
needs_bytes(s) # E: Argument 1 to "needs_bytes" has incompatible type "str"; expected "bytes" | ||
needs_str(s) | ||
needs_unicode(s) | ||
|
||
def needs_unicode(u): | ||
# type: (unicode) -> None | ||
needs_bytes(u) # E: Argument 1 to "needs_bytes" has incompatible type "unicode"; expected "bytes" | ||
needs_str(u) # E: Argument 1 to "needs_str" has incompatible type "unicode"; expected "str" | ||
needs_unicode(u) | ||
[builtins_py2 fixtures/bytes.pyi] | ||
|
||
[case testBytesStrUnicodeLiterals] | ||
# flags: --fast-parser | ||
def needs_bytes(b): | ||
# type: (bytes) -> None | ||
pass | ||
def needs_str(s): | ||
# type: (str) -> None | ||
pass | ||
def needs_unicode(u): | ||
# type: (unicode) -> None | ||
pass | ||
|
||
needs_bytes(b'x') | ||
# TODO: The following line should not be an error | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Same for this TODO. |
||
needs_bytes('x') # E: Argument 1 to "needs_bytes" has incompatible type "str"; expected "bytes" | ||
needs_bytes(u'x') # E: Argument 1 to "needs_bytes" has incompatible type "unicode"; expected "bytes" | ||
|
||
needs_str(b'x') # E: Argument 1 to "needs_str" has incompatible type "bytes"; expected "str" | ||
needs_str('x') | ||
needs_str(u'x') # E: Argument 1 to "needs_str" has incompatible type "unicode"; expected "str" | ||
|
||
needs_unicode(b'x') # E: Argument 1 to "needs_unicode" has incompatible type "bytes"; expected "unicode" | ||
needs_unicode('x') | ||
needs_unicode(u'x') | ||
[builtins_py2 fixtures/bytes.pyi] |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,20 @@ | ||
class Any: pass | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is the builtins module used by the tests in check-bytes-str-unicode-python2.test. |
||
|
||
class object: | ||
def __init__(self) -> None: pass | ||
class type: | ||
def __init__(self, x: Any) -> None: pass | ||
class bytes: | ||
pass | ||
class str: | ||
pass | ||
class unicode: | ||
pass | ||
Text = unicode | ||
NativeStr = str | ||
|
||
class int: pass | ||
class float: pass | ||
class tuple: pass | ||
class function: pass | ||
class ellipsis: pass |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -79,14 +79,18 @@ x = 1.5 | |
[case testAnyStr_python2] | ||
from typing import AnyStr | ||
def f(x): # type: (AnyStr) -> AnyStr | ||
if isinstance(x, str): | ||
if isinstance(x, bytes): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is necessary because AnyStr now has three possible types (bytes, str, unicode). Note that I must return 'foo' for bytes and str, because at runtime they are still indistinguishable! (And these tests actually run the Python code!) |
||
return b'foo' | ||
elif isinstance(x, str): | ||
return 'foo' | ||
else: | ||
return u'zar' | ||
print f('') | ||
print f(b'') | ||
print f(u'') | ||
[out] | ||
foo | ||
foo | ||
zar | ||
|
||
[case testGenericPatterns_python2] | ||
|
@@ -152,13 +156,16 @@ f(**params) | |
|
||
[case testFromFutureImportUnicodeLiterals2_python2] | ||
from __future__ import unicode_literals | ||
def f(x: str) -> None: pass | ||
def f(x): | ||
# type: (str) -> None | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This (like a few others) was needed because we're now running these tests with |
||
pass | ||
f(b'') | ||
f(u'') | ||
f('') | ||
[out] | ||
_program.py:4: error: Argument 1 to "f" has incompatible type "unicode"; expected "str" | ||
_program.py:5: error: Argument 1 to "f" has incompatible type "unicode"; expected "str" | ||
_program.py:5: error: Argument 1 to "f" has incompatible type "bytes"; expected "str" | ||
_program.py:6: error: Argument 1 to "f" has incompatible type "unicode"; expected "str" | ||
_program.py:7: error: Argument 1 to "f" has incompatible type "unicode"; expected "str" | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Note that all three calls are now in error! bytes is not compatible with str, and |
||
|
||
[case testStrUnicodeCompatibility_python2] | ||
import typing | ||
|
@@ -233,7 +240,7 @@ u'\x89' | |
import typing | ||
import io | ||
c = io.BytesIO() | ||
c.write('\x89') | ||
c.write(b'\x89') | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This one feels reasonable. |
||
print(repr(c.getvalue())) | ||
[out] | ||
'\x89' | ||
|
@@ -396,11 +403,12 @@ def f(x: unicode) -> int: pass | |
def f(x: bytearray) -> int: pass | ||
[out] | ||
_program.py:2: error: No overload variant of "f" matches argument types [builtins.int] | ||
_program.py:5: error: No overload variant of "f" matches argument types [builtins.bytes] | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Because there's no overload for bytes, and there's no equivalency between bytes and bytearray. Not sure I like this. |
||
|
||
[case testByteArrayStrCompatibility_python2] | ||
def f(x): # type: (str) -> None | ||
def f(x): # type: (bytes) -> None | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Note that bytearray() is compatible with bytes, but not with str, hence the change here. |
||
pass | ||
f(bytearray('foo')) | ||
f(bytearray(b'foo')) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The argument to the bytearray() constructor must be bytes. That's reasonable. |
||
|
||
[case testAbstractProperty_python2] | ||
from abc import abstractproperty, ABCMeta | ||
|
@@ -465,7 +473,8 @@ re.subn(upat, lambda m: u'', u'')[0] + u'' | |
|
||
[case testYieldRegressionTypingAwaitable_python2] | ||
# Make sure we don't reference typing.Awaitable in Python 2 mode. | ||
def g() -> int: | ||
def g(): | ||
# type: () -> int | ||
yield | ||
[out] | ||
_program.py: note: In function "g": | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,4 +1,4 @@ | ||
flake8 | ||
typed-ast | ||
typed-ast>=0.6.1 | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This version has the 'has_b' flag for Str objects. |
||
pytest>=2.8 | ||
pytest-xdist>=1.13 |
+105 −15 | stdlib/2.7/__builtin__.pyi | |
+11 −11 | stdlib/2.7/io.pyi | |
+18 −18 | stdlib/2.7/re.pyi | |
+2 −2 | stdlib/2.7/typing.pyi | |
+2 −1 | stdlib/2and3/logging/__init__.pyi | |
+2 −1 | stdlib/3.4/enum.pyi | |
+68 −0 | stdlib/3/http/__init__.pyi | |
+2 −1 | third_party/3/typed_ast/ast35.pyi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These asserts are just for my peace of mind -- I think when @Michael0x2a forked fastparse.py into fastparse2.py, he left some version-checking code in that could be reduced because we know it's always Python 2. (For stubs and type comments, the Python 3 syntax is always parsed with fastparse.py.)