[WIP] bytes/str/unicode #2203

gvanrossum · 2016-10-01T01:29:56Z

This has no effect as long as bytes remains an alias for str, but it enables experiments with differentiating them for better handling of bytes/str/unicode. Depends on python/typed_ast#17.

gvanrossum

Here's a running commentary on the changes I had to make to make things work, including some open issues.

Note that we're not even touching NativeStr -- we're just introducing a separation between str and bytes.

gvanrossum · 2016-10-01T02:13:22Z

mypy/fastparse.py

@@ -219,6 +219,7 @@ def translate_module_id(self, id: str) -> str:
        if id == self.custom_typing_module:
            return 'typing'
        elif id == '__builtin__' and self.pyversion[0] == 2:
+            assert False  # Shouldn't get here


These asserts are just for my peace of mind -- I think when @Michael0x2a forked fastparse.py into fastparse2.py, he left some version-checking code in that could be reduced because we know it's always Python 2. (For stubs and type comments, the Python 3 syntax is always parsed with fastparse.py.)

gvanrossum · 2016-10-01T02:13:45Z

mypy/fastparse2.py

@@ -236,7 +236,8 @@ def translate_module_id(self, id: str) -> str:
        """
        if id == self.custom_typing_module:
            return 'typing'
-        elif id == '__builtin__' and self.pyversion[0] == 2:
+        elif id == '__builtin__':
+            assert self.pyversion[0] == 2


Same story as for fastparse.py.

gvanrossum · 2016-10-01T02:18:54Z

mypy/fastparse2.py

                return BytesExpr(contents)
            else:
-                return StrExpr(contents)
+                if s.has_b:


I believe(*) this block is the full extent of the changes needed in mypy to differentiate between b'' and '' -- all the heavy lifting is done by python/typed_ast#17 (thanks @ilevkivskyi !).

If typeshed defines bytes as an alias for str, the treatment of StrExpr and BytesExpr is identical. However, the intent is for typeshed to make bytes and str separate classes, and then it's important that the inferred type of b'' is bytes while that of '' is str.

(*) Well there may be a few more places where a hardcoded check for builtins.str is used that will have to also allow builtins.bytes, but in many places builtins.str is actually still the only type to be treated so. E.g. dict(x=1) always has str keys, never bytes keys.

gvanrossum · 2016-10-01T02:19:52Z

mypy/nodes.py

 # u'x' -> UnicodeExpr
-# BytesExpr is unused
+# However after `from __future__ import unicode_literals` [also new!]:


Previously, with unicode_literals mypy would assume b'' was a unicode literal too! (Incorrectly, of course.)
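A minimal sketch of the Python 2 literal mapping that this comment and the has_b change describe. The function `classify_literal` and its string return values are hypothetical stand-ins for illustration, not mypy's actual code. `has_b` mirrors the flag on typed_ast Str objects mentioned in this PR, while `has_u` is an assumed flag used only in this sketch.

```python
def classify_literal(has_b: bool, has_u: bool, unicode_literals: bool) -> str:
    """Return the (hypothetical) node kind for a Python 2 string literal."""
    if has_b:
        # b'x' stays a bytes literal, even under unicode_literals.
        return 'BytesExpr'
    if has_u or unicode_literals:
        # u'x' always, and plain 'x' after the __future__ import.
        return 'UnicodeExpr'
    # Plain 'x' without the __future__ import is a native str.
    return 'StrExpr'
```

The point of checking `has_b` first is that the `unicode_literals` import can no longer turn a b'' literal into a unicode one.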

gvanrossum · 2016-10-01T02:21:58Z

mypy/test/testpythoneval.py

@@ -62,6 +62,7 @@ def test_python_evaluation(testcase):
        interpreter = python3_path
        args = []
        py2 = False
+    args.append('--fast-parser')  # Some tests require this now.


This is a bit of a problem, because it means that these tests can't be run on Windows any more (until typed_ast is finally ported there -- I think @ddfisher mentioned he was making progress so hopefully I can stop worrying about this). The checker tests (testcheck.py) allow specifying this flag per testcase, but these eval tests don't have that feature.

gvanrossum · 2016-10-01T02:28:42Z

test-data/unit/python2eval.test

@@ -396,11 +403,12 @@ def f(x: unicode) -> int: pass
 def f(x: bytearray) -> int: pass
 [out]
 _program.py:2: error: No overload variant of "f" matches argument types [builtins.int]
+_program.py:5: error: No overload variant of "f" matches argument types [builtins.bytes]


This new error appears because there's no overload for bytes, and bytes is not treated as equivalent to bytearray. Not sure I like this.
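The situation can be sketched with a Python 3 analogue. This is an illustration under assumptions, not the test case itself: `str` stands in for the `unicode` overload, and the shared implementation body is invented for the example.

```python
from typing import Union, overload


@overload
def f(x: str) -> int: ...
@overload
def f(x: bytearray) -> int: ...
def f(x: Union[str, bytearray]) -> int:
    # Runtime implementation shared by both overloads.
    return len(x)


result = f(bytearray(b'abc'))  # matches the bytearray overload

# f(b'abc') matches neither overload: bytes is not a bytearray, so a
# type checker is expected to report that no overload variant matches.
```

Adding a third overload for `bytes` would be one way to make such a call acceptable; whether that is desirable is the open question raised here.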

gvanrossum · 2016-10-01T02:29:38Z

test-data/unit/python2eval.test


 [case testByteArrayStrCompatibility_python2]
-def f(x): # type: (str) -> None
+def f(x): # type: (bytes) -> None


Note that bytearray() is compatible with bytes, but not with str, hence the change here.

gvanrossum · 2016-10-01T02:30:00Z

test-data/unit/python2eval.test

    pass
-f(bytearray('foo'))
+f(bytearray(b'foo'))


The argument to the bytearray() constructor must be bytes. That's reasonable.
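For comparison, a Python 3 interpreter enforces a similar rule at runtime. The sketch below assumes Python 3; under Python 2 the call with a plain string literal succeeds at runtime, and the restriction discussed here is imposed by the stubs.

```python
# A bytes argument is accepted.
data = bytearray(b'foo')

# A text argument without an encoding is rejected on Python 3.
try:
    bytearray('foo')
except TypeError:
    rejected = True
else:
    rejected = False
```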

gvanrossum · 2016-10-01T02:30:39Z

test-requirements.txt

@@ -1,4 +1,4 @@
 flake8
-typed-ast
+typed-ast>=0.6.1


This version has the 'has_b' flag for Str objects.

gvanrossum · 2016-10-01T02:31:13Z

typeshed

@@ -1 +1 @@
-Subproject commit aa549db5e5e57ee2702899d1cc660163b52171ed
+Subproject commit 455f8aa834ee7fc6b1527cce0f838d4829a4e1d3


Pulling in the latest strbyt branch from typeshed (python/typeshed#580).

gvanrossum · 2016-11-09T00:13:33Z

I've got a feeling we're not going to make progress with this topic. I'm also not sure that it's important (maybe I'll pick it up again once I start looking at some big project to port some Python 2 code to Python 3, but that looks a ways off).

So I'm closing this, to keep our list of open PRs short.

Guido van Rossum added 8 commits September 30, 2016 17:47

Distinguish b'' and '' in PY2.

1974c66

This has no effect as long as bytes remains an alias for str, but it enables experiments with differentiating them for better handling of bytes/str/unicode. Depends on python/typed_ast#17.

Update the version of typed_ast we need (0.6.1).

bfa0bac

Add the first bytes/str/unicode tests

44f5c95

Fix python2eval.test

71a9461

Sync typeshed again

d5aea06

Fix test some more

86e7da9

Fix all python2eval tests

8c69076

Sync typeshed some more

cd6ca21

gvanrossum mentioned this pull request Oct 1, 2016

Decide how to handle str/unicode python/typing#208

Closed

gvanrossum closed this Nov 9, 2016

gvanrossum deleted the strbyt branch November 15, 2016 04:31
