Skip to content

Text, on Python 2, includes bytes? #418

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
matthiaskramm opened this issue Apr 27, 2017 · 14 comments
Closed

Text, on Python 2, includes bytes? #418

matthiaskramm opened this issue Apr 27, 2017 · 14 comments

Comments

@matthiaskramm
Copy link
Contributor

matthiaskramm commented Apr 27, 2017

The combination of these two rules means that an annotation Text now includes bytes, on Python 2.

And indeed, the below type-checks, with both pytype and mypy --python-version=2.7:

import typing
def f(x):
  # (typing.Text) -> None
  pass
f(b"foo")

This seems a bit odd to me. It also means that we don't have a cross-version way to express "unicode string". Should Text not do the auto-promotion, so that it only stands for unicode itself, on Python 2?

@vlasovskikh
Copy link
Member

@matthiaskramm There was #208 with lots of ideas, none of them were found satisfactory.

@matthiaskramm
Copy link
Contributor Author

Right, so Jukka's proposal #208 (comment) mentions something along the same lines:

If a function accepts unicode arguments but no str arguments in Python 2, the Python 2 argument type should probably be Text.

This reads to me as if he was going by the assumption that that's what Text currently means, which makes the current behavior of both mypy and pytype a bug.

@ilevkivskyi
Copy link
Member

By the way, at some point I modified the Python 2 parser in typed_ast to distinguish between "ascii chars" and b"ascii chars". @gvanrossum was trying to use it somehow, but I don't remember what was the outcome.

@gvanrossum
Copy link
Member

By the way, at some point I modified the Python 2 parser in typed_ast to distinguish between "ascii chars" and b"ascii chars". @gvanrossum was trying to use it somehow, but I don't remember what was the outcome.

IIUC that code is still there, but we ended up not using it because things got too complicated.

The situation is still imperfect, but none of the alternatives seems to really solve everything, so we're keeping the status quo. It's perhaps unfortunate that (in mypy's PY2 mode) you can't distinguish between a function that takes only unicode and one that takes either unicode or bytes (== str), because regardless of whether you spell it Text or unicode it'll accept str/bytes instances too. But the situation is pretty rare. More important IMO is that you can express that something doesn't take unicode, by writing bytes (or str, but bytes makes the intent more clear).

@JukkaL
Copy link
Contributor

JukkaL commented Apr 28, 2017

This reads to me as if he was going by the assumption that that's what Text currently means, which makes the current behavior of both mypy and pytype a bug.

No, I was aware of mypy treats unicode/Text. This was just a proposal that never went anywhere.

@matthiaskramm
Copy link
Contributor Author

We just encountered an issue where code like

import io
x = io.StringIO()
x.write("")

silently passed type-checking, even though at runtime, this happens:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unicode argument expected, got 'str'

StringIO.write() is defined as (https://github.com/python/typeshed/blob/master/stdlib/2/_io.pyi#L135):

  def write(self, pbuf: unicode) -> int: ...

and unicode is automatically changed to Union[unicode, bytes].

Should we delete "unicode" from the PEP sentence "For parameters typed as unicode or Text, arguments of type str should be acceptable", to allow us to declare parameters as "pure" unicode?

@vlasovskikh
Copy link
Member

@matthiaskramm I believe it's a minor class of false negative errors we could ignore. In #208 we have discussed several ideas of making str / bytes / unicode checking stricter in Python 2. None of them turned out to be satisfactory.

@matthiaskramm
Copy link
Contributor Author

matthiaskramm commented Jul 7, 2017

Right, I remember #208. But that one discusses a much broader topic.
I'm not interested in gradual byting, or introducing new string types. But I do want to be able to say "only unicode", in pyi files.
I'm not convinced that this class of (EDIT) false negatives is as small as we think it is. It already came up for us, and we've annotated less than 1% of our source base.

@gvanrossum
Copy link
Member

gvanrossum commented Jul 7, 2017 via email

@vlasovskikh
Copy link
Member

@matthiaskramm Ok, got it. You mean "false negatives", don't you? A checker doesn't find the error even though it's present at run-time.

@matthiaskramm
Copy link
Contributor Author

Right, I meant "false negative".

I don't like kicking cans down the road, but how about we change the PEP sentence to not mention unicode, and instead document the conversion unicode -> Union[str, unicode] in typeshed and in the mypy docs? That way, we could at least leave it up to the other type checkers whether they want to be more rigid about unicode or not.

@gvanrossum
Copy link
Member

OK, that sounds like a reasonable compromise. Can you propose a PR for PEP 484?

@ilevkivskyi
Copy link
Member

@gvanrossum

One problem IIUC is that mypy "forgets" aliases so you can't have a type that's an alias for another type but also endow it with some special meaning.

Maybe it was like this but now mypy is quite flexible with aliases, so this can be implemented if needed.

matthiaskramm added a commit to matthiaskramm/peps that referenced this issue Jul 7, 2017
See python/typing#418 (comment).
Let's leave it up to type-checkers whether they want `unicode`, in Python 2
code, to mean "only unicode" or "str or unicode".
gvanrossum pushed a commit to python/peps that referenced this issue Jul 18, 2017
)

See python/typing#418 (comment).
Let's leave it up to type-checkers whether they want `unicode`, in Python 2
code, to mean "only unicode" or "str or unicode".
@srittau
Copy link
Collaborator

srittau commented Apr 23, 2021

I think this has been resolved with additions to PEP 484.

@srittau srittau closed this as completed Apr 23, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants