Add support for byte and unicode Literal strings (#6087)
This pull request adds support for byte and unicode Literal strings.
I left in some comments explaining some nuances of the implementation;
here are a few additional meta-notes:
1. I reworded several of the comments suggesting that the way we
represent bytes as a string is a "hack" or that we should eventually
switch to representing bytes as literally bytes.
I started with that approach but ultimately rejected it: I ended
up having to constantly serialize/deserialize between bytes and
strings, which I felt complicated the code.
As a result, I decided that the solution we had previously is in
fact, from a high-level perspective, the best possible approach.
(The actual code for translating the output of `typed_ast` into a
human-readable string *is* admittedly a bit hacky though.)
In any case, the phrase "how mypy currently parses the contents of bytes
literals" is severely out of date anyway. That comment was added
about 3 years ago, when we were adding the fast parser for the first
time and running it concurrently with the actual parser.
2. I removed the `is_stub` field from `fastparse2.ASTConverter`: it turned
out we were just never using that field.
3. One complication I ran into was figuring out how to handle forward
references to literal strings. For example, suppose we have the type
`List["Literal['foo']"]`. Do we treat this as being equivalent to
`List[Literal[u'foo']]` or `List[Literal[b'foo']]`?
If this is a Python 3 file or a Python 2 file with
`unicode_literals`, we'd want to pick the former. If this is a
standard Python 2 file, we'd want to pick the latter.
In order to make this happen, I decided to use a heuristic where the
type of the "outer" string decides the type of the "inner" string.
For example:
- In Python 3, `"Literal['foo']"` is a unicode string. So,
the inner `Literal['foo']` will be treated as the same as
`Literal[u'foo']`.
- The same thing happens when using Python 2 with
`unicode_literals`.
- In Python 3, it is illegal to use a byte string as a forward
reference. So, types like `List[b"Literal['foo']"]` are already
illegal.
- In standard Python 2, `"Literal['foo']"` is a byte string. So the
inner `Literal['foo']` will be treated as the same as
`Literal[b'foo']`.
4. I will add tests validating that all of this stuff works as expected
with incremental and fine-grained mode in a separate diff --
probably after fixing and landing #6075,
which I intend to use as a baseline foundation.
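As a rough illustration of the heuristic described in note 3, the sketch below shows how the kind of the "outer" string could decide the kind of the "inner" literal. It is not the code from this pull request: `outer_kind` and `inner_literal_kind` are hypothetical names, and the string-prefix handling is a simplifying assumption.

```python
from typing import Optional


def outer_kind(major: int, prefix: str = "", unicode_literals: bool = False) -> str:
    """Classify a string literal as 'unicode' or 'bytes' (hypothetical helper).

    `major` is the Python major version of the file, `prefix` is the literal's
    prefix (e.g. "", "b", "u", "br"), and `unicode_literals` says whether the
    file uses `from __future__ import unicode_literals`.
    """
    prefix = prefix.lower()
    if "b" in prefix:
        return "bytes"
    if "u" in prefix:
        return "unicode"
    # Unprefixed strings are unicode in Python 3, and in Python 2 files
    # that enable unicode_literals; otherwise they are byte strings.
    if major >= 3 or unicode_literals:
        return "unicode"
    return "bytes"


def inner_literal_kind(
    major: int, outer_prefix: str = "", unicode_literals: bool = False
) -> Optional[str]:
    """Return the kind given to an unprefixed Literal string nested inside a
    forward reference, or None if the forward reference itself is illegal."""
    kind = outer_kind(major, outer_prefix, unicode_literals)
    if major >= 3 and kind == "bytes":
        # Python 3 does not allow byte strings as forward references.
        return None
    # The inner literal inherits the kind of the outer string.
    return kind
```

Under these assumptions, `inner_literal_kind(3)` and `inner_literal_kind(2, unicode_literals=True)` both pick the unicode interpretation, `inner_literal_kind(2)` picks the bytes interpretation, and `inner_literal_kind(3, outer_prefix="b")` reports the forward reference as illegal.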