From b14e1f16eeab5b0e3e0a2db4c406cb04acb19e21 Mon Sep 17 00:00:00 2001 From: Michael Lee Date: Sat, 16 Jun 2018 11:00:33 -0700 Subject: [PATCH 1/4] Clean up overload docs This pull request: 1. Moves documentation about overloads into the "More types" section. (Rationale: although overloads aren't really a type, I feel it makes sense for them to live somewhere in the "Type System Reference" section instead of the "Miscellaneous" section.) 2. Modifies the docs to start with a non-dunder example (resolves https://github.com/python/mypy/issues/4579) and simplifies the `__dunder__` example. 3. Adds some documentation about the "pick the first match" rule. 4. Removes the now out-of-date note about erasure. --- docs/source/additional_features.rst | 116 ---------------- docs/source/more_types.rst | 205 ++++++++++++++++++++++++++++ 2 files changed, 205 insertions(+), 116 deletions(-) diff --git a/docs/source/additional_features.rst b/docs/source/additional_features.rst index 14b0ce20611c..866385a6c439 100644 --- a/docs/source/additional_features.rst +++ b/docs/source/additional_features.rst @@ -4,122 +4,6 @@ Additional features This section discusses various features that did not fit in naturally in one of the previous sections. -.. _function-overloading: - -Function overloading -******************** - -Sometimes the types in a function depend on each other in ways that -can't be captured with a ``Union``. For example, the ``__getitem__`` -(``[]`` bracket indexing) method can take an integer and return a -single item, or take a ``slice`` and return a ``Sequence`` of items. -You might be tempted to annotate it like so: - -.. code-block:: python - - from typing import Sequence, TypeVar, Union - T = TypeVar('T') - - class MyList(Sequence[T]): - def __getitem__(self, index: Union[int, slice]) -> Union[T, Sequence[T]]: - if isinstance(index, int): - ... # Return a T here - elif isinstance(index, slice): - ... # Return a sequence of Ts here - else: - raise TypeError(...) 
- -But this is too loose, as it implies that when you pass in an ``int`` -you might sometimes get out a single item and sometimes a sequence. -The return type depends on the parameter type in a way that can't be -expressed using a type variable. Instead, we can use `overloading -`_ -to give the same function multiple type annotations (signatures) and -accurately describe the function's behavior. - -.. code-block:: python - - from typing import overload, Sequence, TypeVar, Union - T = TypeVar('T') - - class MyList(Sequence[T]): - - # The @overload definitions are just for the type checker, - # and overwritten by the real implementation below. - @overload - def __getitem__(self, index: int) -> T: - pass # Don't put code here - - # All overloads and the implementation must be adjacent - # in the source file, and overload order may matter: - # when two overloads may overlap, the more specific one - # should come first. - @overload - def __getitem__(self, index: slice) -> Sequence[T]: - pass # Don't put code here - - # The implementation goes last, without @overload. - # It may or may not have type hints; if it does, - # these are checked against the overload definitions - # as well as against the implementation body. - def __getitem__(self, index: Union[int, slice]) -> Union[T, Sequence[T]]: - # This is exactly the same as before. - if isinstance(index, int): - ... # Return a T here - elif isinstance(index, slice): - ... # Return a sequence of Ts here - else: - raise TypeError(...) - -Calls to overloaded functions are type checked against the variants, -not against the implementation. A call like ``my_list[5]`` would have -type ``T``, not ``Union[T, Sequence[T]]`` because it matches the -first overloaded definition, and ignores the type annotations on the -implementation of ``__getitem__``. The code in the body of the -definition of ``__getitem__`` is checked against the annotations on -the corresponding declaration. 
In this case the body is checked -with ``index: Union[int, slice]`` and a return type -``Union[T, Sequence[T]]``. If there are no annotations on the -corresponding definition, then code in the function body is not type -checked. - -The annotations on the function body must be compatible with the -types given for the overloaded variants listed above it. The type -checker will verify that all the types for the overloaded variants -are compatible with the types given for the implementation. In this -case it checks that the parameter type ``int`` and the return type -``T`` are compatible with ``Union[int, slice]`` and -``Union[T, Sequence[T]]`` for the first variant. For the second -variant it verifies that the parameter type ``slice`` and the return -type ``Sequence[T]`` are compatible with ``Union[int, slice]`` and -``Union[T, Sequence[T]]``. - -Overloaded function variants are still ordinary Python functions and -they still define a single runtime object. There is no automatic -dispatch happening, and you must manually handle the different types -in the implementation (usually with :func:`isinstance` checks, as -shown in the example). - -The overload variants must be adjacent in the code. This makes code -clearer, as you don't have to hunt for overload variants across the -file. - -Overloads in stub files are exactly the same, except there is no -implementation. - -.. note:: - - As generic type variables are erased at runtime when constructing - instances of generic types, an overloaded function cannot have - variants that only differ in a generic type argument, - e.g. ``List[int]`` and ``List[str]``. - -.. note:: - - If you just need to constrain a type variable to certain types or - subtypes, you can use a :ref:`value restriction - `. - .. 
_attrs_package: The attrs package diff --git a/docs/source/more_types.rst b/docs/source/more_types.rst index 61e4f4dc7b76..5da421cb694a 100644 --- a/docs/source/more_types.rst +++ b/docs/source/more_types.rst @@ -178,6 +178,211 @@ create subclasses of these objects either. name_by_id(3) # int is not the same as UserId +.. _function-overloading: + +Function overloading +******************** + +Sometimes the arguments and types in a function depend on each other +in ways that can't be captured with a ``Union``. For example, suppose +we want to write a function that produces IP address objects. If we pass +in four ints, we receive an ``IPv4Address`` object. If we pass in eight, +we recieve an ``IPv6Address`` object. + +Our first attempt at writing this function might look like this: + +.. code-block:: python + + from typing import Union + + def ip_address(*components: int) -> Union[IPv4Address, IPv6Address]: + if len(components) == 4: + # Return an IPv4Address object + elif len(components) == 8: + # Return an IPv6Address object + else: + # Raise an exception + +While this function signature works, it's too loose: it implies we +could receive either address object regardless of the number of arguments +we pass in. It also does not prohibit a caller from passing in the wrong +number of ints: mypy would treat calls like ``ip_address(1, 2)`` as being +valid, for example. + +We can do better by using `overloading +`_, +which lets us give the same function multiple type annotations (signatures) +to more accurately describe the function's behavior: + +.. code-block:: python + + from typing import Union, overload + + # Overload *variants* for 'ip_address'. + # These variants give extra information to the type checker. + # They are ignored at runtime. 
+ + @overload + def ip_address(a: int, b: int, c: int, d: int) -> IPv4Address: + pass + + @overload + def ip_address(a: int, b: int, c: int, d: int, + e: int, f: int, g: int, h: int) -> IPv6Adress: + pass + + # The actual *implementation* of 'ip_address'. + # The implementation contains the actual runtime logic. + # + # It may or may not have type hints. If it does, mypy + # will check the body of the implementation against the + # type hints. + # + # Mypy will also check and make sure the signature is + # consistent with the provided variants. + + def ip_address(*components: int) -> Union[IPv4Address, IPv6Address]: + if len(components) == 4: + # Return an IPv4Address object + elif len(components) == 8: + # Return an IPv6Address object + else: + # Raise an exception + +This allows mypy to understand calls to ``ip_address`` much more precisely. +For example, mypy will understand that ``ip_address(127, 0, 0, 1)`` will +always have a return type of ``IPv4Address`` and will report errors for +calls like ``ip_address(1, 2)``. + +As another example, suppose we want to write a custom container class that +implements the ``__getitem__`` method (``[]`` bracket indexing). If this +method receives an integer we return a single item. If it receives a +``slice``, we return a ``Sequence`` of items. + +We can precisely encode this relationship between the argument and the +return type by using overloads like so: + +.. code-block:: python + + from typing import Sequence, TypeVar, Union + + T = TypeVar('T') + + class MyList(Sequence[T]): + @overload + def __getitem__(self, index: int) -> T: pass + + @overload + def __getitem__(self, index: slice) -> Sequence[T]: pass + + def __getitem__(self, index: Union[int, slice]) -> Union[T, Sequence[T]]: + if isinstance(index, int): + # Return a T here + elif isinstance(index, slice): + # Return a sequence of Ts here + else: + raise TypeError(...) 
+ +There are a few additional things to note about using overloads: + +Runtime behavior +---------------- + +An overloaded function must consist of two or more overload *variants* +followed by an *implementation*. The variants and the implementations +must be adjacent in the code: think of them as one indivisible unit. + +The variant bodies must all be empty; only the implementation is allowed +to contain code. This is because at runtime, the variants are completely +ignored: they're overridden by the final implementation function. + +This means that an overloaded function is still an ordinary Python +function! There is no automatic dispatch handling: you must manually +handle the different types in the implementation (usually by using +``if`` statements and ``isinstance`` checks). + +If you are adding an overload within a stub file, the implementation +function should be omitted: stubs do not contain runtime logic. + +Type checking calls to overloads +-------------------------------- + +When you call an overloaded function, mypy will infer the correct +return type using the provided variants. A call is never type checked +against the implementation signature. This is why mypy will report calls +like ``ip_address(4)`` as being invalid even though it matches the +implementation signature. + +If multiple variants end up matching a call, mypy will, for the +most part, select the return type corresponding to the first +matching call. For example, consider the following program: + +.. code-block:: python + + from typing import List, Union, overload + + @overload + def summarize(data: List[str]) -> str: pass + + @overload + def summarize(data: List[int]) -> int: pass + + def summarize(data): + # ...snip... + + # What is the type of 'output'? str or int? + output = summarize([]) + +The ``summarize([])`` call matches both variants: an empty list could +be either a ``List[str]`` or a ``List[int]``. 
In this case, mypy +will break the tie by picking the first matching variant: ``output`` +will have an inferred type of ``str``. The implementor is responsible +for making sure ``summarize`` breaks ties in the same way at runtime. + +There are a few exceptions to the "pick the first match" rule. +For example, if multiple variants match due to an argument +being of type ``Any``, mypy will make the inferred type also +be ``Any``. + +Mypy will also prohibit you from writing overload variants that are +inherently unsafely overlapping: for example, writing two variants +that accept the same arguments but return different types. + +Type checking the implementation +-------------------------------- + +The body of an implementation is type-checked against the +type hints provided on the implementation. For example, in the +``MyList`` example up above, the code in the body is checked with +``index: Union[int, slice]`` and a return type ``Union[T, Sequence[T]]``. +If there are no annotations on the implementation, then the body is +not type checked. + +The variants must also be compatible with the implementation +type hints. In the ``MyList`` example, mypy will check that the +parameter type ``int`` and the return type ``T`` are compatible with +``Union[int, slice]`` and ``Union[T, Sequence]`` for the +first variant. For the second variant it verifies the parameter +type ``slice`` and the return type ``Sequence[T]`` are compatible +with ``Union[int, slice]`` and ``Union[T, Sequence]``. + +.. note:: + + Due to the "pick the first match" rule, changing the order of your + overload variants can change how mypy type checks your program. + + To minimize potential issues, we recommend ordering your variants + from most to least specific. Your implementation should also + perform ``isinstance`` checks and the like in the same order + as the listed variants. + +..
note:: + + If you just need to constrain a type variable to certain types or + subtypes, you can use a :ref:`value restriction + `. + + .. _async-and-await: Typing async/await From b08b0cc4913c24556fa426bd4711950df8745b21 Mon Sep 17 00:00:00 2001 From: Michael Lee Date: Thu, 21 Jun 2018 11:15:00 -0700 Subject: [PATCH 2/4] Respond to code review --- docs/source/command_line.rst | 2 + docs/source/more_types.rst | 251 ++++++++++++++++++++++++++++------- 2 files changed, 203 insertions(+), 50 deletions(-) diff --git a/docs/source/command_line.rst b/docs/source/command_line.rst index d4dfad6c28ad..4a1982a46613 100644 --- a/docs/source/command_line.rst +++ b/docs/source/command_line.rst @@ -288,6 +288,8 @@ The following options are available: module (such as ``List[int]`` and ``Dict[str, str]``). +.. _additional-command-line-flags: + Additional command line flags ***************************** diff --git a/docs/source/more_types.rst b/docs/source/more_types.rst index 5da421cb694a..d7fd5660aa58 100644 --- a/docs/source/more_types.rst +++ b/docs/source/more_types.rst @@ -2,7 +2,8 @@ More types ========== This section introduces a few additional kinds of types, including ``NoReturn``, -``NewType``, ``TypedDict``, and types for async code. All of these are only +``NewType``, ``TypedDict``, and types for async code. It also discusses how to +give functions more precise types using overloads. All of these are only situationally useful, so feel free to skip this section and come back when you have a need for some of them. @@ -15,6 +16,10 @@ Here's a quick summary of what's covered here: For example, you can have ``UserId`` as a variant of ``int`` that is just an ``int`` at runtime. +* ``@overload`` lets you define a function that can accept multiple distinct + signatures. This is useful if you need to encode a relationship between + the arguments and the return type that would be difficult to express normally. 
+ * ``TypedDict`` lets you give precise types for dictionaries that represent objects with a fixed schema, such as ``{'id': 1, 'items': ['x']}``. @@ -185,9 +190,9 @@ Function overloading Sometimes the arguments and types in a function depend on each other in ways that can't be captured with a ``Union``. For example, suppose -we want to write a function that produces IP address objects. If we pass +we want to write a function produces IP address objects. If we pass in four ints, we receive an ``IPv4Address`` object. If we pass in eight, -we recieve an ``IPv6Address`` object. +we receive an ``IPv6Address`` object. Our first attempt at writing this function might look like this: @@ -203,14 +208,14 @@ Our first attempt at writing this function might look like this: else: # Raise an exception -While this function signature works, it's too loose: it implies we -could receive either address object regardless of the number of arguments +While this function signature works, it's too loose: it implies ``ip_address`` +could return either either object regardless of the number of arguments we pass in. It also does not prohibit a caller from passing in the wrong number of ints: mypy would treat calls like ``ip_address(1, 2)`` as being valid, for example. We can do better by using `overloading -`_, +`_ which lets us give the same function multiple type annotations (signatures) to more accurately describe the function's behavior: @@ -221,15 +226,15 @@ to more accurately describe the function's behavior: # Overload *variants* for 'ip_address'. # These variants give extra information to the type checker. # They are ignored at runtime. + # + # Prefixing an argument with two underscores tells mypy that + # those arguments are positional only. @overload - def ip_address(a: int, b: int, c: int, d: int) -> IPv4Address: - pass - + def ip_address(__a: int, __b: int, __c: int, __d: int) -> IPv4Address: ... 
@overload - def ip_address(a: int, b: int, c: int, d: int, - e: int, f: int, g: int, h: int) -> IPv6Adress: - pass + def ip_address(__a: int, __b: int, __c: int, __d: int, + __e: int, __f: int, __g: int, __h: int) -> IPv6Address: ... # The actual *implementation* of 'ip_address'. # The implementation contains the actual runtime logic. @@ -254,6 +259,12 @@ For example, mypy will understand that ``ip_address(127, 0, 0, 1)`` will always have a return type of ``IPv4Address`` and will report errors for calls like ``ip_address(1, 2)``. +One nuance is that for this particular example, we prefixed each argument +with two underscores to indicate to mypy that they are positional-only. If +we did not do this, mypy would have reported an error: the variants imply +calls like ``ip_address(a=127, b=0, c=0, d=1)`` are legal even though our +implementation is not prepared to accept keyword arguments. + As another example, suppose we want to write a custom container class that implements the ``__getitem__`` method (``[]`` bracket indexing). If this method receives an integer we return a single item. If it receives a @@ -264,16 +275,16 @@ return type by using overloads like so: .. code-block:: python - from typing import Sequence, TypeVar, Union + from typing import Sequence, TypeVar, Union, overload T = TypeVar('T') class MyList(Sequence[T]): @overload - def __getitem__(self, index: int) -> T: pass + def __getitem__(self, index: int) -> T: ... @overload - def __getitem__(self, index: slice) -> Sequence[T]: pass + def __getitem__(self, index: slice) -> Sequence[T]: ... def __getitem__(self, index: Union[int, slice]) -> Union[T, Sequence[T]]: if isinstance(index, int): @@ -283,7 +294,16 @@ return type by using overloads like so: else: raise TypeError(...) 
-There are a few additional things to note about using overloads: +Note that making ``index`` positional-only is unnecessary in this case: +using ``index`` as a keyword argument and doing ``my_list.__getitem__(index=4)`` +will work at runtime (though it is bad style). + +.. note:: + + If you just need to constrain a type variable to certain types or + subtypes, you can use a :ref:`value restriction + `. + Runtime behavior ---------------- @@ -297,35 +317,41 @@ to contain code. This is because at runtime, the variants are completely ignored: they're overridden by the final implementation function. This means that an overloaded function is still an ordinary Python -function! There is no automatic dispatch handling: you must manually -handle the different types in the implementation (usually by using +function! There is no automatic dispatch handling and you must manually +handle the different types in the implementation (e.g. by using ``if`` statements and ``isinstance`` checks). If you are adding an overload within a stub file, the implementation function should be omitted: stubs do not contain runtime logic. +.. note:: + + While we can leave the variant body empty using the ``pass`` keyword, + the more common convention is to instead use the ellipsis (``...``) literal. + Type checking calls to overloads -------------------------------- -When you call an overloaded function, mypy will infer the correct -return type using the provided variants. A call is never type checked -against the implementation signature. This is why mypy will report calls -like ``ip_address(4)`` as being invalid even though it matches the +When you call an overloaded function, mypy will infer the correct return +type by picking the best matching variant, after taking into consideration +both the argument types and arity. However, a call is never type +checked against the implementation. 
This is why mypy will report calls +like ``ip_address(1, 2)`` as being invalid even though it matches the implementation signature. -If multiple variants end up matching a call, mypy will, for the -most part, select the return type corresponding to the first -matching call. For example, consider the following program: +If there are multiple equally good matching variants, mypy will select +the variant that was defined first. For example, consider the following +program: .. code-block:: python - from typing import List, Union, overload + from typing import List, overload @overload - def summarize(data: List[str]) -> str: pass + def summarize(data: List[str]) -> str: ... @overload - def summarize(data: List[int]) -> int: pass + def summarize(data: List[int]) -> int: ... def summarize(data): # ...snip... @@ -339,14 +365,145 @@ will break the tie by picking the first matching variant: ``output`` will have an inferred type of ``str``. The implementor is responsible for making sure ``summarize`` breaks ties in the same way at runtime. -There are a few exceptions to the "pick the first match" rule. -For example, if multiple variants match due to an argument -being of type ``Any``, mypy will make the inferred type also -be ``Any``. +There are, however, two exceptions to the "pick the first match" rule. +First, if multiple variants match due to an argument being of type +``Any``, mypy will make the inferred type also be ``Any``: + +.. code-block:: python + + dynamic_var: Any = some_dynamic_function() + + # output2 is of type 'Any' + output2 = summarize(dynamic_var) + +Second, if multiple variants match due to one or more of the arguments +being a union, mypy will make the inferred type be the union of the +matching variant returns: + +.. code-block:: python + + some_list: Union[List[str], List[int]] + + # output3 is of type 'Union[str, int]' + output3 = summarize(some_list) + +.. 
note:: + + Due to the "pick the first match" rule, changing the order of your + overload variants can change how mypy type checks your program. + + To minimize potential issues, we recommend ordering your variants + from most to least specific. Your implementation should also + perform runtime checks (e.g. ``isinstance`` checks) in the same + order as the listed variants. + + If your variants have no inherent relationship to each other, they + can naturally be listed in any arbitrary order. + +Type checking the variants +-------------------------- + +Although the "pick the first match" algorithm works well in many cases, it can +sometimes make mypy infer the incorrect type. For example, consider the following +overload definition: + +.. code-block:: python + + from typing import overload, Union + + @overload + def unsafe_func(x: int) -> int: ... + + @overload + def unsafe_func(x: object) -> str: ... + + def unsafe_func(x: object) -> Union[int, str]: + if isinstance(x, int): + return x + else: + return "Unsafe!" + +On the surface, this function definition appears to be fine. However, it will +result in a discrepancy between the inferred type and the actual runtime type +when we try using it like so: + +.. code-block:: python + + def should_return_str(x: object) -> str: + return unsafe_func(x) + + bad_var = should_return_str(42) + +If we examine just the annotated types, it seems as if ``bad_var`` ought to be of +type ``str``. But if we examine the runtime behavior, ``bad_var`` will actually be +of type ``int``! + +To prevent these kinds of issues, mypy will detect and prohibit inherently unsafely +overlapping overloads on a best-effort basis. Two variants are considered unsafely +overlapping when both of the following are true: + +1. All of the arguments of the first variant are compatible with the second. +2. The return type of the first variant is *not* compatible with (e.g. is not a + subtype of) the second. 
+ +So in this example, the ``int`` argument in the first variant is a subtype of +the ``object`` argument in the second, yet the ``int`` return type is *not* a subtype of +``str``. Both conditions are true, so mypy will correctly flag ``unsafe_func`` as +being unsafe. + +However, it is unfortunately impossible to detect *all* unsafe overloads without +crippling overloads to the point where they're completely unusable. For example, +suppose we modify the above example so the function is overloaded against two +completely unrelated types: + +.. code-block:: python + + from typing import overload, Union + + class A: pass + class B: pass + + @overload + def func(x: A) -> int: ... + + @overload + def func(x: B) -> str: ... + + def func(x: Union[A, B]) -> Union[int, str]: + if isinstance(x, A): + return 42 + else: + return "foobar" + + def should_return_str(x: B) -> str: + return func(x) + +This program is fine under most normal conditions. However, if we +create a new class that inherits from *both* A and B, we get the +same kind of discrepancy we had earlier: + +.. code-block:: python + + class Combined(A, B): pass + + # bad_var is of type 'str' according to mypy but is in reality of type 'int'. + bad_var = should_return_str(Combined()) + +Mypy could prohibit this by flagging the definition of ``func`` +as being unsafe since ``A`` and ``B`` could theoretically overlap +due to multiple inheritance. However, multiple inheritance is +uncommon and making such a change would make it difficult +to use overloads in a useful way. Consequently, mypy is designed to +ignore the possibility of multiple inheritance altogether when +checking overloads: it will *not* report any errors with the above program. + +Thankfully, these types of situations are relatively rare. What this +does mean, however, is that you should exercise caution when using an +overloaded function in scenarios where it can potentially receive values +that are an instance of two seemingly unrelated types. 
For example, +``Combined()`` is an instance of both ``A`` and ``B``; the empty list +``[]`` is an instance of both ``List[int]`` and ``List[str]``. -Mypy will also prohibit you from writing overload variants that are -inherently unsafely overlapping: for example, writing two variants -that accept the same arguments but return different types. Type checking the implementation -------------------------------- @@ -354,9 +511,11 @@ Type checking the implementation The body of an implementation is type-checked against the type hints provided on the implementation. For example, in the ``MyList`` example up above, the code in the body is checked with -``index: Union[int, slice]`` and a return type ``Union[T, Sequence[T]]``. -If there are no annotations on the implementation, then the body is -not type checked. +argument list ``index: Union[int, slice]`` and a return type of +``Union[T, Sequence[T]]``. If there are no annotations on the +implementation, then the body is not type checked. If you want to +force mypy to check the body anyway, use the ``--check-untyped-defs`` +flag (:ref:`more details here `). The variants must also be compatible with the implementation type hints. In the ``MyList`` example, mypy will check that the @@ -368,19 +527,11 @@ with ``Union[int, slice]`` and ``Union[T, Sequence]``. .. note:: - Due to the "pick the first match" rule, changing the order of your - overload variants can change how mypy type checks your program. - - To minimize potential issues, we recommend ordering your variants - from most to least specific. Your implementation should also - perform ``isinstance`` checks and the like in the same order - as the listed variants. + The overload semantics documented above are new as of mypy 0.620. -.. note:: - - If you just need to constrain a type variable to certain types or - subtypes, you can use a :ref:`value restriction - `. + Older versions of mypy used different semantics. 
In particular, overload + variants had to be different after erasing types and calls to overloaded + functions did not use the "pick the first match" rule. .. _async-and-await: From d65ce3a3ca416d1feef1919fcd754d7fd934ccc8 Mon Sep 17 00:00:00 2001 From: Michael Lee Date: Sun, 1 Jul 2018 16:46:41 -0700 Subject: [PATCH 3/4] Respond to second code review --- docs/source/more_types.rst | 265 ++++++++++++++++++++----------------- 1 file changed, 147 insertions(+), 118 deletions(-) diff --git a/docs/source/more_types.rst b/docs/source/more_types.rst index d7fd5660aa58..0aab3c015c87 100644 --- a/docs/source/more_types.rst +++ b/docs/source/more_types.rst @@ -190,28 +190,31 @@ Function overloading Sometimes the arguments and types in a function depend on each other in ways that can't be captured with a ``Union``. For example, suppose -we want to write a function produces IP address objects. If we pass -in four ints, we receive an ``IPv4Address`` object. If we pass in eight, -we receive an ``IPv6Address`` object. +we want to write a function that can accept x-y coordinates. If we pass +in just a single x-y coordinate, we return a ``ClickEvent`` object. However, +if we pass in two x-y coordinates, we return a ``DragEvent`` object. Our first attempt at writing this function might look like this: .. 
code-block:: python - from typing import Union + from typing import Union, Optional - def ip_address(*components: int) -> Union[IPv4Address, IPv6Address]: - if len(components) == 4: - # Return an IPv4Address object - elif len(components) == 8: - # Return an IPv6Address object + def mouse_event(x1: int, + y1: int, + x2: Optional[int] = None, + y2: Optional[int] = None) -> Union[ClickEvent, DragEvent]: + if x2 is None and y2 is None: + return ClickEvent(x1, y1) + elif x2 is not None and y2 is not None: + return DragEvent(x1, y1, x2, y2) else: - # Raise an exception + raise Exception("Bad arguments") -While this function signature works, it's too loose: it implies ``ip_address`` -could return either either object regardless of the number of arguments +While this function signature works, it's too loose: it implies ``mouse_event`` +could return either object regardless of the number of arguments we pass in. It also does not prohibit a caller from passing in the wrong -number of ints: mypy would treat calls like ``ip_address(1, 2)`` as being +number of ints: mypy would treat calls like ``mouse_event(1, 2, 20)`` as being valid, for example. We can do better by using `overloading @@ -223,20 +226,16 @@ to more accurately describe the function's behavior: from typing import Union, overload - # Overload *variants* for 'ip_address'. + # Overload *variants* for 'mouse_event'. # These variants give extra information to the type checker. # They are ignored at runtime. - # - # Prefixing an argument with two underscores tells mypy that - # those arguments are positional only. @overload - def ip_address(__a: int, __b: int, __c: int, __d: int) -> IPv4Address: ... + def mouse_event(x1: int, y1: int) -> ClickEvent: ... @overload - def ip_address(__a: int, __b: int, __c: int, __d: int, - __e: int, __f: int, __g: int, __h: int) -> IPv6Address: ... + def mouse_event(x1: int, y1: int, x2: int, y2: int) -> DragEvent: ... - # The actual *implementation* of 'ip_address'. 
+ # The actual *implementation* of 'mouse_event'. # The implementation contains the actual runtime logic. # # It may or may not have type hints. If it does, mypy @@ -246,24 +245,21 @@ to more accurately describe the function's behavior: # Mypy will also check and make sure the signature is # consistent with the provided variants. - def ip_address(*components: int) -> Union[IPv4Address, IPv6Address]: - if len(components) == 4: - # Return an IPv4Address object - elif len(components) == 8: - # Return an IPv6Address object + def mouse_event(x1: int, + y1: int, + x2: Optional[int] = None, + y2: Optional[int] = None) -> Union[ClickEvent, DragEvent]: + if x2 is None and y2 is None: + return ClickEvent(x1, y1) + elif x2 is not None and y2 is not None: + return DragEvent(x1, y1, x2, y2) else: - # Raise an exception - -This allows mypy to understand calls to ``ip_address`` much more precisely. -For example, mypy will understand that ``ip_address(127, 0, 0, 1)`` will -always have a return type of ``IPv4Address`` and will report errors for -calls like ``ip_address(1, 2)``. + raise Exception("Bad arguments") -One nuance is that for this particular example, we prefixed each argument -with two underscores to indicate to mypy that they are positional-only. If -we did not do this, mypy would have reported an error: the variants imply -calls like ``ip_address(a=127, b=0, c=0, d=1)`` are legal even though our -implementation is not prepared to accept keyword arguments. +This allows mypy to understand calls to ``mouse_event`` much more precisely. +For example, mypy will understand that ``mouse_event(5, 25)`` will +always have a return type of ``ClickEvent`` and will report errors for +calls like ``mouse_event(5, 25, 2)``. As another example, suppose we want to write a custom container class that implements the ``__getitem__`` method (``[]`` bracket indexing). If this @@ -294,10 +290,6 @@ return type by using overloads like so: else: raise TypeError(...) 
-Note that making ``index`` positional-only is unnecessary in this case: -using ``index`` as a keyword argument and doing ``my_list.__getitem__(index=4)`` -will work at runtime (though it is bad style). - .. note:: If you just need to constrain a type variable to certain types or @@ -336,7 +328,7 @@ When you call an overloaded function, mypy will infer the correct return type by picking the best matching variant, after taking into consideration both the argument types and arity. However, a call is never type checked against the implementation. This is why mypy will report calls -like ``ip_address(1, 2)`` as being invalid even though it matches the +like ``mouse_event(5, 25, 3)`` as being invalid even though it matches the implementation signature. If there are multiple equally good matching variants, mypy will select @@ -348,21 +340,26 @@ program: from typing import List, overload @overload - def summarize(data: List[str]) -> str: ... + def summarize(data: List[int]) -> int: ... @overload - def summarize(data: List[int]) -> int: ... + def summarize(data: List[str]) -> str: ... def summarize(data): - # ...snip... + if len(data) == 0: + return 0 + elif isinstance(data[0], int): + # Do int-specific code + else: + # Do str-specific code # What is the type of 'output'? str or int? output = summarize([]) The ``summarize([])`` call matches both variants: an empty list could -be either a ``List[str]`` or a ``List[int]``. In this case, mypy +be either a ``List[int]`` or a ``List[str]``. In this case, mypy will break the tie by picking the first matching variant: ``output`` -will have an inferred type of ``str``. The implementor is responsible +will have an inferred type of ``int``. The implementor is responsible for making sure ``summarize`` breaks ties in the same way at runtime. There are however are two exceptions to the "pick the first match" rule. @@ -382,9 +379,9 @@ matching variant returns: .. 
code-block:: python - some_list: Union[List[str], List[int]] + some_list: Union[List[int], List[str]] - # output3 is of type 'Union[str, int]' + # output3 is of type 'Union[int, str]' output3 = summarize(some_list) .. note:: @@ -392,20 +389,73 @@ matching variant returns: Due to the "pick the first match" rule, changing the order of your overload variants can change how mypy type checks your program. - To minimize potential issues, we recommend ordering your variants - from most to least specific. Your implementation should also - perform runtime checks (e.g. ``isinstance`` checks) in the same - order as the listed variants. - - If your variants have no inherent relationship to each other, they - can naturally be listed in any arbitrary order. + To minimize potential issues, we recommend that you: + + 1. Make sure your overload variants are listed in the same order as + the runtime checks (e.g. ``isinstance`` checks) in your implementation. + 2. Order your variants and runtime checks from most to least specific. + (See the following section for an example). Type checking the variants -------------------------- -Although the "pick the first match" algorithm works well in many cases, it can -sometimes make mypy infer the incorrect type. For example, consider the following -overload definition: +Mypy will perform several checks on your overload variant definitions +to ensure they behave as expected. First, mypy will check and make sure +that no overload variant is shadowing a subsequent one. For example, +consider the following function which adds together two ``Expression`` +objects, and contains a special-case to handle receiving two ``Literal`` +types: + +.. code-block:: python + + from typing import overload, Union + + class Expression: + # ...snip... + + class Literal(Expression): + # ...snip... + + # Warning -- the first overload variant shadows the second! + + @overload + def add(left: Expression, right: Expression) -> Expression: ... 
+ + @overload + def add(left: Literal, right: Literal) -> Literal: ... + + def add(left: Expression, right: Expression) -> Expression: + # ...snip... + +While this code snippet is technically type-safe, it does contain an +anti-pattern: the second variant will never be selected! If we try calling +``add(Literal(3), Literal(4))``, mypy will always pick the first variant +and evaluate the function call to be of type ``Expression``, not ``Literal``. +This is because ``Literal`` is a subtype of ``Expression``, which means +the "pick the first match" rule will always halt after considering the +first overload. + +Because having an overload variant that can never be matched is almost +certainly a mistake, mypy will report an error. To fix the error, we can +either 1) delete the second overload or 2) swap the order of the overloads: + +.. code-block:: python + + # Everything is ok now -- the variants are correctly ordered + # from most to least specific. + + @overload + def add(left: Literal, right: Literal) -> Literal: ... + + @overload + def add(left: Expression, right: Expression) -> Expression: ... + + def add(left: Expression, right: Expression) -> Expression: + # ...snip... + +Mypy will also type check the different variants and flag any overloads +that have inherently unsafely overlapping variants. For example, consider +the following unsafe overload definition: .. code-block:: python @@ -419,9 +469,9 @@ overload definition: def unsafe_func(x: object) -> Union[int, str]: if isinstance(x, int): - return x + return 42 else: - return "Unsafe!" + return "some string" On the surface, this function definition appears to be fine. However, it will result in a discrepency between the inferred type and the actual runtime type @@ -429,14 +479,14 @@ when we try using it like so: .. code-block:: python - def should_return_str(x: object) -> str: - return unsafe_func(x) + some_obj: object = 42 + unsafe_func(some_obj) + " danger danger" # Type checks, yet crashes at runtime! 
- bad_var = should_return_str(42) - -If we examine just the annotated types, it seems as if ``bad_var`` ought to be of -type ``str``. But if we examine the runtime behavior, ``bad_var`` will actually be -of type ``int``! +This program type checks according to the annotations, but will actually crash +at runtime! Since ``some_obj`` is of type ``object``, mypy will decide that +``unsafe_func`` must return something of type ``str`` and concludes the above will +type check. But in reality, ``unsafe_func`` will return an int, causing the code +to crash at runtime! To prevent these kinds of issues, mypy will detect and prohibit inherently unsafely overlapping overloads on a best-effort basis. Two variants are considered unsafely @@ -447,62 +497,35 @@ overlapping when both of the following are true: subtype of) the second. So in this example, the ``int`` argument in the first variant is a subtype of -the ``object`` argument in the second, yet the ``int`` return type is a subtype of +the ``object`` argument in the second, yet the ``int`` return type is not a subtype of ``str``. Both conditions are true, so mypy will correctly flag ``unsafe_func`` as being unsafe. -However, it is unfortunately impossible to detect *all* unsafe overloads without -crippling overloads to the point where they're completely unusable. For example, -suppose we modify the above example so the function is overloaded against two -completely unrelated types: +However, mypy will not detect *all* unsafe uses of overloads. For example, +suppose we modify the above snippet so it calls ``summarize`` instead of +``unsafe_func``: .. code-block:: python - from typing import overload, Union + some_list: List[str] = [] + summarize(some_list) + "danger danger" # Type safe, yet crashes at runtime! - class A: pass - class B: pass +We run into a similar issue here. This program type checks if we look just at the +annotations on the overloads. 
But since ``summarize(...)`` is designed to be biased +towards returning an int when it receives an empty list, this program will actually +crash during runtime. - @overload - def func(x: A) -> int: ... +The reason mypy does not flag definitions like ``summarize`` as being potentially +unsafe is because if it did, it would be extremely difficult to write a safe +overload. For example, suppose we define an overload with two variants that accept +types ``A`` and ``B`` respectively. Even if those two types were completely unrelated, +the user could still potentially trigger a runtime error similar to the ones above by +passing in a value of some third type ``C`` that inherits from both ``A`` and ``B``. - @overload - def func(x: B) -> str: ... - - def func(x: Union[A, B]) -> Union[int, str]: - if isinstance(x, A): - return 42 - else: - return "foobar" - - def should_return_str(x: B) -> str: - return func(x) - -This program is fine under most normal conditions. However, if we -create a new class that inherits from *both* A and B, we get the -same kind of discrepency we had earlier: - -.. code-block:: python - - class Combined(A, B): pass - - # bad_var is of type 'str' according to mypy but is in reality of type 'int'. - bad_var = should_return_str(Combined()) - -Mypy could prohibit this by flagging the definition of ``func`` -as being unsafe since ``A`` and ``B`` could theoretically overlap -due to multiple inheritance. However, multiple inheritance is -uncommon and making such a change would make it difficult -to use overloads in a useful way. Consequently, mypy is designed to -ignore the possibility of multiple inheritance altogether when -checking overloads: it will *not* report any errors with the above program. - -Thankfully, these types of situations are relatively rare. 
What this -does mean, however, is that you should exercise caution when using an -overloaded function in scenarios where it can potentially receive values -that are an instance of two seemingly unrelated types. For example, -``Combined()`` is an instance of both ``A`` and ``B``; the empty list -``[]`` is an instance of both ``List[int]`` and ``List[str]``. +Thankfully, these types of situations are relatively rare. What this does mean, +however, is that you should exercise caution when designing or using an overloaded +function that can potentially receive values that are an instance of two seemingly +unrelated types. Type checking the implementation @@ -529,9 +552,15 @@ with ``Union[int, slice]`` and ``Union[T, Sequence]``. The overload semantics documented above are new as of mypy 0.620. - Older versions of mypy used different semantics. In particular, overload - variants had to different after erasing types and calls to overloaded - functions did not use the "pick the first match" rule. + Previously, mypy used to perform type erasure on all overload variants. For + example, the ``summarize`` example from the previous section used to be + illegal because ``List[str]`` and ``List[int]`` both erased to just ``List[Any]``. + This restriction was removed in mypy 0.620. + + Mypy also previously used to select the best matching variant using a different + algorithm. If this algorithm failed to find a match, it would default to returning + ``Any``. The new algorithm uses the "pick the first match" rule and will fall back + to returning ``Any`` only if the input arguments also contain ``Any``. .. 
_async-and-await: From 3deeb61cf34fbc9980a4b94eca6a201eecbcf57a Mon Sep 17 00:00:00 2001 From: Michael Lee Date: Tue, 3 Jul 2018 06:52:06 -0700 Subject: [PATCH 4/4] Respond to third code review --- docs/source/more_types.rst | 31 +++++++++++++++---------------- 1 file changed, 15 insertions(+), 16 deletions(-) diff --git a/docs/source/more_types.rst b/docs/source/more_types.rst index 0aab3c015c87..0649fc81b6c6 100644 --- a/docs/source/more_types.rst +++ b/docs/source/more_types.rst @@ -209,7 +209,7 @@ Our first attempt at writing this function might look like this: elif x2 is not None and y2 is not None: return DragEvent(x1, y1, x2, y2) else: - raise Exception("Bad arguments") + raise TypeError("Bad arguments") While this function signature works, it's too loose: it implies ``mouse_event`` could return either object regardless of the number of arguments @@ -254,7 +254,7 @@ to more accurately describe the function's behavior: elif x2 is not None and y2 is not None: return DragEvent(x1, y1, x2, y2) else: - raise Exception("Bad arguments") + raise TypeError("Bad arguments") This allows mypy to understand calls to ``mouse_event`` much more precisely. For example, mypy will understand that ``mouse_event(5, 25)`` will @@ -319,7 +319,7 @@ function should be omitted: stubs do not contain runtime logic. .. note:: While we can leave the variant body empty using the ``pass`` keyword, - the more common convention is to instead use the ellipse (``...``) literal. + the more common convention is to instead use the ellipsis (``...``) literal. Type checking calls to overloads -------------------------------- @@ -340,26 +340,26 @@ program: from typing import List, overload @overload - def summarize(data: List[int]) -> int: ... + def summarize(data: List[int]) -> float: ... @overload def summarize(data: List[str]) -> str: ... 
def summarize(data): - if len(data) == 0: - return 0 + if not data: + return 0.0 elif isinstance(data[0], int): - # Do int-specific code + # Do int specific code else: # Do str-specific code - # What is the type of 'output'? str or int? + # What is the type of 'output'? float or str? output = summarize([]) The ``summarize([])`` call matches both variants: an empty list could be either a ``List[int]`` or a ``List[str]``. In this case, mypy will break the tie by picking the first matching variant: ``output`` -will have an inferred type of ``int``. The implementor is responsible +will have an inferred type of ``float``. The implementor is responsible for making sure ``summarize`` breaks ties in the same way at runtime. There are however are two exceptions to the "pick the first match" rule. @@ -381,7 +381,7 @@ matching variant returns: some_list: Union[List[int], List[str]] - # output3 is of type 'Union[int, str]' + # output3 is of type 'Union[float, str]' output3 = summarize(some_list) .. note:: @@ -482,11 +482,10 @@ when we try using it like so: some_obj: object = 42 unsafe_func(some_obj) + " danger danger" # Type checks, yet crashes at runtime! -This program type checks according to the annotations, but will actually crash -at runtime! Since ``some_obj`` is of type ``object``, mypy will decide that -``unsafe_func`` must return something of type ``str`` and concludes the above will -type check. But in reality, ``unsafe_func`` will return an int, causing the code -to crash at runtime! +Since ``some_obj`` is of type ``object``, mypy will decide that ``unsafe_func`` +must return something of type ``str`` and concludes the above will type check. +But in reality, ``unsafe_func`` will return an int, causing the code to crash +at runtime! To prevent these kinds of issues, mypy will detect and prohibit inherently unsafely overlapping overloads on a best-effort basis. 
Two variants are considered unsafely @@ -512,7 +511,7 @@ suppose we modify the above snippet so it calls ``summarize`` instead of We run into a similar issue here. This program type checks if we look just at the annotations on the overloads. But since ``summarize(...)`` is designed to be biased -towards returning an int when it receives an empty list, this program will actually +towards returning a float when it receives an empty list, this program will actually crash during runtime. The reason mypy does not flag definitions like ``summarize`` as being potentially