Skip to content
This repository was archived by the owner on Jun 10, 2020. It is now read-only.

Balancing correctness vs. type sanity #12

Closed
shoyer opened this issue Mar 5, 2018 · 7 comments
Closed

Balancing correctness vs. type sanity #12

shoyer opened this issue Mar 5, 2018 · 7 comments

Comments

@shoyer
Copy link
Member

shoyer commented Mar 5, 2018

#11 surfaced an important issue to decide: should our type annotations prioritize correctness or type sanity/friendliness.

I wrote:

Yes, this sort of behavior is very unfortunate for us, and unfortunately is also quite prevalent in NumPy. My inclination is that our type-stubs should prioritize correctness more than catching all possible errors. Zero-dimensional arrays are somewhat usual to see in NumPy, but they do come up. Without typing support for array dimensionality, this would mean that many of these return values should be unsatisfying Any types.

@alanhdu responded:

I'm willing to defer to your judgement here, but I'd personally lean the other way -- IME w/ MyPy is still limited enough that you almost never have "seamless" integration with any reasonably large codebase, and given that you have to adapt the coding style (to write "type-friendly" code) and do integration work anyways, I think it's reasonable to prioritize "correctness" (as long as the burden's not too much, like having to add some int and float casts for numpy scalars).

Here's an illustrative example. NumPy ufuncs like np.sin() usually return a numpy.ndarray when passed a numpy.ndarray as input. However, there are two notable exceptions:

  • This isn't true for 0-dimensional arrays: ufuncs (and most other numpy functions) return numpy scalar types instead.
  • It also isn't true if any arguments are ndarray subclasses: these can overload __array_ufunc__ (or other old override mechanisms) to return whatever they like.

In practice, both 0-dimensional arrays and ndarray subclasses (especially those that would violate this design) are rare compared to the dominant usecase, but these cases definitely do come up in large code-bases.

Neither of these exceptions are currently expressible with our typing system, so we can choose between:

  1. def sin(x: np.ndarray) -> Any (correct, but useless for type checking)
  2. def sin(x: np.ndarray) -> Union[np.generic, np.ndarray] (not worrying about ndarray subclasses -- you shouldn't write a subclass that violates NumPy's typing rules)
  3. def sin(x: np.ndarray) -> np.ndarray (not worrying about subclasses or 0-dimensional arrays)

(Note that this would be one of several overrides for ufuncs, which also should be defined for non-ndarrays.)

@njsmith
Copy link
Member

njsmith commented Mar 5, 2018

What about def sin(x: duckarray) -> duckarray?

@alanhdu
Copy link
Contributor

alanhdu commented Mar 5, 2018

IMO Option 2 is kind of the worst of both worlds -- if a function returns a Union, you almost always have to "downcast" it into one of the options to avoid type hinting errors later (e.g. with assert isinstance(..., ndarray) checks. For example, the following toy example fails with error: Item "int" of "Union[int, ndarray]" has no attribute "all"

class ndarray:
    def all(self) -> bool:
        return True
def sin(x: ndarray) -> Union[int, ndarray]:
    return x
sin(ndarray()).all()

Option 1 immediately loses all typechecking without type assertions everywhere -- even simple np.sin(x).non_existent_function() will happily type-check (and since I assume most inputs to NumPy functions are outputs from other numpy functions, this essentially means that even the input type annotations aren't that useful).

The downside to option 3 is undesirable that you have to change

x = np.array(3)
np.sin(x)

to

from typing import cast
np.sin(float(x))
# or
np.sin(cast(float, x))
# or
cast(np.float32, np.sin(x))

but I think this happens rarely enough that typechecking large codebases would still be worth that effort.

@alanhdu
Copy link
Contributor

alanhdu commented Mar 5, 2018

@njsmith I'm not sure that'd work in this case. np.sin(np.array(3)) returns a np.float64 scalar, not a 0-dimensional array which are different types, so something like:

(np.sin(x) > 5).all()

should (ideally) type-check with if x = np.array([1, 2, 3]) but fail if x = np.array(3)

This actually brings up another quirk though -- things like:

np.sin([1])
np.sin([[1]]
np.sin([[1, 2], (3, 4)])
np.sin([[[[[[1]]]]]])

are all legal, but I'm not sure how to express that duckarray type in MyPy. Naively, I'd hope for something like:

duckarray = Union[
    ndarray, 
    Sequence[T], 
    Sequence[duckarray],   # for nested sequences
]

but MyPy doesn't support recursive types and I'm not sure there's a good way to fully express the types otherwise.

@njsmith
Copy link
Member

njsmith commented Mar 5, 2018

np.float64 goes to great lengths to quack like an array, though.

@shoyer
Copy link
Member Author

shoyer commented Mar 5, 2018

Indeed, np.generic does define almost every attribute you can find on np.ndarray:

In [3]: np.generic.<tab>
             all()          argsort()      choose()       conjugate()    data           dumps()        flatten()      itemset()
             any()          astype()       clip()         copy()         diagonal()     fill()         getfield()     itemsize
             argmax()       base           compress()     cumprod()      dtype          flags          imag           max()          >
             argmin()       byteswap()     conj()         cumsum()       dump()         flat           item()         mean()

That said, I'm sure the necessity to type cast would still be annoying.

@alanhdu
Copy link
Contributor

alanhdu commented Mar 5, 2018

Oof -- that's embarrassing 🙊. For some reason I thought that np.float was an alias for np.float64 instead of for a Python float which is why I thought they acted completely differently.

If np.generic looks mostly ducktypes as np.ndarray, then I think that duckarray = Union[np.generic, np.ndarray] is the way to go -- MyPy will go ahead and typecheck each case (np.generic and np.ndarray) and if they both typechecks then the union typechecks.

@person142
Copy link
Member

person142 commented Jun 9, 2020

Closing this-we've been making correctness versus sanity judgement calls on the go as the types evolve, probably the important thing now is to make sure we document the decisions. (For now they tend to be documented with comments in tests cases; that needs to be improved.)

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants