This repository was archived by the owner on Jun 10, 2020. It is now read-only.

ENH: get more specific about _ArrayLike, make it public #66

Merged
merged 1 commit into numpy:master on May 17, 2020

Conversation

person142
Member

Closes #37.

Add tests to check various examples.

@person142
Member Author

As something of a data point, mypy passes on SciPy when checked against this branch. The types in SciPy are still very rough, so take that with a grain of salt, but perhaps it means something.

@rgommers
Member

What about other array-like objects? Are things with both __array__ and __len__ passing because of Sequence, or is that not tested? Or is that what the Protocol discussion was about?
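For context, here's a small runnable illustration (not from the PR; the class name is made up) of the kind of object being asked about. When both protocols are present, NumPy uses __array__ rather than treating the object as a sequence:

```python
import numpy as np

# Hypothetical object exposing both __array__ and __len__
class ArrayAndLen:
    def __len__(self):
        return 3

    def __array__(self, dtype=None):
        return np.array([10.0, 20.0, 30.0], dtype=dtype)

# __array__ wins: we get a float array, not an object array of length 3
arr = np.asarray(ArrayAndLen())
print(arr, arr.dtype)
```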

@@ -87,7 +87,7 @@ _DtypeLike = Union[

_NdArraySubClass = TypeVar("_NdArraySubClass", bound=ndarray)

_ArrayLike = TypeVar("_ArrayLike")
ArrayLike = Union[int, float, complex, generic, ndarray, Sequence]
Member

@BvB93 BvB93 Apr 21, 2020

Might it be an idea to, respectively, replace int, float and complex with the SupportsInt, SupportsFloat and SupportsComplex protocols?

What about str, bytes, bool, dt.datetime, dt.date, and dt.timedelta (considering they all have their corresponding generic)?

Furthermore, I'd personally be in favor of replacing ndarray with a custom _SupportsArray protocol along the lines of:

class _SupportsArray(Protocol):
    def __array__(self, dtype: _DtypeLike = ...) -> ndarray: ...

Member Author

Might it be an idea to, respectively, replace int, float and complex with the SupportsInt, SupportsFloat and SupportsComplex protocols?

I think the protocols are a little too general; e.g.

>>> class A:
...     def __int__(self):
...         return 1
...
>>> int(A())
1
>>> np.array(A())
array(<__main__.A object at 0x10e61a290>, dtype=object)

What about str, bytes, bool, dt.datetime, dt.date, and dt.timedelta (considering they all have their corresponding generic)?

str and bytes are both covered by Sequence. I added bool (thanks for catching that!). The dt.* types, I think, also end up giving unexpected results:

>>> np.array(datetime.timedelta(days=1))
array(datetime.timedelta(days=1), dtype=object)

Furthermore, I'd presonally be in favor of replacing ndarray with a custom _SupportsArray protocol along the lines of:

Yes, that is an excellent idea, switched to that.

Member

I think the protocols are a little too general; e.g.

Good catch, I actually wasn't aware that, e.g., a SupportsInt member wouldn't produce an integer array.

str and bytes are both covered by Sequence

Ah right, that's true.

@person142
Member Author

@rgommers @BvB93 thanks for the feedback! I added a _SupportsArray protocol in 4f654e5; it is quite a nice unification.

@person142
Member Author

It seems that the principle we're roughly going for here is "don't allow stuff that will produce object arrays", though we still leave an escape hatch a la

4f654e5#diff-98e9e1660b68614cffb8585ea52a0bdcR31.
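To make the "don't produce object arrays" principle concrete, here is a hedged sketch (not from the PR; the class names are made up). Objects NumPy doesn't understand fall back to dtype=object, which is exactly what ArrayLike is meant to exclude:

```python
import numpy as np

class Opaque:
    """No __array__, no sequence protocol, not a supported scalar."""

class HasArray:
    def __array__(self, dtype=None):
        return np.array([1, 2, 3], dtype=dtype)

print(np.array(Opaque()).dtype)    # object: should not count as ArrayLike
print(np.array(HasArray()).dtype)  # a real integer dtype: should count
```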

@BvB93
Member

BvB93 commented Apr 21, 2020

@rgommers @BvB93 thanks for the feedback! I added a _SupportsArray protocol in 4f654e5; it is quite a nice unification.

Btw, doesn't __array__() also have the dtype argument?

@person142
Member Author

person142 commented Apr 22, 2020

Btw, doesn't __array__() also have the dtype argument?

Hm, it appears to be a little wonky:

>>> np.float64(1).__array__()
array(1.)
>>> np.float64(1).__array__(np.complex128)
array(1.+0.j)
>>> np.float64(1).__array__(dtype=np.complex128)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: __array__() takes no keyword arguments
>>> class A:
...     def __array__(self, dtype):
...         return np.array([1, 2, 3], dtype=dtype)
...
>>> np.array(A())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: __array__() missing 1 required positional argument: 'dtype'
>>> np.array(A(), dtype=np.float64)
array([1., 2., 3.])
>>> class B:
...     def __array__(self, dtype=None):
...         return np.array([1, 2, 3], dtype=dtype)
...
>>> np.array(B())
array([1, 2, 3])
>>> np.array(B(), dtype=np.float64)
array([1., 2., 3.])
>>> class C:
...     def __array__(self):
...         return np.array([1, 2, 3])
...
>>> np.array(C())
array([1, 2, 3])
>>> np.array(C(), dtype=np.complex128)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: __array__() takes 1 positional argument but 2 were given

@person142
Member Author

That's unfortunate... it looks like the __array__ methods on scalars are inherently different from user-defined ones: the scalar methods accept dtype only positionally, whereas a user-defined __array__ whose dtype is a required positional argument doesn't work with np.array at all.

@person142
Member Author

Ok, I made the protocol

class _SupportsArray(Protocol):
    def __array__(
        self, dtype: Optional[_DtypeLike] = ...
    ) -> Union[Sequence, ndarray]: ...

This unfortunately means that ndarray and generic no longer satisfy it! To work around that I added them back to the big union. This is kind of gross, but I think it's the best we can do given the above examples. WDYT?

@BvB93
Member

BvB93 commented Apr 22, 2020

This unfortunately means that ndarray and generic no longer satisfy it! To work around that I added them back to the big union. This is kind of gross, but I think it's the best we can do given the above examples. WDYT?

I'd suggest adding another overload to the _SupportsArray protocol instead, such that it works in situations where dtype is positional-only or positional-or-keyword:

# first overload: dtype is optional and positional-only
# Second overload: dtype is optional and can be a positional or keyword argument
class _SupportsArray(Protocol):
    @overload
    def __array__(self, __dtype: _DtypeLike = ...) -> ndarray: ...
    @overload
    def __array__(self, dtype: _DtypeLike = ...) -> ndarray: ...


class A():
    def __array__(self, dtype: _DtypeLike = None) -> ndarray:
        return np.array([1, 2, 3], dtype=dtype)

class B():
    def __array__(self, __dtype: _DtypeLike = None) -> ndarray:
        return np.array([1, 2, 3], dtype=__dtype)

a = A()
a.__array__()
a.__array__(float)
a.__array__(dtype=float)

b = B()
b.__array__()
b.__array__(float)
b.__array__(dtype=float)  # E: Unexpected keyword argument
b.__array__(__dtype=float)  # E: Unexpected keyword argument

By the way, why is the returned type annotated as Union[Sequence, ndarray] in your example? Doesn't it always just return ndarray?

@person142
Member Author

person142 commented Apr 22, 2020

I'd suggest adding another overload to the _SupportsArray protocol instead, such that it works with in situations where dtype is positional-only or keyword-or-positional

The signatures are incompatible, so that’s going to violate Liskov I think. (But I’ll give it a try.)

By the way, why is the returned type annotated as Union[Sequence, ndarray] in your example?

I was going by the docs here:

https://docs.scipy.org/doc/numpy/reference/generated/numpy.array.html

which say ndarray or nested sequence.

@BvB93
Member

BvB93 commented Apr 22, 2020

The signatures are incompatible, so that’s going to violate Liskov I think. (But I’ll give it a try.)

It's not quite clear to me where this incompatibility is located.
__array__() (ref) takes self and (optionally) dtype as arguments. The newly overloaded _SupportsArray can now deal with cases where dtype is either positional-only or positional-or-keyword.

I was going by the docs here:

I believe what they're talking about in the docs is an object which either:
a. Has an __array__() method that returns an ndarray.
b. Is a (nested) sequence.

So Union[_SupportsArray, Sequence[Any]] in other words, rather than __array__() itself (potentially) returning a Sequence.

@person142
Member Author

person142 commented Apr 24, 2020

It's not quite clear to me where this incompatibility is located.

I believe what they're talking about in the docs is an object which either:

Yup, right on both counts! Updated; hopefully we're good to go now.

Note that I had to make _DtypeLike public too so that someone could use it in the annotations for their __array__ method, but that's ok because it was desired anyway (#13).

@shoyer
Member

shoyer commented Apr 24, 2020

This looks really nice!

My one serious concern here is about adding public type aliases. These do seem quite useful, but what would this imply if/when we move type annotations into NumPy properly? If np.ArrayLike needs to be valid, does that imply that if we move annotations into NumPy we need to expose ArrayLike in the public API of NumPy?

@person142
Member Author

person142 commented Apr 24, 2020

If np.ArrayLike needs to be valid, does that imply that if we move annotations into NumPy we need to expose ArrayLike in the public API of NumPy?

Thankfully it doesn't; this is the same sort of fudging that happens with things like Queue[T] in the stdlib (where the real class isn't actually generic). But it does mean that people using annotations need to take one of these options:

Option 1: quote the annotation as a string:

import numpy as np

x: "np.ArrayLike"  # Use strings

Option 2: use postponed evaluation of annotations:

from __future__ import annotations

import numpy as np

x: np.ArrayLike  # Now it's treated as a string anyway

Option 3: guard the import with TYPE_CHECKING:

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from numpy import ArrayLike
else:
    ArrayLike = None  # Or whatever

x: ArrayLike

@person142
Member Author

person142 commented Apr 24, 2020

Although maybe I am interpreting your question in the wrong way. I think a backdrop to my answer above is an assumption that even when we move the types into NumPy, they will remain stubs instead of being inlined into the code.

I suspect that this is the right course of action because we've seen that the types require a fair bit of "fudging"; i.e. we aren't trying to represent the full NumPy API but instead some typeable subset of it. I think that if the types are inlined then we lose the ability to do that fudging as well. (And lose the ability to do a gazillion overloads; doubt that is going to fly in the NumPy codebase proper.)

I'd contrast this to e.g. SciPy where we are inlining the types, mostly because there isn't much odd there (except for needing a bunch of stubs for extension modules).

@emmatyping
Contributor

FWIW, I've heard that at the language summit they decided that __future__.annotations will become the default in 3.10.

@shoyer
Member

shoyer commented Apr 24, 2020

I appreciate that we could need np.ArrayLike in strings or type annotations, but I suspect that it could still lead to some user confusion to not define them at runtime, too, e.g., to cover use cases like type aliases. The user experience is independent of whether we choose to use stubs or annotations inside NumPy — though I expect we’ll probably end up with some of both.

NumPy-stubs is certainly still experimental, so you definitely have my blessing to go ahead for now. But I do think it could be worth sounding out the broader NumPy community on the appetite for adding a selective handful of type protocols into NumPy proper. We might get some useful feedback. For example, should the protocol be called ArrayLike or array_like as currently appears in many NumPy docstrings?

@rgommers
Member

If it helps to make ArrayLike et al. public, I don't see a big issue with that; would probably prefer to write it as

from numpy.types import ArrayLike

x: ArrayLike

to keep it out of the main namespace. In my (limited) experience, it's helpful for things to exist at runtime.

@person142
Member Author

from numpy.types import ArrayLike

Keeping it out of the main namespace seems good, though maybe it should be numpy.typing instead of numpy.types to match the stdlib module?

@person142
Member Author

Re

But I do think it could be worth sounding out the broader NumPy community on the appetite for adding a selective handful of type protocols into NumPy proper.

I'll send out something to the mailing list.

@rgommers
Member

from numpy.types import ArrayLike

Keeping it out of the main namespace seems good, though maybe it should be numpy.typing instead of numpy.types to match the stdlib module?

No, I avoided that name on purpose; naming something the same as a stdlib module is usually a bad idea (e.g. scipy.io was a pretty bad choice, which is why it's usually imported as spio).

@BvB93
Member

BvB93 commented Apr 24, 2020

What about something along the lines of numpy.annotations?

@eric-wieser
Member

eric-wieser commented Apr 27, 2020

Keeping it out of the main namespace seems good, though maybe it should be numpy.typing instead of numpy.types to match the stdlib module?

No, I avoided that name on purpose, naming something the same as a stdlib module is usually a bad idea

As a counterargument, numpy.distutils clashes with distutils, but we used it anyway, presumably because the similarity was worth emphasizing.

Almost all the usage of type annotations I've seen in the wild has erred on the side of keeping the annotations as short as possible, as:

from typing import Tuple
from numpy.typing import ArrayLike

def get_arr() -> Tuple[ArrayLike, int]: ...

The similarity aids reading here, and the clash is irrelevant.

Alternatively, some users might want the full names anyway. The clash is again irrelevant:

def get_arr() -> typing.Tuple[np.typing.ArrayLike, int]: ...

Finally, if the user cares enough to import just the submodule, they probably want to do something similar with typing anyway (note: I've not actually seen anyone do this).

import typing as t
import numpy.typing as npt
def get_arr() -> t.Tuple[npt.ArrayLike, int]: ...

@person142
Member Author

Ok, discussion on the mailing list:

http://numpy-discussion.10968.n7.nabble.com/Feelings-about-type-aliases-in-NumPy-td48059.html

seems to be dying down. Takeaways so far:

  • No (open) objections to adding types to NumPy itself in the future
  • Most people preferred not doing it in the top-level namespace, though @shoyer was a notable objector
  • Little agreement on what to name the module

So, as is often the case, seems like there's rough consensus except on what to name the darned thing.

@overload
def __array__(self, __dtype: DtypeLike = ...) -> ndarray: ...
@overload
def __array__(self, dtype: Optional[DtypeLike] = ...) -> ndarray: ...
Member

@BvB93 BvB93 May 1, 2020

Is the Optional in Optional[DtypeLike] not redundant here?
Considering None is already included in the union defining DtypeLike.
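The redundancy can be checked directly. With a simplified stand-in for DtypeLike (the real union in the stubs is much longer, but it likewise contains None), the extra Optional flattens away:

```python
from typing import Optional, Union

import numpy as np

# Simplified stand-in for the stubs' DtypeLike; the key point is that
# it already includes None.
DtypeLike = Union[None, type, str, np.dtype]

# Optional[X] is just Union[X, None], and unions flatten and deduplicate,
# so wrapping Optional around a union that contains None is a no-op.
print(Optional[DtypeLike] == DtypeLike)
```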

@@ -217,6 +221,7 @@ class _ArrayOrScalarCommon(
def shape(self) -> _Shape: ...
@property
def strides(self) -> _Shape: ...
def __array__(self, __dtype: Optional[DtypeLike] = ...) -> ndarray: ...
Member

See comment above.

) -> _NdArraySubClass: ...
@overload
def view(self, *, type: Type[_NdArraySubClass]) -> _NdArraySubClass: ...
def getfield(self, dtype: Union[_DtypeLike, str], offset: int = ...) -> ndarray: ...
def getfield(self, dtype: Union[DtypeLike, str], offset: int = ...) -> ndarray: ...
Member

Can this be simplified?

Suggested change
def getfield(self, dtype: Union[DtypeLike, str], offset: int = ...) -> ndarray: ...
def getfield(self, dtype: DtypeLike, offset: int = ...) -> ndarray: ...

@TomAugspurger

cc @simonjayhawkins if you see any issues with how this ArrayLike will interact with pandas' typing.

@person142
Member Author

I don't want us to lose momentum here, so: I find @eric-wieser's comment

Almost all the usage of type annotations I've seen in the wild has erred on the side of keeping the annotations as short as possible

to match my own experience as well; typing in Python is fairly verbose, so short forms like

import typing as t

or

from typing import ...

seem to be the norm. For that reason I propose that we move ahead with putting things in numpy.typing. For now it will be in the stubs only, and when we merge the stubs into NumPy itself we can make it available at runtime.

Are people ok with that, or shall we continue to discuss?

@shoyer
Member

shoyer commented May 9, 2020 via email

@rgommers
Member

rgommers commented May 9, 2020

Sounds fine to me, thanks for keeping this moving.

Closes numpy#37.

Add tests to check various examples. Note that supporting __array__
also requires making _DtypeLike public too, so this does that as well.
@person142
Member Author

Ok, mailing list has been notified, PR has been rebased, review comments have been addressed (I think), the types have been moved into numpy.typing, and the tests have been updated accordingly.

@person142
Member Author

Any objections to moving forward? Since this conflicts with just about every other PR it would be nice to get it in to avoid more rebasing.

@BvB93
Member

BvB93 commented May 17, 2020

No complaints here from my side; feel free to continue.

@shoyer
Member

shoyer commented May 17, 2020

Looks good to me!

@person142 person142 merged commit fa6b9fd into numpy:master May 17, 2020
@person142 person142 deleted the array-like branch May 17, 2020 22:50
@person142
Member Author

In it goes then. Thanks for reviewing everyone!

Successfully merging this pull request may close these issues.

What to do with array_like variables