ENH: get more specific about _ArrayLike, make it public #66
As something of a data point, MyPy passes on SciPy when checked against this branch. Now the types in SciPy are still very rough, so take that with a grain of salt, but perhaps it means something. |
What about other array-like objects? Are things with both |
numpy-stubs/__init__.pyi
Outdated
@@ -87,7 +87,7 @@ _DtypeLike = Union[ | |||
|
|||
_NdArraySubClass = TypeVar("_NdArraySubClass", bound=ndarray) | |||
|
|||
_ArrayLike = TypeVar("_ArrayLike") | |||
ArrayLike = Union[int, float, complex, generic, ndarray, Sequence] |
Might it be an idea to, respectively, replace int, float and complex with the SupportsInt, SupportsFloat and SupportsComplex protocols?
What about str, bytes, bool, dt.datetime, dt.date and dt.timedelta (considering they all have their corresponding generic)?
Furthermore, I'd personally be in favor of replacing ndarray with a custom _SupportsArray protocol along the lines of:
class _SupportsArray(Protocol):
    def __array__(self, dtype: _DtypeLike = ...) -> ndarray: ...
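For illustration, here is a minimal sketch of how such a structural protocol behaves. It is not the stub from this PR: `SupportsArray`, `Wrapper`, `Plain` and `accepts_array_like` are hypothetical names, and the `dtype` and return types are loosened to `Any` so the sketch runs without NumPy installed.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class SupportsArray(Protocol):
    """Hypothetical stand-in for the proposed _SupportsArray protocol."""

    # Types are loosened to Any here; the real stub would use _DtypeLike/ndarray.
    def __array__(self, dtype: Any = ...) -> Any: ...


class Wrapper:
    """Hypothetical container that exposes its data through __array__."""

    def __init__(self, data):
        self.data = list(data)

    def __array__(self, dtype=None):
        # A real implementation would return an ndarray; a list stands in here.
        return self.data


class Plain:
    """No __array__ method, so it does not match the protocol."""


def accepts_array_like(obj: object) -> bool:
    # runtime_checkable protocols only check that the method exists,
    # not that its signature or return type matches.
    return isinstance(obj, SupportsArray)
```

The point of the protocol is that any class with an `__array__` method matches structurally, without having to subclass ndarray.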
Might it be an idea to, respectively, replace int, float and complex with the SupportsInt, SupportsFloat and SupportsComplex protocols?
I think the protocols are a little too general; e.g.
>>> class A:
...     def __int__(self):
...         return 1
...
>>> int(A())
1
>>> np.array(A())
array(<__main__.A object at 0x10e61a290>, dtype=object)
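The type-checking side of the same point can be sketched with the standard library alone. `to_int` is a hypothetical function used only for illustration; the sketch shows that a class like `A` above satisfies `SupportsInt`, so an annotation built on that protocol would accept objects that NumPy turns into object arrays.

```python
from typing import SupportsInt


class A:
    def __int__(self) -> int:
        return 1


def to_int(value: SupportsInt) -> int:
    # Hypothetical function: anything defining __int__ is accepted here.
    return int(value)


# SupportsInt is runtime-checkable, so the structural match can be observed directly.
satisfies_protocol = isinstance(A(), SupportsInt)
```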
What about str, bytes, bool, dt.datetime dt.date and dt.timedelta (considering they all have their corresponding generic)?
str and bytes are both covered by Sequence. I added bool (thanks for catching that!). The dt.* types I think also end up giving unexpected results:
>>> np.array(datetime.timedelta(days=1))
array(datetime.timedelta(days=1), dtype=object)
Furthermore, I'd personally be in favor of replacing ndarray with a custom _SupportsArray protocol along the lines of:
Yes, that is an excellent idea, switched to that.
I think the protocols are a little too general; e.g.
Good catch, I actually wasn't aware that, e.g., a SupportsInt member wouldn't produce an integer array.
str and bytes are both covered by Sequence
Ah right, that's true.
It seems that the principle we're roughly going for here is "don't allow stuff that will produce object arrays", though we still leave an escape hatch a la |
Hm, it appears to be a little wonky:

>>> np.float64(1).__array__()
array(1.)
>>> np.float64(1).__array__(np.complex128)
array(1.+0.j)
>>> np.float64(1).__array__(dtype=np.complex128)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: __array__() takes no keyword arguments

>>> class A:
...     def __array__(self, dtype):
...         return np.array([1, 2, 3], dtype=dtype)
...
>>> np.array(A())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: __array__() missing 1 required positional argument: 'dtype'
>>> np.array(A(), dtype=np.float64)
array([1., 2., 3.])

>>> class B:
...     def __array__(self, dtype=None):
...         return np.array([1, 2, 3], dtype=dtype)
...
>>> np.array(B())
array([1, 2, 3])
>>> np.array(B(), dtype=np.float64)
array([1., 2., 3.])

>>> class C:
...     def __array__(self):
...         return np.array([1, 2, 3])
...
>>> np.array(C())
array([1, 2, 3])
>>> np.array(C(), dtype=np.complex128)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: __array__() takes 1 positional argument but 2 were given |
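The sessions above suggest that `__array__` is called with no arguments when no dtype is requested and with the dtype passed positionally otherwise. As a rough sketch of that observed convention (not NumPy's actual implementation), the hypothetical helper `call_dunder_array` below reproduces which of the three signatures survive each call; plain lists stand in for arrays so the sketch runs without NumPy.

```python
def call_dunder_array(obj, dtype=None):
    """Hypothetical helper mimicking the calling convention seen in the sessions."""
    if dtype is None:
        return obj.__array__()  # no dtype requested: called with no arguments
    return obj.__array__(dtype)  # dtype requested: passed positionally


class A:  # dtype is required
    def __array__(self, dtype):
        return [dtype(v) for v in (1, 2, 3)]


class B:  # dtype is optional
    def __array__(self, dtype=None):
        values = [1, 2, 3]
        return values if dtype is None else [dtype(v) for v in values]


class C:  # dtype is not accepted at all
    def __array__(self):
        return [1, 2, 3]


def works(obj, dtype=None) -> bool:
    """Return True if the call succeeds, False if it raises TypeError."""
    try:
        call_dunder_array(obj, dtype)
    except TypeError:
        return False
    return True
```

Only the `B` signature works in both situations, which is why an optional `dtype` is the shape the protocol needs to describe.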
That's unfortunate... it looks like the |
Ok, I made the protocol

class _SupportsArray(Protocol):
    def __array__(
        self, dtype: Optional[_DtypeLike] = ...
    ) -> Union[Sequence, ndarray]: ...

This unfortunately means that |
I'd suggest adding another overload to the

# First overload: dtype is optional and positional-only
# Second overload: dtype is optional and can be a positional or keyword argument
class _SupportsArray(Protocol):
    @overload
    def __array__(self, __dtype: _DtypeLike = ...) -> ndarray: ...
    @overload
    def __array__(self, dtype: _DtypeLike = ...) -> ndarray: ...

class A():
    def __array__(self, dtype: _DtypeLike = None) -> ndarray:
        return np.array([1, 2, 3], dtype=dtype)

class B():
    def __array__(self, __dtype: _DtypeLike = None) -> ndarray:
        return np.array([1, 2, 3], dtype=__dtype)

a = A()
a.__array__()
a.__array__(float)
a.__array__(dtype=float)

b = B()
b.__array__()
b.__array__(float)
b.__array__(dtype=float)  # E: Unexpected keyword argument
b.__array__(__dtype=float)  # E: Unexpected keyword argument

By the way, why is the returned type annotated as |
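As a side note on the positional-only overload: the leading double underscore in `__dtype` is the PEP 484 convention that tells type checkers a parameter is positional-only. On Python 3.8+ the `/` marker from PEP 570 expresses the same thing and is also enforced at runtime. The sketch below uses hypothetical class names and plain lists in place of arrays to show the difference between the two signatures.

```python
class PositionalOnly:
    # The "/" makes dtype positional-only at runtime (Python 3.8+).
    def __array__(self, dtype=None, /):
        values = [1, 2, 3]
        return values if dtype is None else [dtype(v) for v in values]


class PositionalOrKeyword:
    def __array__(self, dtype=None):
        values = [1, 2, 3]
        return values if dtype is None else [dtype(v) for v in values]


def accepts_keyword(obj) -> bool:
    """Return True if obj.__array__ can be called with dtype as a keyword."""
    try:
        obj.__array__(dtype=float)
    except TypeError:
        return False
    return True
```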
The signatures are incompatible, so that’s going to violate Liskov I think. (But I’ll give it a try.)
I was going by the docs here: https://docs.scipy.org/doc/numpy/reference/generated/numpy.array.html which say ndarray or nested sequence. |
It's not quite clear to me where this incompatibility is located.
I believe what they're talking about in the docs is an object which either: So |
Yup, right on both counts! Updated; hopefully we're good to go now. Note that I had to make |
This looks really nice! My own serious concern here is about adding public type aliases. These do seem quite useful, but what would this imply if/when we move type annotations into NumPy properly? If |
Thankfully it doesn't; this is the same sort of fudging that happens with things like

import numpy as np
x: "np.ArrayLike"  # Use strings

from __future__ import annotations
import numpy as np
x: np.ArrayLike  # Now it's treated as a string anyway

from typing import TYPE_CHECKING
if TYPE_CHECKING:
    from numpy import ArrayLike
else:
    ArrayLike = None  # Or whatever
x: ArrayLike |
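To make the stub-only case concrete, here is a small sketch combining the string-annotation and TYPE_CHECKING patterns. `describe` is a hypothetical function, and the guarded import follows the `from numpy import ArrayLike` spelling used above. Because the import never runs at runtime, the sketch works even if the name does not exist (or NumPy is not installed).

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Only type checkers evaluate this branch; it is skipped at runtime.
    from numpy import ArrayLike


def describe(x: "ArrayLike") -> str:
    # The annotation is a plain string, so ArrayLike need not exist at runtime.
    return type(x).__name__


# The annotation is stored unevaluated, as the string it was written as.
annotation = describe.__annotations__["x"]
```

The catch, raised later in the thread, is that anything needing the name at runtime (for example a user-defined type alias) still fails unless the name is actually defined.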
Although maybe I am interpreting your question in the wrong way. I think a backdrop to my answer above is an assumption that even when we move the types into NumPy, they will remain stubs instead of being inlined into the code. I suspect that this is the right course of action because we've seen that the types require a fair bit of "fudging"; i.e. we aren't trying to represent the full NumPy API but instead some typeable subset of it. I think that if the types are inlined then we lose the ability to do that fudging as well. (And lose the ability to do a gazillion overloads; doubt that is going to fly in the NumPy codebase proper.) I'd contrast this to e.g. SciPy where we are inlining the types, mostly because there isn't much odd there (except for needing a bunch of stubs for extension modules). |
FWIW, I've heard that at the language summit they decided |
I appreciate that we could need np.ArrayLike in strings or type annotations, but I suspect that it could still lead to some user confusion to not define them at runtime, too, e.g., to cover use cases like type aliases. The user experience is independent of whether we choose to use stubs or annotations inside NumPy — though I expect we’ll probably end up with some of both. NumPy-stubs is certainly still experimental, so you definitely have my blessing to go ahead for now. But I do think it could be worth sounding out the broader NumPy community on the appetite for adding a selective handful of type protocols into NumPy proper. We might get some useful feedback. For example, should the protocol be called |
If it helps making
to keep it out of the main namespace. In my (limited) experience, it's helpful for things to exist at runtime. |
Keeping it out of the main namespace seems good, though maybe it should be |
Re
I'll send out something to the mailing list. |
No, I avoided that name on purpose, naming something the same as a stdlib module is usually a bad idea (e.g. |
What about something along the lines of |
As a counterargument, almost all the usage of type annotations I've seen in the wild has erred on the side of keeping the annotations as short as possible, as:

from typing import Tuple
from numpy.typing import ArrayLike
def get_arr() -> Tuple[ArrayLike, int]: ...

The similarity aids reading here, and the clash is irrelevant. Alternatively, some users might want the full names anyway. The clash is again irrelevant:

def get_arr() -> typing.Tuple[np.typing.ArrayLike, int]: ...

Finally, if the user cares enough to import just the submodule, they probably want to do something similar with

import typing as t
import numpy.typing as npt
def get_arr() -> t.Tuple[npt.ArrayLike, int]: ... |
Ok, discussion on the mailing list: http://numpy-discussion.10968.n7.nabble.com/Feelings-about-type-aliases-in-NumPy-td48059.html seems to be dying down. Takeaways so far:
So, as is often the case, seems like there's rough consensus except on what to name the darned thing. |
numpy-stubs/__init__.pyi
Outdated
@overload | ||
def __array__(self, __dtype: DtypeLike = ...) -> ndarray: ... | ||
@overload | ||
def __array__(self, dtype: Optional[DtypeLike] = ...) -> ndarray: ... |
Is the Optional in Optional[DtypeLike] not redundant here?
Considering None is already included in the union defining DtypeLike.
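A quick way to check this kind of redundancy is to compare the unions directly, since `typing.Union` flattens nested unions and drops duplicate members. The alias below is a simplified, hypothetical stand-in for DtypeLike, assumed to contain both None and str.

```python
from typing import Optional, Union

# Hypothetical, simplified stand-in for DtypeLike (assumed to include None and str).
DemoDtypeLike = Union[type, str, None]

# Optional[X] is Union[X, None]; with None already present, nothing changes.
optional_is_redundant = Optional[DemoDtypeLike] == DemoDtypeLike

# The same holds for adding a member the union already contains.
str_is_redundant = Union[DemoDtypeLike, str] == DemoDtypeLike
```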
numpy-stubs/__init__.pyi
Outdated
@@ -217,6 +221,7 @@ class _ArrayOrScalarCommon( | |||
def shape(self) -> _Shape: ... | |||
@property | |||
def strides(self) -> _Shape: ... | |||
def __array__(self, __dtype: Optional[DtypeLike] = ...) -> ndarray: ... |
See comment above.
numpy-stubs/__init__.pyi
Outdated
) -> _NdArraySubClass: ... | ||
@overload | ||
def view(self, *, type: Type[_NdArraySubClass]) -> _NdArraySubClass: ... | ||
def getfield(self, dtype: Union[_DtypeLike, str], offset: int = ...) -> ndarray: ... | ||
def getfield(self, dtype: Union[DtypeLike, str], offset: int = ...) -> ndarray: ... |
Can this be simplified?
def getfield(self, dtype: Union[DtypeLike, str], offset: int = ...) -> ndarray: ... | |
def getfield(self, dtype: DtypeLike, offset: int = ...) -> ndarray: ... |
cc @simonjayhawkins if you see any issues with how this ArrayLike will interact with pandas' typing. |
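I don't want us to lose momentum here, so: I find @eric-wieser's comment
Almost all the usage of type annotations I've seen in the wild has erred on the side of keeping the annotations as short as possible
to match with my own experience as well; typing in Python is fairly verbose, so short forms like
import typing as t
or
from typing import ...
seem to be the norm. For that reason I propose that we move ahead with putting things in numpy.typing. For now it will be in the stubs only, and when we merge the stubs into NumPy itself we can make it available at runtime.
Are people ok with that, or shall we continue to discuss? |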
+1 for numpy.typing, but please send this to the email list, which is the
place of record for design decisions
…On Sat, May 9, 2020 at 10:17 AM Josh Wilson ***@***.***> wrote:
I don't want us to lose momentum here, so: I find @eric-wieser
<https://github.com/eric-wieser>'s comment
Almost all the usage of type annotations I've seen in the wild has erred
on the side of keeping the annotations as short as possible
to match with my own experience as well-typing in Python is fairly
verbose, so short forms like
import typing as t
or
from typing import ...
seem to be the norm. For that reason I propose that we move ahead with
putting things in numpy.typing. For now it will be in the stubs only, and
when we merge the stubs into NumPy itself we can make it available at
runtime.
Are people ok with that, or shall we continue to discuss?
Sounds fine to me, thanks for keeping this moving. |
f39c6ef
to
7adbc04
Compare
Closes numpy#37. Add tests to check various examples. Note that supporting __array__ also requires making _DtypeLike public too, so this does that as well.
Ok, mailing list has been notified, PR has been rebased, review comments have been addressed (I think), the types have been moved into |
Any objections to moving forward? Since this conflicts with just about every other PR it would be nice to get it in to avoid more rebasing. |
No complaints here from my side; feel free to continue. |
Looks good to me! |
In it goes then. Thanks for reviewing everyone! |