np.nan comparison to pd.NA Equality Semantics differ elementwise in IntervalArray #31882

Closed
WillAyd opened this issue Feb 11, 2020 · 1 comment · Fixed by #44830
Labels
Bug Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate NA - MaskedArrays Related to pd.NA and nullable extension arrays
Milestone

Comments

@WillAyd
Member

WillAyd commented Feb 11, 2020

Discovered in #31799

I think this is probably a deeper-rooted problem than just IntervalArray, but I'm not sure how else to describe it. @jorisvandenbossche maybe has thoughts.

>>> import numpy as np
>>> import pandas as pd
>>> from pandas.core.arrays import IntervalArray
>>> arr = IntervalArray.from_arrays([0., 1., 2., np.nan], [1., 2., 3., np.nan])
>>> arr == pd.NA
array([False, False, False, False])
>>> arr[-1] == pd.NA
<NA>
@WillAyd WillAyd added the Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate label Feb 11, 2020
@jorisvandenbossche
Member

Yes, also some other array types such as Categorical or DatetimeArray have this behaviour.

In the pd.NA object's ops, we handle np.ndarray, so we get behaviour like this:

In [26]: np.array([1, 2]) == pd.NA 
Out[26]: array([<NA>, <NA>], dtype=object)

See

elif isinstance(other, np.ndarray):
    out = np.empty(other.shape, dtype=object)
    out[:] = NA

but this doesn't handle our own array types.
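For comparison, here is a small sketch of the two cases that do propagate NA elementwise: a plain NumPy array (the branch quoted above) and a nullable masked array that uses pd.NA as its own missing value indicator. The variable names are illustrative only, and the `Int64` case is an addition for contrast; it is not part of the original report.

```python
import numpy as np
import pandas as pd

# Plain ndarray: pd.NA's own ops fill an object array with NA.
ndarray_result = np.array([1, 2]) == pd.NA

# Nullable integer array: the array itself handles the comparison
# and returns a masked boolean array in which every entry is missing.
masked = pd.array([1, 2, None], dtype="Int64")
masked_result = masked == pd.NA

print(ndarray_result)
print(masked_result)
```

In both cases the result is all-NA rather than all-False, which is the elementwise behaviour the IntervalArray example above does not follow.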

I think we need to decide where this should be handled: should pd.NA keep returning NotImplemented so that the array object handles it correctly, or should pd.NA be able to handle those array types itself? In general, handling this on the array might be better, since the output dtype might differ depending on the exact operation.
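To make the first option concrete, here is a minimal pure-Python sketch of how returning NotImplemented hands the decision to the array. `ToyNA` and `ToyArray` are hypothetical stand-ins, not pandas classes; the only thing being illustrated is Python's fallback to the other operand's comparison method.

```python
class ToyNA:
    """Hypothetical stand-in for pd.NA."""

    def __repr__(self):
        return "<NA>"

    def __eq__(self, other):
        if isinstance(other, (int, float, str)):
            # Scalar comparison: the missing value propagates.
            return self
        # Unknown container type: defer to the other operand.
        return NotImplemented


class ToyArray:
    """Hypothetical array type that owns the comparison logic."""

    def __init__(self, values):
        self.values = list(values)

    def __eq__(self, other):
        if isinstance(other, ToyNA):
            # The array chooses its own result type; here, all-missing.
            return [other] * len(self.values)
        return [v == other for v in self.values]


NA = ToyNA()
arr = ToyArray([1, 2, 3])

left = arr == NA   # ToyArray.__eq__ handles it directly
right = NA == arr  # ToyNA.__eq__ returns NotImplemented, so Python
                   # retries with ToyArray.__eq__(arr, NA)
print(left, right)
```

With this arrangement both operand orders give the same answer, and each array type stays free to pick a result dtype that suits the operation.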


In general, we haven't yet put much effort into making pd.NA work with array types that don't use it as the missing value indicator (e.g. DatetimeArray - pd.NA also raises TypeError). That is partly because it is not always clear what the result should be. For example, we might decide that DatetimeArray - pd.NA should result in a TimedeltaArray, but TimedeltaArray cannot yet hold pd.NA, so returning that is not yet possible.

@jorisvandenbossche jorisvandenbossche added the NA - MaskedArrays Related to pd.NA and nullable extension arrays label Feb 20, 2020
@mroeschke mroeschke added the Bug label Apr 28, 2020
@jreback jreback added this to the 1.4 milestone Dec 11, 2021
4 participants