This repository was archived by the owner on Apr 10, 2024. It is now read-only.

Aggregation identity on entirely missing data #23

Open
chris-b1 opened this issue Sep 14, 2016 · 1 comment

@chris-b1

potentially related to #9

numpy ufuncs have an identity, which pandas follows with respect to missing data.

np.sum([], dtype='float64')
Out[33]: 0.0

np.nansum([np.nan], dtype='float64')
Out[35]: 0.0

pd.Series([np.nan]).sum()
Out[36]: 0.0

I don't feel that strongly one way or the other, but there's definitely a case to be made that [36] should be NA. The number of bug reports indicates that, at minimum, people get tripped up by this; xref pandas-dev/pandas#9422

So we could consider modifying the identity concept for pandas 2.0, since there will be less binding to numpy semantics.

@wesm
Owner

wesm commented Sep 15, 2016

I'm +1 on [36] being NA. [33] is probably correct. At minimum it would be useful to document this behavior carefully.
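
For illustration only, here is a rough pure-Python sketch of the two conventions being discussed. The helper names `sum_skipna_identity` and `sum_skipna_na` are hypothetical and are not pandas or numpy API. The first mimics the current behavior (skip missing values and fall back to the additive identity); the second assumes the proposal above, where an all-missing input gives NA while a truly empty input still gives 0.0.

```python
import math


def sum_skipna_identity(values):
    """Skip NaN entries; return the additive identity 0.0 if nothing is left.

    Hypothetical helper that mirrors the behavior shown in [35] and [36].
    """
    return math.fsum(v for v in values if not math.isnan(v))


def sum_skipna_na(values):
    """Skip NaN entries, but return NaN when the input is entirely missing.

    Hypothetical helper sketching the proposed behavior: an empty input
    still sums to 0.0 (as in [33]), while a non-empty input with no valid
    values yields NaN.
    """
    values = list(values)
    valid = [v for v in values if not math.isnan(v)]
    if values and not valid:
        # Non-empty input, but every entry is missing.
        return math.nan
    return math.fsum(valid)


print(sum_skipna_identity([]))          # empty input
print(sum_skipna_identity([math.nan]))  # all missing, identity convention
print(sum_skipna_na([]))                # empty input
print(sum_skipna_na([math.nan]))        # all missing, NA convention
```

The two helpers differ only on a non-empty, all-missing input, which is the case in [36].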
