Aggregation identity on entirely missing data #23

chris-b1 · 2016-09-14T20:03:01Z

potentially related to #9

numpy ufuncs have an identity, which pandas follows with respect to missing data.

np.sum([], dtype='float64')
Out[33]: 0.0

np.nansum([np.nan], dtype='float64')
Out[35]: 0.0

pd.Series([np.nan]).sum()
Out[36]: 0.0

I don't feel that strongly one way or the other, but there's definitely a case to be made that [36] should be NA. The number of bug reports indicates that, at minimum, people get tripped up by this; xref pandas-dev/pandas#9422

So we could consider modifying the identity concept for pandas 2.0, since there will be less binding to numpy semantics.

wesm · 2016-09-15T04:09:40Z

I'm +1 on [36] being NA. [33] is probably correct. At minimum it would be useful to document this behavior carefully.
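
To make the distinction between [33] and [36] concrete, here is a minimal pure-Python sketch of the two behaviours. `nan_sum` and its `all_missing_is_na` flag are hypothetical names made up for illustration; they are not pandas or numpy APIs, and this is not a proposal for how pandas should implement it.

```python
import math


def nan_sum(values, all_missing_is_na=False):
    """Sum a list of floats, skipping NaN (hypothetical helper for illustration)."""
    valid = [v for v in values if not math.isnan(v)]
    # The input had entries, but every one of them was missing.
    if all_missing_is_na and len(values) > 0 and not valid:
        return math.nan
    # Otherwise fall back to the additive identity: fsum([]) is 0.0.
    return math.fsum(valid)


print(nan_sum([]))                                       # like [33]: 0.0
print(nan_sum([math.nan]))                               # like [36] today: 0.0
print(nan_sum([math.nan], all_missing_is_na=True))       # [36] as NA: nan
print(nan_sum([1.0, math.nan], all_missing_is_na=True))  # 1.0
```

With the flag set, a truly empty input still returns the identity (0.0), and only an input whose entries are all missing returns NaN.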

jreback added the missing data label Sep 30, 2016