BUG: Error with std of nullable column obtained from groupby #35516

Closed
1 task
MathieuDutSik opened this issue Aug 2, 2020 · 4 comments · Fixed by #50375
Labels
Bug; ExtensionArray (Extending pandas with custom dtypes or arrays); Groupby; Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate); NA - MaskedArrays (Related to pd.NA and nullable extension arrays)

Comments

@MathieuDutSik

MathieuDutSik commented Aug 2, 2020

The error occurs with

import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [2, 1, 1, 1, 2, 2, 1], "B": pd.Series(np.full(7, np.nan), dtype="Int64")})
df.groupby("A").std()

whereas df.groupby("A").var() now works correctly.
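One hedged way to compare the two reductions without depending on a particular pandas version is to run each one inside a try/except and record whether it raised. The helper name `run_grouped_reduction` is made up for this illustration, and whether `std` actually raises depends on the installed pandas version.

```python
import numpy as np
import pandas as pd

# Same frame as in the report: column B is a nullable Int64 column
# that holds only missing values.
df = pd.DataFrame(
    {
        "A": [2, 1, 1, 1, 2, 2, 1],
        "B": pd.Series(np.full(7, np.nan), dtype="Int64"),
    }
)


def run_grouped_reduction(frame, name):
    """Hypothetical helper: call frame.groupby("A").<name>() and capture any exception."""
    try:
        return getattr(frame.groupby("A"), name)(), None
    except Exception as exc:  # deliberately broad: the goal is to report, not to handle
        return None, exc


outcomes = {name: run_grouped_reduction(df, name) for name in ("var", "std")}

for name, (result, error) in outcomes.items():
    if error is None:
        print(f"{name}: ok, dtypes={result.dtypes.to_dict()}")
    else:
        print(f"{name}: raised {type(error).__name__}: {error}")
```

Printing `pd.__version__` alongside this output makes it easier to tell which releases are affected.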

  • [x] I have checked that this issue has not already been reported.

  • [x] I have confirmed this bug exists on the latest version of pandas.

  • [ ] (optional) I have confirmed this bug exists on the master branch of pandas.



@MathieuDutSik MathieuDutSik added the Bug and Needs Triage labels Aug 2, 2020
@MathieuDutSik MathieuDutSik changed the title from "BUG:" to "BUG: Error with std of nullable column obtained from groupby" Aug 2, 2020
@simonjayhawkins simonjayhawkins added the ExtensionArray and Groupby labels and removed the Needs Triage label Aug 2, 2020
@simonjayhawkins simonjayhawkins added this to the Contributions Welcome milestone Aug 2, 2020
@arw2019
Member

arw2019 commented Nov 2, 2020

This works on current master

@arw2019 arw2019 added the Needs Tests label Nov 2, 2020
@jreback
Contributor

jreback commented Jan 1, 2021

this is more complicated, as the current output is wrong; we should be outputting pd.NA and ignoring missing values in the calculation.

In [21]: df = pd.DataFrame(
    ...:     {
    ...:         "A": [2, 1, 1, 1, 2, 2, 1],
    ...:         "B": pd.Series(np.full(7, np.nan), dtype="Int64"),
    ...:         "C": pd.array([1, 2, 3, 4, pd.NA, pd.NA, pd.NA], dtype="Int64"),
    ...:     }
    ...: )

In [22]: df.groupby('A').std()
Out[22]:
    B    C
A
1 NaN  1.0
2 NaN  NaN

In [23]: df.groupby('A').sum()
Out[23]:
   B  C
A
1  0  9
2  0  1

In [24]: df.groupby('A').mean()
Out[24]:
      B    C
A
1  <NA>  3.0
2  <NA>  1.0

@jreback jreback removed the Needs Tests label Jan 1, 2021
@GabrielSimonetto

@jreback What would be the desired behaviour? Is the problem only in the std function? Sorry if the question is a bit dumb, but I don't understand what your example is showing

@jreback
Contributor

jreback commented Jan 3, 2021

we need to actually handle pd.NA correctly and return it where appropriate

here the results should all contain pd.NA

this is pretty new, which is why it has not been built out yet

groupby supports EAs (extension arrays) for mean, sum, min and max, but not yet for std / var generally

and that's the issue
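To make the intended semantics concrete, here is a rough workaround sketch, not the change that eventually closed this issue. It assumes a pandas version that provides the nullable "Float64" dtype, and the helper name `nullable_group_std` is invented for illustration. It converts each nullable column to plain float64 with NaN for missing values, lets the regular groupby std skip those, and then converts the result back so that undefined entries show up as missing in a nullable column.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "A": [2, 1, 1, 1, 2, 2, 1],
        "B": pd.Series(np.full(7, np.nan), dtype="Int64"),
        "C": pd.array([1, 2, 3, 4, pd.NA, pd.NA, pd.NA], dtype="Int64"),
    }
)


def nullable_group_std(frame, key, ddof=1):
    """Hypothetical helper: grouped std that skips missing values and returns a nullable result."""
    value_columns = [col for col in frame.columns if col != key]
    # Represent missing values as NaN in ordinary float64 columns.
    as_float = pd.DataFrame(
        {
            col: frame[col].to_numpy(dtype="float64", na_value=np.nan)
            for col in value_columns
        },
        index=frame.index,
    )
    # The float64 groupby path skips NaN and yields NaN where std is undefined.
    grouped_std = as_float.groupby(frame[key]).std(ddof=ddof)
    # Convert back to a nullable dtype (assumes "Float64" is available).
    return grouped_std.astype("Float64")


expected = nullable_group_std(df, "A")
print(expected)
```

With the frame above, group 1 of column C should come out as 1.0, matching the output shown earlier, while every other entry should be missing because fewer than two non-missing values are available.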

@mroeschke mroeschke added the Missing-data label Aug 8, 2021
@jbrockmendel jbrockmendel added the NA - MaskedArrays label Dec 21, 2021
@mroeschke mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022
7 participants