
Fix the output of df.describe on an empty categorical / object column #26474

Merged: 10 commits, Jun 1, 2019

Conversation

enisnazif
Contributor

@enisnazif enisnazif commented May 20, 2019

@pep8speaks

pep8speaks commented May 20, 2019

Hello @enisnazif! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-05-31 14:05:51 UTC

…n empty Categorical / Object column is the same as that of an non empty column
@enisnazif enisnazif changed the title from "Fix the output of df.describe on an empty categorical / object column #26397" to "Fix the output of df.describe on an empty categorical / object column" May 20, 2019
@codecov

codecov bot commented May 20, 2019

Codecov Report

Merging #26474 into master will decrease coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #26474      +/-   ##
==========================================
- Coverage   91.75%   91.74%   -0.01%     
==========================================
  Files         174      174              
  Lines       50765    50767       +2     
==========================================
- Hits        46578    46576       -2     
- Misses       4187     4191       +4
Flag         Coverage Δ
#multiple    90.25% <100%> (ø) ⬆️
#single      41.72% <0%> (-0.09%) ⬇️

Impacted Files            Coverage Δ
pandas/core/generic.py    93.49% <100%> (ø) ⬆️
pandas/io/gbq.py          78.94% <0%> (-10.53%) ⬇️
pandas/core/frame.py      97.02% <0%> (-0.12%) ⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f5cc078...10df157. Read the comment docs.

@codecov

codecov bot commented May 20, 2019

Codecov Report

Merging #26474 into master will decrease coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #26474      +/-   ##
==========================================
- Coverage   91.84%   91.84%   -0.01%     
==========================================
  Files         174      174              
  Lines       50644    50646       +2     
==========================================
- Hits        46516    46514       -2     
- Misses       4128     4132       +4
Flag         Coverage Δ
#multiple    90.37% <100%> (ø) ⬆️
#single      41.71% <0%> (-0.09%) ⬇️

Impacted Files                       Coverage Δ
pandas/core/arrays/categorical.py    95.91% <100%> (ø) ⬆️
pandas/core/generic.py               93.62% <100%> (ø) ⬆️
pandas/io/gbq.py                     78.94% <0%> (-10.53%) ⬇️
pandas/core/frame.py                 97% <0%> (-0.12%) ⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7f31865...7100855. Read the comment docs.

@TomAugspurger
Contributor

The test failure is from our job against NumPy master.

        df = pd.DataFrame({"empty_col": Categorical([])})
>       result = df.describe()

pandas/tests/frame/test_analytics.py:594: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
pandas/core/generic.py:9956: in describe
    ldesc = [describe_1d(s) for _, s in data.iteritems()]
pandas/core/generic.py:9956: in <listcomp>
    ldesc = [describe_1d(s) for _, s in data.iteritems()]
pandas/core/generic.py:9939: in describe_1d
    return describe_categorical_1d(data)
pandas/core/generic.py:9900: in describe_categorical_1d
    objcounts = data.value_counts()
pandas/core/base.py:1318: in value_counts
    normalize=normalize, bins=bins, dropna=dropna)
pandas/core/algorithms.py:689: in value_counts
    result = Series(values)._values.value_counts(dropna=dropna)
pandas/core/arrays/categorical.py:1480: in value_counts
    count = bincount(obs, minlength=ncat or None)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

args = (array([], dtype=int8),), kwargs = {'minlength': None}
relevant_args = (array([], dtype=int8), None)

>   ???
E   DeprecationWarning: 0 should be passed as minlength instead of None; this will error in future.

I would see if Categorical.value_counts can be updated to pass count = bincount(obs, minlength=ncat or 0). If that doesn't work for older NumPy versions, then you'll need a bit more compat code.
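
A minimal sketch of the suggested change, assuming obs holds the integer category codes and ncat the number of categories, as in the traceback above. The helper name category_counts is hypothetical and is not the pandas implementation; it only shows that `ncat or 0` always hands np.bincount an integer minlength:

```python
import numpy as np


def category_counts(codes, ncat):
    """Count occurrences of each category code (hypothetical helper)."""
    # `ncat or 0` is always an integer, so np.bincount never receives
    # minlength=None, which NumPy master flags as deprecated.
    return np.bincount(codes, minlength=ncat or 0)


# Empty Categorical: no codes and no categories.
print(category_counts(np.array([], dtype=np.int8), 0))

# Three observations over four categories: one slot per category.
print(category_counts(np.array([0, 2, 2], dtype=np.int8), 4))  # [1 0 2 0]
```

Whether minlength=0 is accepted by every NumPy release that pandas supports is not checked here, which is the compat question raised above.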

@enisnazif
Contributor Author

Ok, thanks - I'll take a look

…n empty Categorical / Object column is the same as that of an non empty column
@TomAugspurger TomAugspurger added the "Numeric Operations" label (Arithmetic, Comparison, and Logical operations) May 21, 2019
@TomAugspurger TomAugspurger added this to the 0.25.0 milestone May 21, 2019
Contributor

@TomAugspurger TomAugspurger left a comment

Needs a release note for 0.25.0.

What section? Are we calling this an API breaking change, or a bugfix? cc @WillAyd @jreback
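
For reference, a rough illustration of the behavior being discussed, assuming a pandas version that includes this change (the 0.25.0 milestone or later). It reuses the empty categorical column from the failing test above and inspects the rows that describe reports:

```python
import pandas as pd

# An empty categorical column, as in the failing test above.
df = pd.DataFrame({"empty_col": pd.Categorical([])})
result = df.describe()

# With the fix, the rows match those of a non-empty categorical column,
# including top and freq.
print(result.index.tolist())
print(result)
```

Per the linked issue, top and freq were previously missing from this output for an empty column, which is why the change may count as API breaking rather than a plain bugfix.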

…n empty Categorical / Object column is the same as that of an non empty column
@jreback
Contributor

jreback commented May 26, 2019

@enisnazif can you add a note in the api breaking section

@enisnazif
Contributor Author

Done

Contributor

@jreback jreback left a comment

doc comment; ping on green.

Addressed review comments
@enisnazif
Contributor Author

@jreback checks have all passed, ok to merge?

Contributor

@TomAugspurger TomAugspurger left a comment

Pushed a small doc fix. LGTM

@enisnazif
Contributor Author

i'll take a look at the failing jobs

@TomAugspurger
Contributor

@enisnazif that's fixed on master. Just merged & repushed. Ping on green.

@enisnazif
Contributor Author

enisnazif commented May 31, 2019

@jreback @TomAugspurger hey, looks like it's green. Is it good to merge?

@jreback jreback merged commit 2c6d005 into pandas-dev:master Jun 1, 2019
@jreback
Contributor

jreback commented Jun 1, 2019

thanks @enisnazif

Labels
Numeric Operations (Arithmetic, Comparison, and Logical operations)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DataFrame.describe excludes top and freq for empty DataFrame
4 participants