Type issue in empty groupby from DataFrame with categorical #9614

Closed
xflr6 opened this issue Mar 8, 2015 · 3 comments · Fixed by #29355
Labels
good first issue Needs Tests Unit test(s) needed to prevent regressions

@xflr6
Contributor

xflr6 commented Mar 8, 2015

In a DataFrame without a categorical, the following comparisons work as expected:

import pandas as pd

df = pd.DataFrame({'id': [None] * 3, 'spam': [None] * 3})
df['spam'] == 'spam'
df.groupby('id').first()['spam'] == 'spam'

However, when a column is Categorical, a groupby on the all-null column behaves unexpectedly:

df['spam'] = df['spam'].astype('category')
df['spam'] == 'spam'  # works as expected
df.groupby('id').first()['spam'] == 'spam'  # raises TypeError: invalid type comparison

Looks like the groupby converts all types in the group to float64:

>>> df.groupby('id').first().info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 0 entries
Data columns (total 1 columns):
spam    0 non-null float64
dtypes: float64(1)
memory usage: 0.0 bytes
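
For a quicker check than `info()`, the dtype of the column can be compared before and after the groupby. This is only a minimal sketch that reuses the frame from the report; the variable names `before` and `after` are illustrative, and the printed result depends on the installed pandas version, so no expected output is shown.

```python
import pandas as pd

# Same frame as in the report: every value in both columns is null.
df = pd.DataFrame({'id': [None] * 3, 'spam': [None] * 3})
df['spam'] = df['spam'].astype('category')

# dtype of the column before grouping (categorical by construction).
before = df['spam'].dtype

# dtype of the same column after an empty groupby on the all-null key.
after = df.groupby('id').first()['spam'].dtype

print(before, '->', after)
```

If `after` is not categorical, the dtype was lost during the groupby, which matches the `float64` shown by `info()` above.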
@jreback
Contributor

jreback commented Mar 8, 2015

Will mark this as a bug; I suspect that the resulting Series is simply not of categorical dtype (which would make the comparison valid).

Pull requests welcome!

@jreback jreback added Bug Groupby Indexing Related to indexing on series/frames, not to indexes themselves labels Mar 8, 2015
@jreback jreback added this to the 0.16.1 milestone Mar 8, 2015
@jreback jreback modified the milestones: 0.17.0, 0.16.1 Apr 29, 2015
@jreback jreback modified the milestones: Next Major Release, 0.17.0 Aug 15, 2015
@jorisvandenbossche
Member

Still failing with 0.23.2

@datapythonista datapythonista modified the milestones: Contributions Welcome, Someday Jul 8, 2018
@mroeschke
Member

Looks to work on master. Could use a test.

In [11]: df.groupby('id').first()['spam'] == 'spam'
Out[11]: Series([], Name: spam, dtype: bool)

In [12]: pd.__version__
Out[12]: '0.26.0.dev0+627.gef77b5700'
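
A regression test along these lines might cover the case. This is a sketch, not the test that was eventually merged: the function name is hypothetical, it uses plain `assert` rather than pandas' own testing helpers, and it only checks the comparison from the original report.

```python
import pandas as pd


def test_empty_groupby_categorical_comparison():
    # Hypothetical test name; the data mirrors the original report.
    df = pd.DataFrame({'id': [None] * 3, 'spam': [None] * 3})
    df['spam'] = df['spam'].astype('category')

    # Every key is null, so the groupby produces zero groups.
    result = df.groupby('id').first()['spam']
    assert len(result) == 0

    # This comparison raised TypeError in the affected versions;
    # here it is expected to give an empty boolean Series instead.
    mask = result == 'spam'
    assert len(mask) == 0
    assert mask.dtype == bool
```

A stricter version could also assert that `result` keeps its categorical dtype, but whether that holds depends on the pandas version under test.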

@mroeschke mroeschke added good first issue Needs Tests Unit test(s) needed to prevent regressions and removed Bug Groupby Indexing Related to indexing on series/frames, not to indexes themselves labels Oct 22, 2019
@jreback jreback modified the milestones: Someday, 1.0 Nov 2, 2019
6 participants