groupby() drops categorical columns when aggregating with isna() #29837
Comments
You need to try with a much newer version and/or master.
Thanks for the suggestion @jreback. However, I observed the same behavior with Output of
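How much do you need this data type at the time of calling groupby()? Have you considered this workaround, which aggregates the frame as it was before the cast to category?
import numpy as np
import pandas as pd
df = pd.DataFrame({'A': [1, 1, 1, 1],
                   'B': [1, 2, 1, 2],
                   'numerical_col': [.1, .2, np.nan, .3],
                   'object_col': ['foo', 'bar', 'foo', 'fee'],
                   'categorical_col': ['foo', 'bar', 'foo', 'fee']
                   })
# Keep a reference to the original frame; astype() below returns a new one.
df_double = df
df = df.astype({'categorical_col': 'category'})
# Aggregate the uncast frame, grouping by the key columns.
df_double.groupby([df['A'], df['B']]).agg(lambda col: col.isna().sum())
df_double = None  # release the uncast frame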
@AskariyanKarine Thanks for the suggestion. I am not looking for a workaround; what is needed is consistent, expected behavior.
Hi, I am really interested in contributing to pandas and would love to work on this issue. Is it already resolved?
Looks like this works on master. Could use a test.
Sure, I will add a test case for this in the groupby section.
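For readers following along, a regression test for this could look roughly like the sketch below. It is only an illustration: the function names are hypothetical, the data is borrowed from the example earlier in this thread, and it is not the test that was actually added to pandas.

```python
import numpy as np
import pandas as pd


def count_missing_per_group(frame):
    """Group by A and B and count missing values in each remaining column."""
    return frame.groupby(["A", "B"]).agg(lambda col: col.isna().sum())


def test_categorical_column_survives_isna_agg():
    # Hypothetical test name; data taken from the example in this thread.
    df = pd.DataFrame({
        "A": [1, 1, 1, 1],
        "B": [1, 2, 1, 2],
        "numerical_col": [0.1, 0.2, np.nan, 0.3],
        "object_col": ["foo", "bar", "foo", "fee"],
        "categorical_col": ["foo", "bar", "foo", "fee"],
    })
    as_object = count_missing_per_group(df)
    as_category = count_missing_per_group(
        df.astype({"categorical_col": "category"})
    )
    # The categorical column should be kept, like its object-dtype twin.
    assert "categorical_col" in as_category.columns
    assert list(as_category.columns) == list(as_object.columns)
```

The sketch compares column labels only. It deliberately makes no claim about the values or dtype pandas returns for the categorical column, since that may differ between versions.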
Code Sample, a copy-pastable example if possible
Problem description
The categorical column "categorical_col" is expected to survive the aggregation; instead, it gets dropped.
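The issue's own code sample is not preserved in this copy of the thread. The sketch below is an assumed reconstruction: it reuses the data from the workaround comment and applies the aggregation named in the title, then reports which value columns survive instead of asserting a particular outcome.

```python
import numpy as np
import pandas as pd

# Data copied from the workaround comment in this thread; the exact call
# used in the original report is an assumption.
df = pd.DataFrame({
    "A": [1, 1, 1, 1],
    "B": [1, 2, 1, 2],
    "numerical_col": [0.1, 0.2, np.nan, 0.3],
    "object_col": ["foo", "bar", "foo", "fee"],
    "categorical_col": ["foo", "bar", "foo", "fee"],
}).astype({"categorical_col": "category"})

# Count missing values per group in every non-key column.
result = df.groupby(["A", "B"]).agg(lambda col: col.isna().sum())

# Report which value columns survived the aggregation.
value_cols = ["numerical_col", "object_col", "categorical_col"]
survived = {name: name in result.columns for name in value_cols}
print(pd.__version__, survived)
```

On the version reported below (0.23.4), the entry for "categorical_col" would be expected to read False according to this report. Later comments in the thread indicate that master keeps the column.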
Expected Output
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.7.1.final.0
python-bits: 64
OS: Linux
OS-release: 3.10.0-693.11.6.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.23.4
pytest: 4.3.1
pip: 19.3.1
setuptools: 40.6.3
Cython: None
numpy: 1.15.4
scipy: 1.1.0
pyarrow: 0.11.1
xarray: None
IPython: 7.1.1
sphinx: None
patsy: 0.5.1
dateutil: 2.7.5
pytz: 2018.7
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 3.0.2
openpyxl: 2.6.1
xlrd: 1.2.0
xlwt: 1.3.0
xlsxwriter: 1.1.5
lxml: 4.3.0
bs4: None
html5lib: None
sqlalchemy: 1.2.13
pymysql: 0.9.3
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None