BUG: MultiIndex loses category after .stack() #36991

Closed
3 tasks done
maroth96 opened this issue Oct 8, 2020 · 3 comments · Fixed by #40127

Labels: Bug, Categorical (Categorical Data Type), Reshaping (Concat, Merge/Join, Stack/Unstack, Explode)

@maroth96
Contributor

maroth96 commented Oct 8, 2020

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample, a copy-pastable example

import pandas as pd


columns = pd.MultiIndex.from_tuples(
    [('x', 1, 3), ('x', 2, 3), ('y', 1, 3), ('y', 2, 3)], names=['a', 'b', 'c'])
columns = columns.set_levels(
    [columns.levels[i].astype('category') for i in range(0, 2)], level=[0, 1])


df = pd.DataFrame([[1, 2, 3, 4]], columns=columns)
dtypes = df.columns.to_frame().dtypes

assert isinstance(dtypes.a, pd.CategoricalDtype)
assert isinstance(dtypes.b, pd.CategoricalDtype)


df2 = df.stack(['a', 'b'])
dtypes2 = df2.index.to_frame().dtypes

assert isinstance(dtypes2.a, pd.CategoricalDtype)
assert isinstance(dtypes2.b, pd.CategoricalDtype)  # broken

Problem description

Categorical types within a MultiIndex should be preserved after calling .stack().

Expected Output

Output of pd.show_versions()

INSTALLED VERSIONS

commit : db08276
python : 3.7.3.final.0
python-bits : 64
OS : Darwin
OS-release : 19.6.0
Version : Darwin Kernel Version 19.6.0: Mon Aug 31 22:12:52 PDT 2020; root:xnu-6153.141.2~1/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.1.3
numpy : 1.18.4
pytz : 2020.1
dateutil : 2.8.1
pip : 20.2.3
setuptools : 40.8.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.16.1
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : 0.7.4
fastparquet : None
gcsfs : None
matplotlib : 3.3.0
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pyxlsb : None
s3fs : None
scipy : 1.5.2
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : None

@maroth96 added the Bug and Needs Triage (Issue that has not been reviewed by a pandas team member) labels on Oct 8, 2020
@rhshadrach
Member

Thanks for the report! Confirmed on master; further investigation and PRs are welcome.

@rhshadrach added the Categorical (Categorical Data Type) and Reshaping (Concat, Merge/Join, Stack/Unstack, Explode) labels and removed the Needs Triage (Issue that has not been reviewed by a pandas team member) label on Feb 11, 2021
@rhshadrach added this to the Contributions Welcome milestone on Feb 11, 2021
@maroth96
Contributor Author

The problem is in _stack_multi_columns (core/reshape/reshape.py)

The following line causes the dtypes to be lost:

new_columns = MultiIndex.from_tuples(unique_groups, names=new_names)
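The effect of that call can be reproduced outside of `.stack()`. The snippet below is only a minimal sketch, not the pandas internals: it assumes that rebuilding an index from plain tuples behaves like the `unique_groups` round trip above, and the names `original` and `rebuilt` are made up for illustration.

```python
import pandas as pd

# Build a two-level MultiIndex whose levels are both categorical.
original = pd.MultiIndex.from_tuples([("x", 1), ("y", 2)], names=["a", "b"])
original = original.set_levels(
    [level.astype("category") for level in original.levels]
)

# Round-trip through plain Python tuples. The tuples carry only values,
# so from_tuples has to infer the level dtypes from scratch.
rebuilt = pd.MultiIndex.from_tuples(list(original), names=original.names)

original_is_categorical = [
    isinstance(level.dtype, pd.CategoricalDtype) for level in original.levels
]
rebuilt_is_categorical = [
    isinstance(level.dtype, pd.CategoricalDtype) for level in rebuilt.levels
]

print(original_is_categorical)
print(rebuilt_is_categorical)
```

The values and names survive the round trip, but the categorical dtype of each level does not, because nothing in the tuples records it.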

The most straightforward solution is simply to add:

# Cast each rebuilt level back to the dtype of the matching original level.
new_columns = new_columns.set_levels([
    new_columns.levels[i].astype(this.columns.levels[i].dtype)
    for i in range(new_columns.nlevels)
])

This causes my assertions to succeed. The additional conversions are not so elegant, however.
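For anyone who wants to try the idea without patching pandas, here is a standalone sketch of the same dtype-restoring step. It is not the code that was merged for this issue. The helper name `restore_level_dtypes` is hypothetical, and it assumes the levels of the rebuilt index line up by position with the leading levels of the original columns.

```python
import pandas as pd


def restore_level_dtypes(new_index, reference):
    """Cast each level of new_index to the dtype of the reference level
    at the same position (hypothetical helper, positional matching)."""
    restored = [
        new_level.astype(ref_level.dtype)
        for new_level, ref_level in zip(new_index.levels, reference.levels)
    ]
    return new_index.set_levels(restored)


# Original columns, as in the report: levels 'a' and 'b' are categorical.
reference = pd.MultiIndex.from_tuples(
    [("x", 1, 3), ("x", 2, 3), ("y", 1, 3), ("y", 2, 3)], names=["a", "b", "c"]
)
reference = reference.set_levels(
    [reference.levels[i].astype("category") for i in range(2)], level=[0, 1]
)

# Index rebuilt from plain tuples, so its levels have inferred dtypes.
new_index = pd.MultiIndex.from_tuples(
    [("x", 1), ("x", 2), ("y", 1), ("y", 2)], names=["a", "b"]
)

restored = restore_level_dtypes(new_index, reference)
print(restored.to_frame(index=False).dtypes)
```

Matching by position keeps the sketch short. Matching by level name would be more robust when level names are set and the level order can change.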

@rhshadrach Shall I open a PR with this change, or do you see a better solution?

@rhshadrach
Member

@maroth96 - It's not clear to me if there is a better way to preserve dtype; I recommend opening a PR with this and seeing if others have suggestions.
