
crosstab sometimes erroneously handles dt operations #28452

Closed
Eric-Sommer opened this issue Sep 16, 2019 · 2 comments

Eric-Sommer commented Sep 16, 2019

Code Sample

import pandas as pd
df = pd.DataFrame({'date': ['2019-01-01', '2019-02-01', '2018-01-01', '2018-02-01']})
df["date"] = pd.to_datetime(df["date"])
pd.crosstab(df["date"].dt.month, df["date"].dt.year)

Problem description

This gives me a 2x2 table with the years 2018 and 2019 on both axes, so the dt.month result is effectively ignored. The same thing happens if I replace dt.month with dt.day in my example. Interestingly, pd.crosstab(df["date"].dt.day, df["date"].dt.month) works as expected, while pd.crosstab(df["date"].dt.month, df["date"].dt.day) does not!
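One detail worth checking, shown here as a minimal sketch using the same data as the code sample above: the .dt accessors return Series that keep the parent column's name, so both arguments passed to crosstab are named "date".

```python
import pandas as pd

# Same data as the code sample above.
df = pd.DataFrame({"date": ["2019-01-01", "2019-02-01", "2018-01-01", "2018-02-01"]})
df["date"] = pd.to_datetime(df["date"])

month = df["date"].dt.month
year = df["date"].dt.year

# The .dt accessors keep the parent Series' name, so both inputs are called "date".
print(month.name, year.name)  # date date
```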

Expected Output

year   2018  2019
month
1         1     1
2         1     1
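As a cross-check that does not go through crosstab at all, the expected counts can be reproduced with groupby plus unstack. This is only a sketch; the names "month" and "year" are my own choice for labeling the axes, not something crosstab assigns.

```python
import pandas as pd

df = pd.DataFrame({"date": ["2019-01-01", "2019-02-01", "2018-01-01", "2018-02-01"]})
df["date"] = pd.to_datetime(df["date"])

# Give the two keys distinct names so the result axes are labeled month/year.
month = df["date"].dt.month.rename("month")
year = df["date"].dt.year.rename("year")

# Count rows per (month, year) pair, then pivot the years into columns.
counts = df.groupby([month, year]).size().unstack(fill_value=0)
print(counts)
```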

Output of pd.show_versions()

INSTALLED VERSIONS

commit : None
python : 3.7.4.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 78 Stepping 3, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.None

pandas : 0.25.1
numpy : 1.16.4
pytz : 2019.2
dateutil : 2.8.0
pip : 19.2.2
setuptools : 41.0.1
Cython : None
pytest : 5.0.1
hypothesis : 4.32.2
sphinx : 2.1.2
blosc : None
feather : None
xlsxwriter : 1.1.8
lxml.etree : 4.4.1
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.10.1
IPython : 7.8.0
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : 4.4.1
matplotlib : 3.1.1
numexpr : 2.7.0
odfpy : None
openpyxl : 2.6.2
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : 1.3.1
sqlalchemy : None
tables : 3.5.2
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.1.8

@TomAugspurger
Contributor

The following gives the correct result:

In [6]: pd.crosstab(df["date"].dt.month.rename('month'), df["date"].dt.year.rename('year'))
Out[6]:
year   2018  2019
month
1         1     1
2         1     1

Can you investigate what crosstab is supposed to do when the arguments have the same name? I agree that the behavior you show is surprising.
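A possible alternative to renaming the Series, sketched under the assumption that crosstab takes its axis labels from its documented rownames/colnames parameters when they are given, instead of from the Series names. I have not confirmed that this sidesteps the collision on 0.25.1 specifically.

```python
import pandas as pd

df = pd.DataFrame({"date": ["2019-01-01", "2019-02-01", "2018-01-01", "2018-02-01"]})
df["date"] = pd.to_datetime(df["date"])

# Both Series are still named "date"; rownames/colnames supply distinct labels instead.
table = pd.crosstab(
    df["date"].dt.month,
    df["date"].dt.year,
    rownames=["month"],
    colnames=["year"],
)
print(table)
```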

@TomAugspurger
Contributor

I think this is a duplicate of #22529
