
BUG: .max(axis=1) is incorrect when DataFrame contains tz-aware timestamps and NaT #44196


Closed
2 of 3 tasks
tscheburaschka opened this issue Oct 26, 2021 · 2 comments
Labels
Bug, Needs Triage

Comments

@tscheburaschka

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the master branch of pandas.

Reproducible Example

import pandas as pd

# Incorrect result and warning with timezone aware df
rng_with_tz = pd.date_range(start='2021-10-01T12:00:00+02:00', end='2021-10-02T12:00:00+02:00', freq='4H')
df_with_tz = pd.DataFrame(data={'A': rng_with_tz, 'B': rng_with_tz + pd.Timedelta(minutes=20)})
df_with_tz.iloc[2, 1] = pd.NaT
print(df_with_tz)
print(df_with_tz.max(axis=1))

# No problem with timezone naive dataframe
rng_tz_naive = pd.date_range(start='2021-10-01T12:00:00', end='2021-10-02T12:00:00', freq='4H')
df_tz_naive = pd.DataFrame(data={'A': rng_tz_naive, 'B': rng_tz_naive + pd.Timedelta(minutes=20)})
df_tz_naive.iloc[2, 1] = pd.NaT
print(df_tz_naive)
print(df_tz_naive.max(axis=1))

# Also no problem on other axis
print(df_with_tz.transpose().max(axis=0))

# And no problem without NaT
df_with_tz.iloc[2, 1] = df_with_tz.iloc[2, 0] + pd.Timedelta(minutes=20)
print(df_with_tz.max(axis=1))

Issue Description

Calling .max(axis=1) on a DataFrame whose timezone-aware datetime columns contain at least one NaT
produces an incorrect result (and an obscure warning).
The issue occurs only when all of these conditions are met. For example, operating on the other axis
via transpose, as in df.transpose().max(axis=0), works around it. DataFrames with timezone-naive columns
do not show the problematic behaviour, and neither do DataFrames that contain no NaTs.
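Since the report says the tz-naive case behaves correctly, another possible workaround is to strip the timezone before reducing and restore it afterwards. The sketch below assumes every column is tz-aware and shares one timezone; the helper name `rowwise_max_tz` is hypothetical and not part of pandas.

```python
import pandas as pd


def rowwise_max_tz(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: row-wise max of tz-aware datetime columns, skipping NaT.

    Assumes every column is tz-aware and all columns share the same timezone.
    """
    tz = df.iloc[:, 0].dt.tz
    # Convert each column to UTC and drop the timezone, so the reduction runs on
    # tz-naive datetime64 columns (the case the report says behaves correctly).
    naive = pd.DataFrame(
        {col: df[col].dt.tz_convert("UTC").dt.tz_localize(None) for col in df.columns}
    )
    # skipna=True is the default, so NaT entries are ignored within each row.
    result = naive.max(axis=1)
    # Re-attach UTC, then convert back to the columns' original timezone.
    return result.dt.tz_localize("UTC").dt.tz_convert(tz)


# Usage, mirroring the tz-aware frame from the reproducible example.
rng = pd.date_range(
    start="2021-10-01T12:00:00+02:00", end="2021-10-02T12:00:00+02:00", freq="4h"
)
df_with_tz = pd.DataFrame({"A": rng, "B": rng + pd.Timedelta(minutes=20)})
df_with_tz.iloc[2, 1] = pd.NaT
row_max = rowwise_max_tz(df_with_tz)
print(row_max)
```

Going through UTC rather than localizing directly keeps the round trip unambiguous even for timezones with DST transitions.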

Expected Behavior

The operation df_with_tz.max(axis=1) should return the maximum of each row across the columns, ignoring NaTs, just as the same reduction does along the other axis (axis=0).
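To make the expected result concrete, the sketch below computes the row-wise maxima by hand in plain Python, skipping NaT in each row. The variable names are illustrative only; it does not call df.max(axis=1) itself, so it runs the same way regardless of whether the installed pandas version is affected.

```python
import pandas as pd

rng = pd.date_range(
    start="2021-10-01T12:00:00+02:00", end="2021-10-02T12:00:00+02:00", freq="4h"
)
df_with_tz = pd.DataFrame({"A": rng, "B": rng + pd.Timedelta(minutes=20)})
df_with_tz.iloc[2, 1] = pd.NaT

# Iterate row by row and keep the largest non-NaT timestamp;
# a row made up entirely of NaT would yield NaT.
rows = zip(*(df_with_tz[col] for col in df_with_tz.columns))
expected = pd.Series(
    [max((ts for ts in row if not pd.isna(ts)), default=pd.NaT) for row in rows],
    index=df_with_tz.index,
)
print(expected)
```

Printing expected next to df_with_tz.max(axis=1) makes any discrepancy visible on an affected version.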

Installed Versions

INSTALLED VERSIONS

commit : 945c9ed
python : 3.9.7.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.14393
machine : AMD64
processor : Intel64 Family 6 Model 85 Stepping 0, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : de_DE.cp1252

pandas : 1.3.4
numpy : 1.20.3
pytz : 2021.3
dateutil : 2.8.2
pip : 21.3.1
setuptools : 58.3.0
Cython : None
pytest : 6.2.5
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.0.2
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.4.3
numexpr : None
odfpy : None
openpyxl : 3.0.9
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : 0.54.1

@tscheburaschka added the Bug and Needs Triage labels Oct 26, 2021
@jbrockmendel
Member

xref #27794

@tscheburaschka tscheburaschka changed the title BUG: .max(axis=1) is incorrect when DataFrame contains tz-aware timestamps dnNaT BUG: .max(axis=1) is incorrect when DataFrame contains tz-aware timestamps and NaT Oct 27, 2021
@mroeschke
Member

This looks like the same issue as #27794, so closing in favor of that issue.
