BUG: pandas.to_datetime raises exception when more than 50 values needs coercion to NaT #43732

Closed
3 tasks done
larsr opened this issue Sep 24, 2021 · 7 comments
Labels: Bug; Closing Candidate (May be closeable, needs more eyeballs); Datetime (Datetime data dtype); Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate)

@larsr

larsr commented Sep 24, 2021

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the master branch of pandas.

Reproducible Example

import datetime

import pandas as pd
import pytz

# t is a tz-aware date after 2262 that should be coerced into NaT
t = datetime.datetime(3000, 1, 1, 0, 0, 0, 0, pytz.UTC)

# lists with many such dates
l50 = [t] * 50
l51 = [t] * 51

# pd.to_datetime crashes if the list is longer than 50

# this is fine
print(pd.to_datetime(l50, utc=True, errors='coerce'))

# this crashes
print(pd.to_datetime(l51, utc=True, errors='coerce'))

Issue Description

Background (irrelevant for reproducing the bug, but good to know):
A Google BigQuery query gives me a DataFrame with a column of datetime values with year=9999,
and I store the result as a Parquet file. When I read it back, read_parquet crashes.
I traced it down to the following problem in pd.to_datetime.

The issue:
If pd.to_datetime needs to convert more than 50 datetime values that have tzinfo set and lie beyond year 2262 (the maximum date for pd.Timestamp), so that they should be coerced into NaT, it raises an exception instead. It works fine if the sequence has 50 items or fewer.

Expected Behavior

The expected behavior is that pd.to_datetime returns NaT values for sequences longer than 50 items as well.
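
A possible stopgap while the bug is present, offered as a sketch and not as part of the original report: mask the out-of-range values in plain Python before they reach pd.to_datetime. The helper name mask_out_of_bounds and the two bound constants are hypothetical; the bounds are deliberately rounded inward from the nanosecond pd.Timestamp range (roughly 1677-09-21 to 2262-04-11), and naive datetimes are assumed to be in UTC.

```python
import datetime

UTC = datetime.timezone.utc

# Conservative bounds, rounded inward from the pd.Timestamp nanosecond range.
# These constants are assumptions for this sketch, not pandas API.
LOWER = datetime.datetime(1677, 9, 22, tzinfo=UTC)
UPPER = datetime.datetime(2262, 4, 11, tzinfo=UTC)


def mask_out_of_bounds(values):
    """Return a list where datetimes outside [LOWER, UPPER] become None."""
    masked = []
    for v in values:
        if v is None:
            masked.append(None)
            continue
        # Assumption: naive datetimes are interpreted as UTC for the comparison.
        aware = v if v.tzinfo is not None else v.replace(tzinfo=UTC)
        masked.append(v if LOWER <= aware <= UPPER else None)
    return masked
```

The masked list can then be passed on, for example pd.to_datetime(mask_out_of_bounds(l51), utc=True, errors='coerce'), which should no longer hit the tz-aware out-of-bounds path because the offending values are already None.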

Installed Versions


commit : 73c6825
python : 3.7.10.final.0
python-bits : 64
OS : Linux
OS-release : 4.19.0-17-cloud-amd64
Version : #1 SMP Debian 4.19.194-3 (2021-07-18)
machine : x86_64
processor :
byteorder : little
LC_ALL : en_US.UTF-8
LANG : C.UTF-8
LOCALE : None.None

pandas : 1.3.3
numpy : 1.19.5
pytz : 2021.1
dateutil : 2.8.2
pip : 21.2.4
setuptools : 58.0.4
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.3
IPython : 7.27.0
pandas_datareader: None
bs4 : None
bottleneck : 1.3.2
fsspec : 2021.08.1
fastparquet : None
gcsfs : 2021.08.1
matplotlib : 3.4.3
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 5.0.0
pyxlsb : None
s3fs : None
scipy : 1.7.1
sqlalchemy : 1.4.25
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : 0.54.0

@larsr larsr added the Bug and Needs Triage labels Sep 24, 2021
@jreback
Contributor

jreback commented Sep 24, 2021

hmm try this on master
not sure if we back ported this patch

@debnathshoham
Member

this is still failing on master

@larsr
Author

larsr commented Sep 24, 2021

The exception is raised on line 467 in array_to_datetime in tslib.pyx

It raises the exception because (four lines above) the value of utc_convert is False.

The value is False because the call to objects_to_datetime64ns on line 2066 does not fill in a value for the utc parameter (which has default value False).

So when objects_to_datetime64ns later calls array_to_datetime, it uses utc=False.
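
The call chain described above can be modelled with a toy sketch. None of this is pandas code: the toy_ functions are hypothetical stand-ins, the error message is illustrative, and the only point is to show how a keyword argument that is not forwarded silently falls back to its default.

```python
import datetime


def toy_array_to_datetime(values, utc=False):
    """Toy stand-in: reject tz-aware datetimes unless utc=True."""
    out = []
    for v in values:
        if v.tzinfo is not None:
            if not utc:
                # Illustrative message, not the exact pandas wording.
                raise ValueError("tz-aware datetime requires utc=True")
            v = v.astimezone(datetime.timezone.utc)
        out.append(v)
    return out


def toy_objects_to_datetime64ns(values, utc=False):
    """Toy stand-in: forwards utc to the lower-level converter."""
    return toy_array_to_datetime(values, utc=utc)


def toy_caller_buggy(values, utc=True):
    # utc is NOT forwarded, so the callee's default (False) applies.
    return toy_objects_to_datetime64ns(values)


def toy_caller_fixed(values, utc=True):
    # Forwarding utc keeps the caller's setting all the way down.
    return toy_objects_to_datetime64ns(values, utc=utc)
```

With a tz-aware input, toy_caller_buggy raises ValueError even though the caller asked for utc=True, while toy_caller_fixed returns the converted values.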

@thomasqueirozb

take

thomasqueirozb added a commit to thomasqueirozb/pandas that referenced this issue Sep 28, 2021
…ds coercion to NaT (pandas-dev#43732)

Raise proper error from objects_to_datetime64ns

Co-authored-by: André Elimelek de Weber (andrekwr)
Co-authored-by: Henry Rocha (HenryRocha)
@mroeschke mroeschke added the Missing-data and Datetime labels and removed the Needs Triage label Oct 1, 2021
thomasqueirozb added a commit to thomasqueirozb/pandas that referenced this issue Oct 19, 2021
…ds coercion to NaT (pandas-dev#43732)

Raise proper error from objects_to_datetime64ns
Check for OutOfBoundsDatetime in _maybe_cache

Co-authored-by: André Elimelek de Weber (andrekwr)
Co-authored-by: Henry Rocha (HenryRocha)
@jreback jreback added this to the 1.4 milestone Nov 12, 2021
thomasqueirozb added a commit to thomasqueirozb/pandas that referenced this issue Nov 13, 2021
…ds coercion to NaT (pandas-dev#43732)

Raise proper error from objects_to_datetime64ns
Check for OutOfBoundsDatetime in _maybe_cache

Co-authored-by: André Elimelek de Weber (andrekwr)
Co-authored-by: Henry Rocha (HenryRocha)
@jreback jreback modified the milestones: 1.4, Contributions Welcome Dec 23, 2021
@srotondo
Contributor

I believe that this is a duplicated issue that was fixed during the closing of #45319, so this issue can be closed, right?

@mroeschke mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022
@debnathshoham
Member

this works fine on master now

@debnathshoham debnathshoham added the Closing Candidate label Oct 30, 2022
@phofl
Member

phofl commented Dec 18, 2022

agreed, this was fixed and tested in the other issue

@phofl phofl closed this as completed Dec 18, 2022