Converting None values with pandas.to_datetime is unpredictable #23055

Open
dinya opened this issue Oct 9, 2018 · 8 comments
Labels
API - Consistency (Internal Consistency of API/Behavior), Bug, Datetime (Datetime data dtype), Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate)

Comments

@dinya

dinya commented Oct 9, 2018

Why does pandas convert None values differently for to_datetime (unpredictable) and to_numeric (predictable)?

import pandas as pd

VALUE = None

print(pd.to_datetime(VALUE))  # scalar None in, see output below
print(pd.to_numeric(VALUE))   # scalar None in, see output below

print(pd.__version__)

returns

None
nan
0.23.4

due to https://github.com/pandas-dev/pandas/blob/v0.23.4/pandas/core/tools/datetimes.py#L382

Why doesn't pd.to_datetime(None) return pd.NaT by design?

See also the original post on Stack Overflow.

@WillAyd WillAyd added Datetime Datetime data dtype Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate labels Oct 9, 2018
@WillAyd
Member

WillAyd commented Oct 9, 2018

Personally not sure if there's a history to this but maybe @jbrockmendel knows

@jbrockmendel
Member

I'm not aware of any reason for this. It's also inconsistent with a couple of other construction methods:

>>> pd.Timestamp(None)
NaT
>>> pd.to_datetime([None])
DatetimeIndex(['NaT'], dtype='datetime64[ns]', freq=None)

@dinya a PR to fix this would be welcome.
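
Until the scalar path is changed, one way to get the behavior asked for is a small wrapper. The following is only a sketch: `to_datetime_or_nat` is a hypothetical name, not part of pandas, and it deliberately avoids relying on what `pd.to_datetime(None)` returns in any particular version.

```python
import pandas as pd


def to_datetime_or_nat(value, **kwargs):
    """Hypothetical wrapper: map a scalar None to pd.NaT, else defer to pd.to_datetime."""
    if value is None:
        return pd.NaT
    return pd.to_datetime(value, **kwargs)


# The two constructions quoted above both treat None as a missing datetime.
assert pd.isna(pd.Timestamp(None))
assert pd.isna(pd.to_datetime([None])[0])

# The wrapper makes the scalar path agree with them.
assert to_datetime_or_nat(None) is pd.NaT
assert to_datetime_or_nat("2018-10-09") == pd.Timestamp("2018-10-09")
```

Non-None inputs pass through untouched, so lists and Series keep the usual `pd.to_datetime` behavior.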

@mroeschke
Member

to_timedelta could use this change as well:

In [1]: pd.to_timedelta(None)

In [2]: pd.Timedelta(None)
Out[2]: NaT
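
The same guard could be applied on the timedelta side. Again a sketch under the assumption that NaT is the desired result for a scalar None; `to_timedelta_or_nat` is a hypothetical helper, not a pandas function.

```python
import pandas as pd


def to_timedelta_or_nat(value, **kwargs):
    """Hypothetical wrapper: map a scalar None to pd.NaT, else defer to pd.to_timedelta."""
    if value is None:
        return pd.NaT
    return pd.to_timedelta(value, **kwargs)


# pd.Timedelta(None) already yields NaT, as shown above.
assert pd.isna(pd.Timedelta(None))

# The wrapper gives the same answer for the scalar to_timedelta path.
assert to_timedelta_or_nat(None) is pd.NaT
assert to_timedelta_or_nat("1 day") == pd.Timedelta(days=1)
```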

@WillAyd WillAyd added this to the Contributions Welcome milestone Oct 10, 2018
@pambot
Contributor

pambot commented Oct 22, 2018

@WillAyd I was giving this a try, and I think this may be why people haven't ever changed it: changing the returned None to NaT raises a whole bunch of exceptions that boil down to:

    six.exec_(co, mod.__dict__)
pandas/tests/series/indexing/conftest.py:3: in <module>
    from pandas.tests.series.common import TestData
pandas/tests/series/common.py:5: in <module>
    _ts = tm.makeTimeSeries()
pandas/util/testing.py:1925: in makeTimeSeries
    return Series(randn(nper), index=makeDateIndex(nper, freq=freq), name=name)
pandas/core/series.py:245: in __init__
    .format(val=len(data), ind=len(index)))
E   ValueError: Length of passed values is 30, index implies 0

I think this None/NaT thing runs deeper than it initially looks like - here, the error is caused by makeDateIndex(nper, freq=freq), where nper is 30, returning DatetimeIndex([], dtype='datetime64[ns]', freq='B'). I'll keep digging through it, but I thought it was worth noting down.

@WillAyd
Member

WillAyd commented Oct 22, 2018

@pambot I saw some questions come through via email but don't think they made it to the GitHub UI (believe they were having some issues when you posted).

I know one of your questions was about how a DatetimeIndex (DTI) should behave when an end argument is not provided, though I didn't really understand what distinction you were trying to make. If you could repost, that would be helpful for discussion.

@pambot
Contributor

pambot commented Oct 23, 2018

@WillAyd Oh yeah, sorry for the confusion. I think I posted that just before I discovered the source of the bug (which triggered before any tests even ran): a line had to be changed to if end is None or end is NaT to deal with the NaT. This let the tests configure themselves properly, but predictably, a whole bunch of other tests have broken (37 in total). I wonder if there's a case to be made for some kind of values config file where each type's allowed values are kept, including the default null values.

@jreback
Contributor

jreback commented Oct 23, 2018

You can use isna on a scalar value to check.
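
As a sketch of that suggestion: `pd.isna` returns a plain boolean for a scalar and treats None, NaT, and NaN alike, so a single call can replace the `end is None or end is NaT` pair. The function name `is_missing_endpoint` is hypothetical and only illustrates the check; it is not taken from the pandas source.

```python
import pandas as pd


def is_missing_endpoint(end):
    """Hypothetical check for a scalar range endpoint: True for None, NaT, or NaN."""
    return pd.isna(end)


assert is_missing_endpoint(None)
assert is_missing_endpoint(pd.NaT)
assert is_missing_endpoint(float("nan"))
assert not is_missing_endpoint(pd.Timestamp("2018-10-23"))
```

Note that `pd.isna` returns an array for array-like input, so this form only suits scalar arguments.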

@ghost

ghost commented May 14, 2020

Sorry for messy PR. Please review!

@jreback jreback modified the milestones: Contributions Welcome, 1.1 May 17, 2020
@jreback jreback modified the milestones: 1.1, Contributions Welcome Jul 10, 2020
@mroeschke mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022

6 participants