pandas Series apply returns dtype:datetime though convert_dtype is set to False #26630

Open
MarcelBeining opened this issue Jun 3, 2019 · 5 comments
Labels
Apply (Apply, Aggregate, Transform, Map), Bug, Datetime (Datetime data dtype), Dtype Conversions (Unexpected or buggy dtype conversions)

Comments

@MarcelBeining

pandas version 0.24.2

Code Sample, a copy-pastable example if possible

import pandas as pd
import datetime

thisdf = pd.DataFrame({'date':['12.01.1999','','28.04.2012']})

def parseDate(date):
    try:
        return datetime.datetime.strptime(date,'%d.%m.%Y')
    except Exception:
        return None
    
print(thisdf.date.apply(parseDate, convert_dtype=False))

Output:

0   1999-01-12
1          NaT
2   2012-04-28
Name: date, dtype: datetime64[ns]

Problem description

I want to parse a pandas Series containing strings to datetime using the above function parseDate. The important thing is that None values must not be converted to NaT because I put it into a SQLAlchemy database afterwards which only recognizes None values as NULL. Hence the returned Series must be of dtype object. However, all methods I tried return the Series with dtype: datetime64[ns] after parsing.

The solution of #14559 (using .astype(object) after the apply) does not help because the NaT values are still NaT values instead of None. Of course I could do another round of transformation after that but looping through twice does not seem very performant to me.
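
To illustrate the point above, here is a minimal sketch (assuming a reasonably recent pandas; the sample values and variable names are made up for illustration). It shows that .astype(object) changes the dtype but leaves NaT in place, so replacing it with None takes a second pass:

```python
import pandas as pd

# A datetime64 Series with one missing value, similar to what .apply returned above.
parsed = pd.Series(pd.to_datetime(["1999-01-12", None, "2012-04-28"]))

# Casting to object changes the dtype but keeps the NaT sentinel.
as_object = parsed.astype(object)
still_nat = as_object[1] is pd.NaT

# A second pass is needed to swap NaT for None.
with_none = pd.Series(
    [None if value is pd.NaT else value for value in as_object],
    index=as_object.index,
    dtype=object,
)

print(still_nat)
print(with_none)
```

The explicit dtype=object in the last step matters: without it the Series constructor may infer a datetime dtype again and turn None back into NaT.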

Any help appreciated.

Expected Output

A pandas Series with dtype=object

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.7.3.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 158 Stepping 10, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en
LOCALE: None.None

pandas: 0.24.2
pytest: 4.3.1
pip: 19.0.3
setuptools: 40.8.0
Cython: 0.29.6
numpy: 1.16.2
scipy: 1.2.1
pyarrow: None
xarray: None
IPython: 7.4.0
sphinx: 1.8.5
patsy: 0.5.1
dateutil: 2.8.0
pytz: 2018.9
blosc: None
bottleneck: 1.2.1
tables: 3.5.1
numexpr: 2.6.9
feather: None
matplotlib: 3.0.3
openpyxl: 2.6.1
xlrd: 1.2.0
xlwt: 1.3.0
xlsxwriter: 1.1.5
lxml.etree: 4.3.2
bs4: 4.7.1
html5lib: 1.0.1
sqlalchemy: 1.3.1
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None

@TomAugspurger
Contributor

I'm not really familiar with what convert_dtype does, but it's arguable that if the user provides convert_dtype=False, then we should pass through dtype=mapped.dtype in

return self._constructor(mapped,

to disable further inference.

Are you interested in trying that out and making a PR?

@TomAugspurger added the Dtype Conversions and Datetime labels Jun 3, 2019
@TomAugspurger
Contributor

TomAugspurger commented Jun 3, 2019

Just to clarify, things are fine up until that point. We have an ndarray[object]

(Pdb) mapped
array([datetime.datetime(1999, 1, 12, 0, 0), None,
       datetime.datetime(2012, 4, 28, 0, 0)], dtype=object)

But we re-infer datetime64[ns] dtype when making the new series

(Pdb) pp self._constructor(mapped)
0   1999-01-12
1          NaT
2   2012-04-28
dtype: datetime64[ns]
(Pdb) pp self._constructor(mapped, dtype=mapped.dtype)
0    1999-01-12 00:00:00
1                   None
2    2012-04-28 00:00:00
dtype: object
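
The debugger session above can be reproduced outside of .apply with the public Series constructor. This is only a sketch: it assumes that pd.Series behaves like self._constructor here, and the exact datetime resolution that gets inferred may differ between pandas versions.

```python
import datetime

import numpy as np
import pandas as pd

# Object array like the `mapped` value shown in the debugger session above.
mapped = np.array(
    [datetime.datetime(1999, 1, 12), None, datetime.datetime(2012, 4, 28)],
    dtype=object,
)

# Without an explicit dtype the constructor re-infers a datetime dtype,
# and None becomes NaT.
inferred = pd.Series(mapped)

# Passing the array's own dtype (object) skips that inference,
# so None is kept as-is.
explicit = pd.Series(mapped, dtype=mapped.dtype)

print(inferred.dtype)
print(explicit.dtype)
```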

Alternatively, just don't use .apply

In [8]: pd.Series([parseDate(x) for x in thisdf.date], index=thisdf.index, dtype=object)
Out[8]:
0    1999-01-12 00:00:00
1                   None
2    2012-04-28 00:00:00
dtype: object
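
The same idea can be wrapped in a small helper that also carries over the Series name. This is a sketch, not part of pandas: map_to_object is a hypothetical name, and parse_date is a renamed copy of parseDate from the report that catches only ValueError and TypeError.

```python
import datetime

import pandas as pd


def parse_date(text):
    """Return a datetime for 'DD.MM.YYYY' strings, or None if parsing fails."""
    try:
        return datetime.datetime.strptime(text, "%d.%m.%Y")
    except (ValueError, TypeError):
        return None


def map_to_object(series, func):
    """Hypothetical helper: apply func element-wise and keep object dtype."""
    return pd.Series(
        [func(value) for value in series],
        index=series.index,
        name=series.name,
        dtype=object,  # explicit dtype disables datetime inference
    )


dates = pd.Series(["12.01.1999", "", "28.04.2012"], name="date")
result = map_to_object(dates, parse_date)
print(result)
```

Because the values are collected in a single list comprehension, the data is traversed only once and the None entries reach the database layer unchanged.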

@jreback
Contributor

jreback commented Jun 3, 2019

-1 on this

we have many ways to not infer; by definition .apply does infer things

@MarcelBeining
Author

So the only solution to this is using a for-loop followed by transforming back into a Series object, as suggested by @TomAugspurger?

@TomAugspurger
Contributor

TomAugspurger commented Jun 4, 2019

> by definition .apply does infer things

Do you know what convert_dtype is useful for? I've never used it, but from glancing at the code it's passed to lib.infer_dtype(output), so roughly "allow object dtype when inferring the output type". Then it seems like we ignore that when we pass the array to the Series constructor, since we infer datetime dtype.

> So the only solution to this is using a for-loop followed by transforming back into a Series object as suggested by @TomAugspurger ?

That's what I would recommend. I don't use .apply that often, for reasons like this. It can be a bit too helpful :)

@jbrockmendel added the Apply label Oct 17, 2019
@mroeschke added the Bug label Apr 2, 2020