series.apply(pandas.to_datetime, convert_dtype=False) still converts dtype #14559

radekholy24 · 2016-11-02T08:46:30Z

A small, complete example of the issue

>>> import pandas
>>> s = pandas.Series({'a': '2012-05-01 00:00:00'})
>>> s.apply(pandas.to_datetime, convert_dtype=False)
a   2012-05-01
dtype: datetime64[ns]

Expected Output

a   2012-05-01
dtype: object

Output of `pd.show_versions()`

pandas: 0.19.0

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-45-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.19.0
nose: None
pip: 7.1.0
setuptools: 18.0.1
Cython: None
numpy: 1.11.2
scipy: None
statsmodels: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.5.3
pytz: 2016.7
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
boto: None
pandas_datareader: None


jorisvandenbossche · 2016-11-02T09:11:47Z

Not sure what you are trying to do, as to_datetime acts on a full series as well, so the idiomatic thing to do is pd.to_datetime(s).

The docstring of apply says about convert_dtype:

convert_dtype : boolean, default True
    Try to find better dtype for elementwise function results. If
    False, leave as dtype=object

So this keyword only applies when the function works elementwise. As mentioned above, pd.to_datetime can act on the full series at once.
If you take an example function that only works elementwise, you can see the effect of this convert_dtype keyword:

In [2]: s = pd.Series(['a', 'b'])

In [3]: s
Out[3]: 
0    a
1    b
dtype: object

In [4]: def f(val):
   ...:     if val == 'a':
   ...:         return 1
   ...:     else:
   ...:         return 2

In [6]: s.apply(f)
Out[6]: 
0    1
1    2
dtype: int64

In [7]: s.apply(f, convert_dtype=False)
Out[7]: 
0    1
1    2
dtype: object

But again, your code does not feel idiomatic, so please clarify what you are trying to achieve. In many cases you don't want to keep this object dtype. Having the series as a datetime64 dtype gives you access to datetime-specific functionality.

radekholy24 · 2016-11-02T10:00:49Z

@jorisvandenbossche, I was under the impression that pandas.to_datetime is applied elementwise since:

>>> import numpy
>>> isinstance(pandas.to_datetime, numpy.ufunc)
False

Anyway, s.apply(lambda x: pandas.to_datetime(x), convert_dtype=False) behaves the same and that is applied elementwise for sure.

In my case, my function receives different functions to apply on the series and thus it does not know beforehand whether it will get pandas.to_datetime or anything else. A simplified version of my code looks like:

class Foo:
    def __init__(self, generator):
        self.dataframe = generator.generate()

    def convert(self, name, converter):
        self.dataframe[name] = self.dataframe[name].apply(converter, convert_dtype=False)

In my case, it's much easier to describe the convert method as preserving dtype=object than to explain that it applies some smart logic to change the dtype. Also it's much easier to unit test the method since a DataFrame with the same values but different dtypes do not equal and it's easier to create an "object dtyped" DataFrame than a DataFrame with each column having different dtype. In my case, code simplicity is preferred over performance optimizations.

Also, regardless of my use case, the behavior of the apply method does not match the documentation [1] and thus it's a bug either in the code or in the documentation.

[1] if not in the case of apply(pandas.to_datetime) then in the case of apply(lambda x: pandas.to_datetime(x)) (or some more complex function that may return pandas.Timestamp) for sure

jorisvandenbossche · 2016-11-02T10:35:59Z

I understand that you don't want to distinguish between elementwise functions or not in your application, and for that the use of apply is appropriate.
But if you only want object dtype, then don't convert your data. I really don't recommend trying to keep everything as object dtype. Once you start doing manipulations with those data, data types will get deduced and you get dtypes anyway.

> it's much easier to unit test the method since a DataFrame with the same values but different dtypes do not equal

you can specify not to check the dtype
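
As a minimal sketch of that suggestion: assuming a pandas version that ships the `pandas.testing` module (0.20 or later), `assert_frame_equal` accepts `check_dtype=False`. The two frames below are made up for illustration and are not taken from the reporter's code.

```python
import pandas as pd
from pandas.testing import assert_frame_equal

# Two illustrative frames with equal values but different column dtypes.
left = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})        # both columns int64
right = pd.DataFrame({'a': [1, 2], 'b': [3.0, 4.0]})   # column 'b' is float64

# Passes: the values are compared and the dtypes are ignored.
assert_frame_equal(left, right, check_dtype=False)

# With the default check_dtype=True the same comparison fails.
try:
    assert_frame_equal(left, right)
    dtype_mismatch_detected = False
except AssertionError:
    dtype_mismatch_detected = True
```

This keeps the unit test focused on values, so the method under test is free to return whatever dtype pandas infers.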

> it's easier to create an "object dtyped" DataFrame than a DataFrame with each column having different dtype

that is not true, as when creating a dataframe the default is to deduce the dtypes from the data you pass in

If you want to keep object dtype, you can simply do .astype(object) after the apply call (or astype(self.dataframe[name].dtype) if it is not always object dtype)
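
A rough sketch of that workaround, under stated assumptions: `convert_keep_dtype` is a hypothetical helper name, and the input series is built with an explicit `dtype=object` so the example does not depend on how a given pandas version infers string columns.

```python
import pandas as pd

def convert_keep_dtype(series, converter):
    """Apply converter elementwise, then cast the result back to the input dtype."""
    # apply() may infer a new dtype (e.g. datetime64); astype() undoes that.
    return series.apply(converter).astype(series.dtype)

# Same data as in the original report, with the object dtype stated explicitly.
s = pd.Series({'a': '2012-05-01 00:00:00'}, dtype=object)

result = convert_keep_dtype(s, pd.to_datetime)
print(result.dtype)
```

The values are still `Timestamp` objects after the cast. Only the container dtype goes back to object, which is what the `convert` method above appears to want.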

For the specifics, the reason this does not work as documented for datetimes, is this:

In [43]: pd.Series(np.array([1, 2], dtype=object))
Out[43]: 
0    1
1    2
dtype: object

In [45]: pd.Series(np.array([pd.Timestamp('2012-01-01'), pd.Timestamp('2012-01-02')], dtype=object))
Out[45]: 
0   2012-01-01
1   2012-01-02
dtype: datetime64[ns]

Under the hood, if convert_dtype=False, an object array is returned. When that array is put into a Series, the object dtype is kept for numerical values but not for datetimes, which are inferred as datetime64[ns].
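
For comparison, here is a small sketch of the Series constructor with and without an explicit dtype. Passing `dtype=object` is an addition for illustration and is not something suggested in the thread. The inferred dtype in the first case may differ between pandas versions, so it is only printed.

```python
import numpy as np
import pandas as pd

stamps = [pd.Timestamp('2012-01-01'), pd.Timestamp('2012-01-02')]

# dtype left to inference, as in the session above
inferred = pd.Series(np.array(stamps, dtype=object))
print(inferred.dtype)

# dtype requested explicitly: the constructor keeps the Timestamps as objects
explicit = pd.Series(stamps, dtype=object)
print(explicit.dtype)
```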

radekholy24 · 2016-11-02T11:00:32Z

> you can specify not to check the dtype

You mean using .astype(object) on both DataFrames before? Good idea, I'll consider that. Thank you.

> that is not true, as when creating a dataframe the default is to deduce the dtypes from the data you pass in

In which case, I'm hitting the #14558 issue. I'll retest this idea with Pandas 0.20. Thank you.

@jorisvandenbossche, OK, I think I can use one of the approaches you have suggested. Anyway, may I ask you to reopen this in order to track the issue between the behavior and the documentation?

jorisvandenbossche closed this as completed Nov 2, 2016

jorisvandenbossche added the Usage Question label Nov 2, 2016

jorisvandenbossche added this to the No action milestone Nov 2, 2016

MarcelBeining mentioned this issue Jun 3, 2019

pandas Series apply returns dtype:datetime though convert_dtype is set to False #26630

Open
