BUG: series construction from dict of Timedelta scalar doesn't work #38032

Closed
arw2019 opened this issue Nov 24, 2020 · 5 comments · Fixed by #38405
Labels
Bug · Constructors (Series/DataFrame/Index/pd.array Constructors) · Timedelta (Timedelta data type)
Comments

@arw2019
Member

arw2019 commented Nov 24, 2020

Construction of Series from a dictionary of scalars sometimes fails:

Code Sample, a copy-pastable example

In [31]: import pandas as pd
    ...: import pandas._testing as tm
    ...: 
    ...: td = pd.Timedelta(nanoseconds=500)
    ...: ser = pd.Series({"a": td})
    ...: expected = pd.Series(td, index=["a"], dtype="timedelta64[ns]")
    ...: 
    ...: tm.assert_series_equal(ser, expected)
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-31-a9c6a6312101> in <module>
      6 expected = pd.Series(td, index=["a"], dtype="timedelta64[ns]")
      7 
----> 8 tm.assert_series_equal(ser, expected)

    [... skipping hidden 1 frame]

~/repos/pandas/pandas/_testing.py in assert_extension_array_equal(left, right, check_dtype, index_values, check_less_precise, check_exact, rtol, atol)
   1243         # Avoid slow object-dtype comparisons
   1244         # np.asarray for case where we have a np.MaskedArray
-> 1245         assert_numpy_array_equal(
   1246             np.asarray(left.asi8), np.asarray(right.asi8), index_values=index_values
   1247         )

    [... skipping hidden 1 frame]

~/repos/pandas/pandas/_testing.py in _raise(left, right, err_msg)
   1155             diff = diff * 100.0 / left.size
   1156             msg = f"{obj} values are different ({np.round(diff, 5)} %)"
-> 1157             raise_assert_detail(obj, msg, left, right, index_values=index_values)
   1158 
   1159         raise AssertionError(err_msg)

~/repos/pandas/pandas/_testing.py in raise_assert_detail(obj, message, left, right, diff, index_values)
   1085         msg += f"\n[diff]: {diff}"
   1086 
-> 1087     raise AssertionError(msg)
   1088 
   1089 

AssertionError: numpy array are different

numpy array values are different (100.0 %)
[index]: [a]
[left]:  [500]
[right]: [0]

Ran on 1.2 master

@arw2019 arw2019 added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Nov 24, 2020
@arw2019 arw2019 added Timedelta Timedelta data type Constructors Series/DataFrame/Index/pd.array Constructors labels Nov 24, 2020
@jorisvandenbossche
Member

It seems it is actually the "expected" construction that doesn't work, and not the construction from a dict:

In [34]: td
Out[34]: Timedelta('0 days 00:00:00.000000500')

In [35]: pd.Series({"a": td})
Out[35]: 
a   0 days 00:00:00.000000500
dtype: timedelta64[ns]

In [36]: pd.Series(td, index=["a"], dtype="timedelta64[ns]")
Out[36]: 
a   0 days
dtype: timedelta64[ns]

So this last method seems to lose the nanoseconds.

@jreback jreback added this to the Contributions Welcome milestone Nov 24, 2020
@jreback jreback removed the Needs Triage Issue that has not been reviewed by a pandas team member label Nov 24, 2020
@ma3da
Contributor

ma3da commented Nov 24, 2020

pd.Timedelta insertion into an ndarray behaves strangely for small values.

>>> a = np.empty(1, dtype="timedelta64[ns]")
>>> a.fill(pd.Timedelta(999))
>>> a
array([0], dtype='timedelta64[ns]')
>>> a.fill(pd.Timedelta(1000))
>>> a
array([1000], dtype='timedelta64[ns]')

For this issue, this happens here.

@jorisvandenbossche
Member

jorisvandenbossche commented Nov 24, 2020

Ah, that's because numpy only knows datetime.timedelta, and thus handles a pandas.Timedelta that way, and datetime.timedelta doesn't have nanoseconds.
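To illustrate the point above, here is a small sketch (not taken from the pandas code base) showing that a pd.Timedelta is a datetime.timedelta subclass, and that the fields datetime.timedelta exposes (days, seconds, microseconds) are all zero for a 500 ns value. Only the pandas-specific `nanoseconds` attribute carries the sub-microsecond part, so code that reads only the standard fields would see zero.

```python
import datetime

import pandas as pd

td = pd.Timedelta(nanoseconds=500)

# pd.Timedelta subclasses datetime.timedelta, so generic code may treat it as one.
is_stdlib_timedelta = isinstance(td, datetime.timedelta)

# The standard-library fields stop at microseconds.
stdlib_fields = (td.days, td.seconds, td.microseconds)

# The nanosecond remainder is only available through the pandas attribute.
nanos = td.nanoseconds

# Smallest difference a plain datetime.timedelta can represent.
stdlib_resolution = datetime.timedelta.resolution

print(is_stdlib_timedelta, stdlib_fields, nanos, stdlib_resolution)
```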

So we will need to handle that a bit differently on the pandas side for datetime/timedelta dtypes (e.g. filling the array with the integer representation of the datetime/timedelta, and casting to datetime64/timedelta64 afterwards).
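A minimal sketch of that idea, assuming the scalar is a pd.Timedelta and the target dtype is timedelta64[ns]. The helper name `filled_timedelta_array` is hypothetical and is not the function used in the actual fix; it only shows the "fill as integers, then reinterpret" approach.

```python
import numpy as np
import pandas as pd


def filled_timedelta_array(td: pd.Timedelta, length: int) -> np.ndarray:
    """Hypothetical helper: build a timedelta64[ns] array without losing nanoseconds."""
    # td.value is the duration as an integer number of nanoseconds.
    ints = np.full(length, td.value, dtype="int64")
    # Reinterpret the int64 buffer as timedelta64[ns]; no conversion through
    # datetime.timedelta takes place, so sub-microsecond precision is kept.
    return ints.view("timedelta64[ns]")


td = pd.Timedelta(nanoseconds=500)
arr = filled_timedelta_array(td, 1)
ser = pd.Series(arr, index=["a"])
print(ser)
```

Going through the integer representation sidesteps numpy's handling of the scalar as a datetime.timedelta entirely.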

@jorisvandenbossche
Member

@ma3da want to do a PR for this?

@ma3da
Contributor

ma3da commented Nov 24, 2020

Ah OK, so there's no hope of fixing the np cast "directly". OK, I'll look into it.

4 participants