Feature: NaN in float64 series should become NaT with astype() #2984

Closed
michaelaye opened this issue Mar 7, 2013 · 4 comments

@michaelaye
Contributor

To get an auto-diff on a timeseries I do:

In [47]: ts
Out[47]:
0    2011-04-16 00:00:00.025000
1    2011-04-16 00:00:00.152999
2    2011-04-16 00:00:00.280999
3    2011-04-16 00:00:00.409000
4    2011-04-16 00:00:00.537000
5    2011-04-16 00:00:00.665000
6    2011-04-16 00:00:00.792999
7    2011-04-16 00:00:00.921000
8    2011-04-16 00:00:01.049000
9    2011-04-16 00:00:01.177000
10   2011-04-16 00:00:01.297000
11   2011-04-16 00:00:01.424999
12   2011-04-16 00:00:01.552999
13   2011-04-16 00:00:01.680999
14   2011-04-16 00:00:01.809000
...
25646   2011-04-16 00:59:58.143001
25647   2011-04-16 00:59:58.270999
25648   2011-04-16 00:59:58.398998
25649   2011-04-16 00:59:58.527000
25650   2011-04-16 00:59:58.654998
25651   2011-04-16 00:59:58.783000
25652   2011-04-16 00:59:58.910999
25653   2011-04-16 00:59:59.039001
25654   2011-04-16 00:59:59.166999
25655   2011-04-16 00:59:59.294998
25656   2011-04-16 00:59:59.423000
25657   2011-04-16 00:59:59.550998
25658   2011-04-16 00:59:59.679000
25659   2011-04-16 00:59:59.806999
25660   2011-04-16 00:59:59.935001
Length: 25661, dtype: datetime64[ns]

In [48]: ts.diff()
Out[48]:
0           NaN
1     127999000
2     128000000
3     128001000
4     128000000
5     128000000
6     127999000
7     128001000
8     128000000
9     128000000
10    120000000
11    127999000
12    128000000
13    128000000
14    128001000
...
25646    128002000
25647    127998000
25648    127999000
25649    128002000
25650    127998000
25651    128002000
25652    127999000
25653    128002000
25654    127998000
25655    127999000
25656    128002000
25657    127998000
25658    128002000
25659    127999000
25660    128002000
Length: 25661, dtype: float64
  1. It would be nice if a diff on a time series automatically became a timedelta.
  2. When converting the result of ts.diff() with astype('timedelta64[ns]'), the NaN should become a NaT, if I understand things correctly.
    Instead, the following happens, and I have to cut off the NaN to make it work:
In [49]: ts.diff().astype('timedelta[ns]')
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-49-d1462d75fb33> in <module>()
----> 1 ts.diff().astype('timedelta[ns]')

/usr/local/epd/lib/python2.7/site-packages/pandas/core/series.pyc in astype(self, dtype)
    830         See numpy.ndarray.astype
    831         """
--> 832         casted = com._astype_nansafe(self.values, dtype)
    833         return self._constructor(casted, index=self.index, name=self.name,
    834                                  dtype=casted.dtype)

/usr/local/epd/lib/python2.7/site-packages/pandas/core/common.pyc in _astype_nansafe(arr, dtype, copy)
   1361     """ return a view if copy is False """
   1362     if not isinstance(dtype, np.dtype):
-> 1363         dtype = np.dtype(dtype)
   1364
   1365     if issubclass(arr.dtype.type, np.datetime64):

TypeError: data type not understood

In [50]: ts.diff().astype('timedelta64[ns]')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-50-e9c54d8f8d73> in <module>()
----> 1 ts.diff().astype('timedelta64[ns]')

/usr/local/epd/lib/python2.7/site-packages/pandas/core/series.pyc in astype(self, dtype)
    830         See numpy.ndarray.astype
    831         """
--> 832         casted = com._astype_nansafe(self.values, dtype)
    833         return self._constructor(casted, index=self.index, name=self.name,
    834                                  dtype=casted.dtype)

/usr/local/epd/lib/python2.7/site-packages/pandas/core/common.pyc in _astype_nansafe(arr, dtype, copy)
   1370
   1371         if np.isnan(arr).any():
-> 1372             raise ValueError('Cannot convert NA to integer')
   1373     elif arr.dtype == np.object_ and np.issubdtype(dtype.type, np.integer):
   1374         # work around NumPy brokenness, #1987

ValueError: Cannot convert NA to integer

In [51]: ts.diff()[1:].astype('timedelta64[ns]')
Out[51]:
1    00:00:00.127999
2    00:00:00.128000
3    00:00:00.128001
4    00:00:00.128000
5    00:00:00.128000
6    00:00:00.127999
7    00:00:00.128001
8    00:00:00.128000
9    00:00:00.128000
10   00:00:00.120000
11   00:00:00.127999
12   00:00:00.128000
13   00:00:00.128000
14   00:00:00.128001
15   00:00:00.128000
...
25646   00:00:00.128002
25647   00:00:00.127998
25648   00:00:00.127999
25649   00:00:00.128002
25650   00:00:00.127998
25651   00:00:00.128002
25652   00:00:00.127999
25653   00:00:00.128002
25654   00:00:00.127998
25655   00:00:00.127999
25656   00:00:00.128002
25657   00:00:00.127998
25658   00:00:00.128002
25659   00:00:00.127999
25660   00:00:00.128002
Length: 25660, dtype: timedelta64[ns]
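
For readers who hit the same error, the conversion asked for in point 2 can be sketched with pd.to_timedelta, which maps NaN to NaT when given float input. This is only a minimal sketch under assumptions, not the fix that went into astype(): the helper name float_ns_to_timedelta and the sample values are made up for illustration, and it assumes a reasonably recent pandas.

```python
import numpy as np
import pandas as pd


def float_ns_to_timedelta(values: pd.Series) -> pd.Series:
    """Interpret float64 nanosecond counts as timedeltas; NaN becomes NaT.

    Hypothetical helper for illustration only, not part of pandas.
    """
    return pd.to_timedelta(values, unit="ns")


# Nanosecond differences stored as float64, with a leading NaN
# like the result of ts.diff() shown above (values are illustrative).
diffs = pd.Series([np.nan, 127999000.0, 128000000.0, 128001000.0])
converted = float_ns_to_timedelta(diffs)
print(converted)
```

Unlike the ts.diff()[1:] slice, this keeps the original length and index, so the result still lines up with ts.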
@jreback
Contributor

jreback commented Mar 7, 2013

s.diff() on datetime64[ns] is currently broken. Use the workaround below (s - s.shift()), which is essentially the same operation.

  1. Yes, that will happen at some point.
  2. astype should convert, but I know I didn't put this in. You can, however, always construct a Series,
    which will try to convert it.
In [3]: s = pd.Series(pd.date_range('20130102',periods=6))

In [4]: s
Out[4]: 
0   2013-01-02 00:00:00
1   2013-01-03 00:00:00
2   2013-01-04 00:00:00
3   2013-01-05 00:00:00
4   2013-01-06 00:00:00
5   2013-01-07 00:00:00
dtype: datetime64[ns]

# workaround
In [5]: s - s.shift()
Out[5]: 
0                NaT
1   1 days, 00:00:00
2   1 days, 00:00:00
3   1 days, 00:00:00
4   1 days, 00:00:00
5   1 days, 00:00:00
dtype: timedelta64[ns]

# this is broken currently
In [6]: s.diff()
Out[6]: 
0             NaN
1    8.640000e+13
2    8.640000e+13
3    8.640000e+13
4    8.640000e+13
5    8.640000e+13
dtype: float64
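
The workaround above can also be written as a standalone script. This is a minimal sketch that reuses the sample data from the session; the variable name deltas is an assumption chosen for illustration.

```python
import pandas as pd

# Same sample data as in the session above: six consecutive days.
s = pd.Series(pd.date_range("20130102", periods=6))

# Subtracting the shifted series yields timedelta values; the first
# element has no predecessor, so it comes out as NaT rather than NaN.
deltas = s - s.shift()

print(deltas.dtype)
print(deltas.isna().tolist())
```

Because the result is already a timedelta series, no astype() call is needed and the missing first value does not have to be sliced off.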

@michaelaye
Contributor Author

OK, it was not clear to me whether this was considered broken or just a missing feature. Feel free to close it if you think it's a duplicate. Thanks for the summary!

@jreback
Contributor

jreback commented Mar 7, 2013

No, it's missing, so I'm going to leave your issue open. I sort of threw timedelta64[ns] together and tested most of what I thought was useful; you have debugged the rest!

abs is available - see #2957

@jreback
Contributor

jreback commented Mar 19, 2013

@michaelaye I am going to close this and open a new one for the missing diff feature.
