BUG: Avoid Timedelta rounding when specifying unit and integer (#12690) #19732


Merged
merged 20 commits into from
May 21, 2018
Commits (20)
d86f26b
BUG: Avoid rounding when specifying unit and integer
mroeschke Feb 15, 2018
3ffd35c
Merge remote-tracking branch 'upstream/master' into timedelta_rounding
mroeschke Mar 29, 2018
aa22c87
post merge fix
mroeschke Mar 29, 2018
1c2360e
Merge remote-tracking branch 'upstream/master' into timedelta_rounding
mroeschke Apr 18, 2018
3f463ec
Merge remote-tracking branch 'upstream/master' into timedelta_rounding
mroeschke Apr 22, 2018
a4190a0
Merge remote-tracking branch 'upstream/master' into timedelta_rounding
mroeschke Apr 24, 2018
69ce6a6
Adjust test for floating point errors
mroeschke Apr 24, 2018
04d97a2
Merge remote-tracking branch 'upstream/master' into timedelta_rounding
mroeschke May 6, 2018
0b83501
Merge remote-tracking branch 'upstream/master' into timedelta_rounding
mroeschke May 10, 2018
43ea477
Merge remote-tracking branch 'upstream/master' into timedelta_rounding
mroeschke May 14, 2018
44031c6
Merge remote-tracking branch 'upstream/master' into timedelta_rounding
mroeschke May 15, 2018
8c9f99f
Add additional test and param check test
mroeschke May 15, 2018
8256870
Merge remote-tracking branch 'upstream/master' into timedelta_rounding
mroeschke May 16, 2018
ae1de9e
Merge remote-tracking branch 'upstream/master' into timedelta_rounding
mroeschke May 16, 2018
68d296d
Fix test and move whatsnew
mroeschke May 16, 2018
9439463
Merge remote-tracking branch 'upstream/master' into timedelta_rounding
mroeschke May 17, 2018
358b595
Merge remote-tracking branch 'upstream/master' into timedelta_rounding
mroeschke May 18, 2018
461e0ee
address review
mroeschke May 18, 2018
d9d71e0
Merge remote-tracking branch 'upstream/master' into timedelta_rounding
mroeschke May 19, 2018
7ba45b4
use updated issue number
mroeschke May 19, 2018
5 changes: 4 additions & 1 deletion doc/source/whatsnew/v0.23.1.txt
@@ -53,7 +53,10 @@ Strings
^^^^^^^

- Bug in :meth:`Series.str.replace()` where the method throws `TypeError` on Python 3.5.2 (:issue:`21078`)
-

Timedelta
^^^^^^^^^
- Bug in :class:`Timedelta`: where passing a float with a unit would prematurely round the float precision (:issue:`14156`)

Categorical
^^^^^^^^^^^
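The user-visible effect of the fix described in this whatsnew entry, as a minimal check. The expected value is taken from the Timedelta test added in this PR; the old-behaviour figure is an inference from the previous rounding at 6 decimal places.

import pandas as pd

# With unit='s' the fractional part is now rounded at 9 decimal places
# (nanoseconds) rather than 6, so sub-microsecond detail survives:
pd.Timedelta(9.999999999, unit='s').value   # -> 9999999999 nanoseconds
# Under the previous rounding the fraction rounded up to 1.0, giving a
# full 10000000000 ns (10 s) instead.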
16 changes: 8 additions & 8 deletions pandas/_libs/tslibs/timedeltas.pyx
@@ -202,22 +202,22 @@ cpdef inline int64_t cast_from_unit(object ts, object unit) except? -1:

if unit == 'D' or unit == 'd':
m = 1000000000L * 86400
p = 6
Contributor: these were set originally to avoid precision issues.

p = 9
elif unit == 'h':
m = 1000000000L * 3600
p = 6
p = 9
elif unit == 'm':
m = 1000000000L * 60
p = 6
p = 9
elif unit == 's':
m = 1000000000L
p = 6
p = 9
elif unit == 'ms':
m = 1000000L
p = 3
p = 6
elif unit == 'us':
m = 1000L
p = 0
p = 3
elif unit == 'ns' or unit is None:
m = 1L
p = 0
@@ -231,10 +231,10 @@ cpdef inline int64_t cast_from_unit(object ts, object unit) except? -1:
# cast the unit, multiply base/frac separately
# to avoid precision issues from float -> int
base = <int64_t> ts
frac = ts -base
frac = ts - base
if p:
frac = round(frac, p)
Contributor: you might be able to do something like this:
round(frac * 1000, p - 3) // 1000 (for p >= 3) but haven't looked really closely.

return <int64_t> (base *m) + <int64_t> (frac *m)
return <int64_t> (base * m) + <int64_t> (frac * m)


cdef inline _decode_if_necessary(object ts):
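For reference, here is a pure-Python sketch of the base/frac split shown above (not the Cython implementation itself; the helper name and defaults are illustrative), showing why raising p from 6 to 9 matters for second-resolution units:

def cast_from_unit_sketch(ts, m=1000000000, p=9):
    # Split into whole units and fractional remainder and convert them
    # separately, to limit float -> int precision loss on the combined value.
    base = int(ts)
    frac = ts - base
    if p:
        # p=9 keeps nanosecond detail for unit='s'; the old p=6 rounded the
        # fraction to microseconds before scaling.
        frac = round(frac, p)
    return int(base * m) + int(frac * m)

cast_from_unit_sketch(9.999999999)         # -> 9999999999
cast_from_unit_sketch(9.999999999, p=6)    # -> 10000000000 (the old, prematurely rounded result)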
8 changes: 8 additions & 0 deletions pandas/tests/indexes/datetimes/test_tools.py
@@ -650,6 +650,14 @@ def test_unit_mixed(self, cache):
with pytest.raises(ValueError):
pd.to_datetime(arr, errors='raise', cache=cache)

@pytest.mark.parametrize('cache', [True, False])
def test_unit_rounding(self, cache):
# GH 14156: argument will incur floating point errors but no
# premature rounding
result = pd.to_datetime(1434743731.8770001, unit='s', cache=cache)
expected = pd.Timestamp('2015-06-19 19:55:31.877000093')
assert result == expected

@pytest.mark.parametrize('cache', [True, False])
def test_dataframe(self, cache):

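The odd-looking 93 extra nanoseconds in the expected Timestamp are not introduced by pandas; they come from the binary representation of the input float itself. A quick way to see this (digits in the comment are approximate to the precision shown):

from decimal import Decimal

# The closest double to 1434743731.8770001 already carries the extra ~93 ns:
Decimal(1434743731.8770001)   # -> Decimal('1434743731.87700009346...')

With the fraction now rounded at 9 decimal places instead of 6, to_datetime preserves those nanoseconds rather than rounding them away.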
2 changes: 2 additions & 0 deletions pandas/tests/io/sas/test_sas7bdat.py
@@ -182,6 +182,8 @@ def test_date_time():
fname = os.path.join(dirpath, "datetime.csv")
df0 = pd.read_csv(fname, parse_dates=['Date1', 'Date2', 'DateTime',
'DateTimeHi', 'Taiw'])
# GH 19732: Timestamps imported from sas will incur floating point errors
Contributor: why did this need to be changed?

Member (Author): I believe there are floating point errors when importing dates from the sas file.

In [5]: pd.read_sas('pandas/tests/io/sas/data/datetime.sas7bdat')['DateTimeHi']
Out[5]:
0   1677-09-21 00:12:43.145225525
1   1960-01-01 00:00:00.000000000
2   2016-02-29 23:59:59.123456001
3   2262-04-11 23:47:16.854774475
Name: DateTimeHi, dtype: datetime64[ns]

# This matches dates in datetime.csv
In [6]: pd.read_csv('pandas/tests/io/sas/data/datetime.csv')['DateTimeHi']
Out[6]:
0    1677-09-21 00:12:43.145226
1    1960-01-01 00:00:00.000000
2    2016-02-29 23:59:59.123456
3    2262-04-11 23:47:16.854774
Name: DateTimeHi, dtype: object

I am not familiar with read_sas or the sas file that was created, but I am fairly certain it's due to the floating point errors

Contributor: can you round these instead of specifying exact values?

df.iloc[:, 3] = df.iloc[:, 3].dt.round('us')
tm.assert_frame_equal(df, df0)


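As the reviewer suggests, the test compares by rounding rather than by hard-coding nanosecond values: the sas7bdat column now retains sub-microsecond float noise, while the CSV reference only has microsecond resolution, so the high-resolution column is rounded to microseconds before the frame comparison. A minimal sketch of that strategy, using the DateTimeHi values quoted in the discussion above as stand-ins for the real fixtures (the single-column frames are a simplification, not the actual test data):

import pandas as pd
import pandas.util.testing as tm

# Stand-ins for the SAS frame (nanosecond noise) and the CSV reference (microseconds).
df = pd.DataFrame({'DateTimeHi': pd.to_datetime(['2016-02-29 23:59:59.123456001'])})
df0 = pd.DataFrame({'DateTimeHi': pd.to_datetime(['2016-02-29 23:59:59.123456'])})

df['DateTimeHi'] = df['DateTimeHi'].dt.round('us')  # drop the sub-microsecond noise
tm.assert_frame_equal(df, df0)                      # now passes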
10 changes: 10 additions & 0 deletions pandas/tests/scalar/timedelta/test_timedelta.py
@@ -106,6 +106,16 @@ def test_compare_timedelta_ndarray(self):

class TestTimedeltas(object):

@pytest.mark.parametrize("unit, value, expected", [
('us', 9.999, 9999), ('ms', 9.999999, 9999999),
('s', 9.999999999, 9999999999)])
def test_rounding_on_int_unit_construction(self, unit, value, expected):
# GH 12690
result = Timedelta(value, unit=unit)
assert result.value == expected
result = Timedelta(str(value) + unit)
assert result.value == expected

def test_total_seconds_scalar(self):
# see gh-10939
rng = Timedelta('1 days, 10:11:12.100123456')
85 changes: 47 additions & 38 deletions pandas/tests/scalar/timestamp/test_timestamp.py
@@ -621,10 +621,51 @@ def test_basics_nanos(self):
assert stamp.microsecond == 145224
assert stamp.nanosecond == 192

def test_unit(self):

def check(val, unit=None, h=1, s=1, us=0):
stamp = Timestamp(val, unit=unit)
@pytest.mark.parametrize('value, check_kwargs', [
[946688461000000000, {}],
[946688461000000000 / long(1000), dict(unit='us')],
[946688461000000000 / long(1000000), dict(unit='ms')],
[946688461000000000 / long(1000000000), dict(unit='s')],
[10957, dict(unit='D', h=0)],
pytest.param((946688461000000000 + 500000) / long(1000000000),
dict(unit='s', us=499, ns=964),
marks=pytest.mark.skipif(not PY3,
reason='using truediv, so these'
' are like floats')),
pytest.param((946688461000000000 + 500000000) / long(1000000000),
dict(unit='s', us=500000),
marks=pytest.mark.skipif(not PY3,
reason='using truediv, so these'
' are like floats')),
pytest.param((946688461000000000 + 500000) / long(1000000),
dict(unit='ms', us=500),
marks=pytest.mark.skipif(not PY3,
reason='using truediv, so these'
' are like floats')),
pytest.param((946688461000000000 + 500000) / long(1000000000),
dict(unit='s'),
marks=pytest.mark.skipif(PY3,
reason='get chopped in py2')),
pytest.param((946688461000000000 + 500000000) / long(1000000000),
dict(unit='s'),
marks=pytest.mark.skipif(PY3,
reason='get chopped in py2')),
pytest.param((946688461000000000 + 500000) / long(1000000),
dict(unit='ms'),
marks=pytest.mark.skipif(PY3,
reason='get chopped in py2')),
[(946688461000000000 + 500000) / long(1000), dict(unit='us', us=500)],
[(946688461000000000 + 500000000) / long(1000000),
dict(unit='ms', us=500000)],
[946688461000000000 / 1000.0 + 5, dict(unit='us', us=5)],
[946688461000000000 / 1000.0 + 5000, dict(unit='us', us=5000)],
[946688461000000000 / 1000000.0 + 0.5, dict(unit='ms', us=500)],
[946688461000000000 / 1000000.0 + 0.005, dict(unit='ms', us=5, ns=5)],
[946688461000000000 / 1000000000.0 + 0.5, dict(unit='s', us=500000)],
[10957 + 0.5, dict(unit='D', h=12)]])
def test_unit(self, value, check_kwargs):
def check(value, unit=None, h=1, s=1, us=0, ns=0):
stamp = Timestamp(value, unit=unit)
assert stamp.year == 2000
assert stamp.month == 1
assert stamp.day == 1
@@ -637,41 +637,9 @@ def check(val, unit=None, h=1, s=1, us=0):
assert stamp.minute == 0
assert stamp.second == 0
assert stamp.microsecond == 0
assert stamp.nanosecond == 0

ts = Timestamp('20000101 01:01:01')
val = ts.value
days = (ts - Timestamp('1970-01-01')).days

check(val)
check(val / long(1000), unit='us')
check(val / long(1000000), unit='ms')
check(val / long(1000000000), unit='s')
check(days, unit='D', h=0)
assert stamp.nanosecond == ns

# using truediv, so these are like floats
if PY3:
check((val + 500000) / long(1000000000), unit='s', us=500)
check((val + 500000000) / long(1000000000), unit='s', us=500000)
check((val + 500000) / long(1000000), unit='ms', us=500)

# get chopped in py2
else:
check((val + 500000) / long(1000000000), unit='s')
check((val + 500000000) / long(1000000000), unit='s')
check((val + 500000) / long(1000000), unit='ms')

# ok
check((val + 500000) / long(1000), unit='us', us=500)
check((val + 500000000) / long(1000000), unit='ms', us=500000)

# floats
check(val / 1000.0 + 5, unit='us', us=5)
check(val / 1000.0 + 5000, unit='us', us=5000)
check(val / 1000000.0 + 0.5, unit='ms', us=500)
check(val / 1000000.0 + 0.005, unit='ms', us=5)
check(val / 1000000000.0 + 0.5, unit='s', us=500000)
check(days + 0.5, unit='D', h=12)
check(value, **check_kwargs)

def test_roundtrip(self):

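A note on the PY3/PY2 skip marks in the parametrization above: they hinge on Python's division semantics, not on pandas. Under Python 3, / is true division, so dividing the nanosecond value by the unit factor yields a float that still carries the sub-second part; under Python 2, / between ints/longs is floor division, so that part is chopped before Timestamp ever sees it. Roughly, with the value taken from the test:

val = 946688461000000000

# Python 3: true division keeps the 500 microseconds as a fractional part
(val + 500000) / 1000000000          # -> approximately 946688461.0005

# Python 2: integer (floor) division discards it
# (val + 500000) / long(1000000000)  # -> 946688461L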