
BUG: Avoid Timedelta rounding when specifying unit and integer (#12690) #19732


Merged · 20 commits merged into pandas-dev:master from mroeschke:timedelta_rounding on May 21, 2018

Conversation

@mroeschke (Member) commented Feb 16, 2018

@mroeschke mroeschke changed the title BUG: Avoid rounding when specifying unit and integer (#12690) BUG: Avoid Timedelta rounding when specifying unit and integer (#12690) Feb 16, 2018
@pep8speaks commented Feb 16, 2018

Hello @mroeschke! Thanks for updating the PR.

Cheers! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on May 19, 2018 at 23:42 UTC

result = Timedelta(value, unit=unit)
assert result.value == expected
result = Timedelta(str(value) + unit)
assert result.value
Review comment (Member):

looks like the last line is missing == expected

@@ -716,6 +716,7 @@ Datetimelike
- Bug in :class:`Timestamp` and :func:`to_datetime` where a string representing a barely out-of-bounds timestamp would be incorrectly rounded down instead of raising ``OutOfBoundsDatetime`` (:issue:`19382`)
- Bug in :func:`Timestamp.floor` :func:`DatetimeIndex.floor` where time stamps far in the future and past were not rounded correctly (:issue:`19206`)
- Bug in :func:`to_datetime` where passing an out-of-bounds datetime with ``errors='coerce'`` and ``utc=True`` would raise ``OutOfBoundsDatetime`` instead of parsing to ``NaT`` (:issue:`19612`)
- Bug in :class:`Timedelta`: where a numerical value with a unit would round values (:issue: `12690`)
Review comment (Member):

there's an extra colon at the end of :class:`Timedelta`: that should be removed, and the space between :issue: and the number should also be removed

@mroeschke (Member, author):

Although I believe these rounding constants are correct, it appears there are now precision issues for larger values.

# Example from the failing test

In [6]: v
Out[6]: 946688461.0005

In [7]: pd._libs.tslibs.timedeltas.cast_from_unit(v, 's')
Out[7]: 946688461000499968
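For reference, the same numbers reproduce in plain Python, which shows the error comes from the float64 representation itself rather than from cast_from_unit:

v = 946688461.0005

# The fractional part is already off before any rounding happens:
print(v - int(v))      # 0.0004999637603759766

# so a direct conversion to nanoseconds inherits the error:
print(int(v * 10**9))  # 946688461000499968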

@@ -200,22 +200,22 @@ cpdef inline int64_t cast_from_unit(object ts, object unit) except? -1:

if unit == 'D' or unit == 'd':
m = 1000000000L * 86400
p = 6
Review comment (Contributor):

these were set originally to avoid precision issues.
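(For context: in cast_from_unit, m is the number of nanoseconds per unit and p is the number of fractional decimal digits kept when rounding. A rough Python sketch of the scheme; the names mirror the cython code, the values are illustrative:)

m = 1000000000 * 86400            # nanoseconds per day (unit 'D')
p = 6                             # fractional digits kept

ts = 1.5                          # e.g. 1.5 days
base = int(ts)                    # whole units: 1
frac = round(ts - base, p)        # 0.5
print(base * m + int(frac * m))   # 129600000000000 ns == 1.5 days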

@@ -229,10 +229,10 @@ cpdef inline int64_t cast_from_unit(object ts, object unit) except? -1:
    # cast the unit, multiply base/frac separately
    # to avoid precision issues from float -> int
    base = <int64_t> ts
-   frac = ts -base
+   frac = ts - base
    if p:
        frac = round(frac, p)
Review comment (Contributor):

you might be able to do something like this:

round(frac * 1000, p - 3) // 1000 (for p >= 3), but haven't looked really closely.

@jreback jreback added Bug Timedelta Timedelta data type labels Feb 18, 2018
@mroeschke (Member, author):

Unfortunately using a 1000 multiplier didn't solve the issue, @jreback.

The issue with the example above is with floating point errors before rounding.

In [2]: v = 946688461.0005

In [3]: v - int(v)
Out[3]: 0.0004999637603759766

In [4]: round(v - int(v), 6) # precision in master
Out[4]: 0.0005

In [5]: round(v - int(v), 9) # precision in this PR (in order to preserve nanoseconds if specified)
Out[5]: 0.000499964

Would it be reasonable to cast v to a string in order to parse the whole number from the decimal?

In [8]: str(v).split('.')
Out[8]: ['946688461', '0005']

@jreback (Contributor) commented Feb 25, 2018

you cannot use strings; rather you might look at how CPython does this
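(CPython's float rounding is built on correctly rounded float-to-string conversion, so at the Python level the idea being alluded to can be sketched as follows; this is only an illustration, not the route the PR ended up taking:)

from decimal import Decimal

v = 946688461.0005
# repr(v) is the shortest decimal string that round-trips to v,
# so parsing it recovers the digits the user most likely typed:
print(Decimal(repr(v)) - int(v))   # 0.0005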

Commits pushed:
- Add tests
- Add whatsnew
- flake8
- Add additional formatting
- remove tabs
- address review
@jorisvandenbossche (Member):

To come back to this, the question is whether we should care. In the example of v = 946688461.0005, even the full number (not only the split-off fractional part) has an actual precision of 946688461.00049996. So this is an inherent problem of floating point in the input itself, not only in our calculation.

As a user you can always solve this by e.g. multiplying your data by 1000 and specifying a different unit.
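Concretely, that workaround might look like the following (a sketch reusing the value from the earlier example, scaled here to microseconds so the input becomes an exact integer, since 0.0005 s is exactly 500 µs):

import pandas as pd

# Fractional seconds go through float64 and pick up error:
pd.Timedelta(946688461.0005, unit='s')

# An integer count of a smaller unit is exact:
pd.Timedelta(946688461000500, unit='us')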

@Sup3rGeo (Contributor) commented Mar 26, 2018

To complement @jorisvandenbossche's point, I believe the current rounding behavior is far more detrimental to users than the precision issue introduced by these changes.

This means that even if we care about the precision issue, it would be better to merge these changes (correctly reflecting the timedelta value for a given number and unit) and open a separate issue for the precision problem with large numbers.

@mroeschke (Member, author):

I agree as well that premature rounding is less desirable than floating point errors. I could issue a warning for now for floats with many significant digits.

@mroeschke force-pushed the timedelta_rounding branch from 457a265 to 3ffd35c on March 29, 2018 04:41
@jorisvandenbossche (Member):

I could issue a warning for now for floats with high significant digits.

Could we easily detect the case when the number of decimals is too high to preserve precision? In that case, a warning might be a good idea (then we can also give a hint on how to solve it)

@mroeschke (Member, author):

@jorisvandenbossche after searching I don't think there's a reliable way to detect significant digits since not all floats are stored exactly.

An alternative for now is to live with the precision issue and offer a note in the docs?
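The underlying difficulty is that the stored double already differs from the typed literal, so there is no reliable digit count left to inspect. For example:

from decimal import Decimal

# Decimal(float) shows the exact stored binary value; for most
# decimal literals it is close to, but not exactly, what was typed:
print(Decimal(0.0005))   # not exactly 0.0005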

@jorisvandenbossche (Member):

An alternative for now is to live with the precision issue and offer a note in the docs?

Yes, that would be my preference. @jreback what's your view on this?

@jreback (Contributor) commented Apr 9, 2018

sure, ok with simply documenting this as a known limitation (docstring and/or main docs)

@codecov (bot) commented Apr 24, 2018

Codecov Report

Merging #19732 into master will not change coverage.
The diff coverage is n/a.


@@           Coverage Diff           @@
##           master   #19732   +/-   ##
=======================================
  Coverage   91.84%   91.84%           
=======================================
  Files         153      153           
  Lines       49499    49499           
=======================================
  Hits        45460    45460           
  Misses       4039     4039
Flag        Coverage     Δ
#multiple   90.23% <ø>   (ø) ⬆️
#single     41.88% <ø>   (ø) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bc37ea2...7ba45b4.

@mroeschke (Member, author):

Sorry for the delay on this PR.

I had to adjust some existing tests to account for floating point errors after this fix. Additionally, it appears we already have a note in the main docs about floating point errors when creating timestamps at the bottom of this section:

https://pandas.pydata.org/pandas-docs/stable/timeseries.html#epoch-timestamps

Ready for a final look and all green.

@jreback (Contributor) left a comment:

can you show a copy-pastable example which works now (and didn't before). I know your tests cover this, but easier to copy-paste it.
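(For readers of this thread, a hypothetical before/after along the lines requested; the values are illustrative and are not taken from the PR's tests:)

import pandas as pd

# A float with nanosecond-level detail, passed with a unit:
td = pd.Timedelta(1.000000005, unit='s')

# Before this PR the fractional part was rounded prematurely and the
# 5 ns could be silently dropped; after the PR it is preserved
# (up to float64 precision):
print(td.value)   # expected: 1000000005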

@@ -623,7 +623,7 @@ def test_basics_nanos(self):

def test_unit(self):

def check(val, unit=None, h=1, s=1, us=0):
Review comment (Contributor):

would be nice to parameterize this test
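(A sketch of what that parametrization could look like; the cases and names here are hypothetical, not the ones that landed:)

import pytest
from pandas import Timedelta

@pytest.mark.parametrize('value, unit, expected', [
    (10, 's', 10 * 10**9),     # seconds -> nanoseconds
    (10, 'ms', 10 * 10**6),    # milliseconds -> nanoseconds
])
def test_unit(value, unit, expected):
    assert Timedelta(value, unit=unit).value == expected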

@@ -182,6 +182,11 @@ def test_date_time():
fname = os.path.join(dirpath, "datetime.csv")
df0 = pd.read_csv(fname, parse_dates=['Date1', 'Date2', 'DateTime',
'DateTimeHi', 'Taiw'])
# GH 19732: Timestamps imported from sas will incur floating point errors
Review comment (Contributor):

why did this need to be changed?

@mroeschke (Member, author) replied:

I believe there are floating point errors when importing dates from the sas file.

In [5]: pd.read_sas('pandas/tests/io/sas/data/datetime.sas7bdat')['DateTimeHi']
Out[5]:
0   1677-09-21 00:12:43.145225525
1   1960-01-01 00:00:00.000000000
2   2016-02-29 23:59:59.123456001
3   2262-04-11 23:47:16.854774475
Name: DateTimeHi, dtype: datetime64[ns]

# This matches dates in datetime.csv
In [6]: pd.read_csv('pandas/tests/io/sas/data/datetime.csv')['DateTimeHi']
Out[6]:
0    1677-09-21 00:12:43.145226
1    1960-01-01 00:00:00.000000
2    2016-02-29 23:59:59.123456
3    2262-04-11 23:47:16.854774
Name: DateTimeHi, dtype: object

I am not familiar with read_sas or how the sas file was created, but I am fairly certain it's due to floating point errors.

Review comment (Contributor):

can you round these instead of specifying exact values?
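(One way to do that, as a sketch; df here stands for the frame read from the SAS file, and the actual change that landed may differ:)

# Compare at microsecond precision instead of asserting exact
# nanosecond values:
df['DateTimeHi'] = df['DateTimeHi'].dt.round('us')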

@jreback (Contributor) commented May 10, 2018

@mroeschke following up on my last comment, is there an explicit test from the OP?

@mroeschke (Member, author):

Added an explicit test from the original issue (which was reported by you, @jreback) and parameterized an existing test.

@jreback (Contributor) left a comment:

small comments, otherwise lgtm.



Timedelta
^^^^^^^^^
- Bug in :class:`Timedelta`: where a numerical value with a unit would round values (:issue: `12690`)
Review comment (Contributor):

can you be a little more precise here? e.g. I think a reader might not get what this change is

@mroeschke (Member, author):

Clarified the whatsnew entry and rounded the datetimes from the imported sas file. All green.


Timedelta
^^^^^^^^^
- Bug in :class:`Timedelta`: where passing a float with a unit would prematurely round the float precision (:issue: `12690`)
Review comment (Contributor):

shouldn't this be 14156?

@jreback jreback added this to the 0.23.1 milestone May 19, 2018
@mroeschke (Member, author):

Thanks @jreback. I had referenced the older issue that was superseded by the OP's.

@jreback jreback merged commit 81358e8 into pandas-dev:master May 21, 2018
@jreback (Contributor) commented May 21, 2018

thanks @mroeschke

@mroeschke mroeschke deleted the timedelta_rounding branch May 21, 2018 16:58
@jorisvandenbossche (Member):

As this is, strictly speaking, an API change for those who used a float with less precision and now get higher precision with floating point errors, I would rather keep this for 0.24.0?

Labels: Bug · Timedelta (Timedelta data type)

Successfully merging this pull request may close this issue:
BUG: conversion precision on Timedeltas
6 participants