BUG: Avoid Timedelta rounding when specifying unit and integer (#12690) #19732
Conversation
Force-pushed from a3360e4 to 30eab2f.
Hello @mroeschke! Thanks for updating the PR. Cheers! There are no PEP8 issues in this Pull Request. 🍻 Comment last updated on May 19, 2018 at 23:42 UTC.
Force-pushed from 30eab2f to 6a98496.
    result = Timedelta(value, unit=unit)
    assert result.value == expected
    result = Timedelta(str(value) + unit)
    assert result.value
looks like the last line is missing == expected
doc/source/whatsnew/v0.23.0.txt
Outdated
@@ -716,6 +716,7 @@ Datetimelike
 - Bug in :class:`Timestamp` and :func:`to_datetime` where a string representing a barely out-of-bounds timestamp would be incorrectly rounded down instead of raising ``OutOfBoundsDatetime`` (:issue:`19382`)
 - Bug in :func:`Timestamp.floor` :func:`DatetimeIndex.floor` where time stamps far in the future and past were not rounded correctly (:issue:`19206`)
 - Bug in :func:`to_datetime` where passing an out-of-bounds datetime with ``errors='coerce'`` and ``utc=True`` would raise ``OutOfBoundsDatetime`` instead of parsing to ``NaT`` (:issue:`19612`)
+- Bug in :class:`Timedelta`: where a numerical value with a unit would round values (:issue: `12690`)
there's an extra colon at the end of :class:`Timedelta`: that should be removed, and the space between :issue: and the number should also be removed
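With both fixes applied, the whatsnew line would read:

```rst
- Bug in :class:`Timedelta` where a numerical value with a unit would round values (:issue:`12690`)
```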
Although I believe these rounding constants are correct, it appears there are precision issues now for larger integers.
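The precision issue comes from float64 carrying only 53 bits of mantissa, so integers above 2**53 cannot survive a round-trip through float. A quick illustration:

```python
# float64 carries 53 significant bits, so integers above 2**53
# are silently rounded once a value passes through float
big = 2**53 + 1
assert float(big) != big        # the round-trip is lossy here
assert float(2**53) == 2**53    # the boundary itself is still exact
```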
@@ -200,22 +200,22 @@ cpdef inline int64_t cast_from_unit(object ts, object unit) except? -1:

    if unit == 'D' or unit == 'd':
        m = 1000000000L * 86400
        p = 6
these were set originally to avoid precision issues.
@@ -229,10 +229,10 @@ cpdef inline int64_t cast_from_unit(object ts, object unit) except? -1:
     # cast the unit, multiply base/frace separately
     # to avoid precision issues from float -> int
     base = <int64_t> ts
-    frac = ts -base
+    frac = ts - base
     if p:
         frac = round(frac, p)
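A rough pure-Python sketch of what this base/frac split does (the real implementation is the Cython `cast_from_unit`; the function name and example scale here are just for illustration):

```python
def cast_from_unit_sketch(ts, m, p):
    # split into integer part (base) and fractional part (frac) and
    # scale each separately, so the large base never passes through
    # a float multiplication
    base = int(ts)
    frac = ts - base
    if p:
        frac = round(frac, p)
    return int(base * m) + int(frac * m)

# 1.5 seconds at m = 10**9 ns/s, p = 9 fractional digits kept
assert cast_from_unit_sketch(1.5, 10**9, 9) == 1_500_000_000
```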
you might be able to do something like this: `round(frac * 1000, p - 3) // 1000` (for p >= 3), but haven't looked really closely.
Unfortunately using a 1000 multiplier didn't solve the issue, @jreback. The issue with the example above is with floating point errors before rounding. Would it be reasonable to cast the value to a string first?
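The problem can be seen directly: the fractional part is already inexact before any rounding is applied, so a multiply-then-round scheme operates on corrupted input:

```python
ts = 3.7
frac = ts - int(ts)
# the nearest double to 3.7 sits slightly above it, so the fractional
# part is not exactly equal to the double nearest 0.7
assert frac != 0.7
```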
you cannot use strings; rather you might look at how CPython does this.
Add tests; add whatsnew; flake8; add additional formatting; remove tabs; address review.
To come back to this, the question is if we should care about the precision issue in such examples. As a user you can always work around it, e.g. by doing the conversion to an integer yourself.
Also, complementing @jorisvandenbossche: I believe the current rounding behavior is far more detrimental to users than the precision issue introduced by the changes. Even if we care about the precision issue, it would be better to merge these changes (which make the timedelta value correctly reflect a given integer and unit) and open a separate issue for the precision problem with large numbers.
I agree as well that premature rounding is less desirable than floating point errors. I could issue a warning for now for floats with many significant digits.
Force-pushed from 457a265 to 3ffd35c.
Could we easily detect the case where the number of decimals is too high to preserve precision? In that case a warning might be a good idea (then we can also give a hint on how to solve it).
@jorisvandenbossche after searching, I don't think there's a reliable way to detect significant digits, since not all floats are stored exactly. An alternative for now is to live with the precision issue and add a note in the docs?
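A small demonstration of why counting decimals is unreliable: the double actually stored for a short decimal literal has a long exact expansion, so there is no robust "number of decimals" to inspect (sketch, using the stdlib `decimal` module):

```python
from decimal import Decimal

# the exact value of the double nearest to 0.345 is not 0.345, so
# inspecting its digits cannot tell us the user typed three decimals
exact = Decimal(0.345)
assert str(exact) != '0.345'
assert len(str(exact)) > 10  # the exact expansion is much longer
```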
Yes, that would be my preference. @jreback, what's your view on this?
Sure, OK with simply documenting this as a known limitation (docstring and/or main docs).
Codecov Report

@@           Coverage Diff            @@
##           master   #19732   +/-   ##
=======================================
  Coverage   91.84%   91.84%
=======================================
  Files         153      153
  Lines       49499    49499
=======================================
  Hits        45460    45460
  Misses       4039     4039

Continue to review the full report at Codecov.
Sorry for the delay on this PR. I had to adjust some existing tests to account for floating point errors after this fix. Additionally, it appears we already have a note in the main docs about floating point errors when creating timestamps, at the bottom of this section: https://pandas.pydata.org/pandas-docs/stable/timeseries.html#epoch-timestamps. Ready for a final look, and all green.
can you show a copy-pastable example which works now (and didn't before). I know your tests cover this, but easier to copy-paste it.
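For reference, a hedged sketch of the post-fix behavior (the value here is chosen for illustration; the exact pre-fix output depended on the old rounding precision, so only the fixed behavior is asserted):

```python
import pandas as pd

# after the fix, the fractional part of a float passed with a unit is
# preserved down to nanoseconds instead of being prematurely rounded
td = pd.Timedelta(30.5, unit='s')
assert td.value == 30_500_000_000  # 30.5 s expressed in nanoseconds
```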
@@ -623,7 +623,7 @@ def test_basics_nanos(self):

    def test_unit(self):

        def check(val, unit=None, h=1, s=1, us=0):
would be nice to parameterize this test
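A sketch of how the `check` helper could become a parametrized test (the value/unit pairs below are illustrative, not the ones in the suite):

```python
import pytest
from pandas import Timedelta

# illustrative value/unit pairs with their expected nanosecond counts
@pytest.mark.parametrize('value, unit, expected', [
    (1, 'd', 86400 * 10**9),
    (1, 'h', 3600 * 10**9),
    (1.5, 's', 1_500_000_000),
])
def test_unit(value, unit, expected):
    assert Timedelta(value, unit=unit).value == expected
```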
@@ -182,6 +182,11 @@ def test_date_time():
    fname = os.path.join(dirpath, "datetime.csv")
    df0 = pd.read_csv(fname, parse_dates=['Date1', 'Date2', 'DateTime',
                                          'DateTimeHi', 'Taiw'])
    # GH 19732: Timestamps imported from sas will incur floating point errors
why did this need to be changed?
I believe there are floating point errors when importing dates from the sas file.
In [5]: pd.read_sas('pandas/tests/io/sas/data/datetime.sas7bdat')['DateTimeHi']
Out[5]:
0 1677-09-21 00:12:43.145225525
1 1960-01-01 00:00:00.000000000
2 2016-02-29 23:59:59.123456001
3 2262-04-11 23:47:16.854774475
Name: DateTimeHi, dtype: datetime64[ns]
# This matches dates in datetime.csv
In [6]: pd.read_csv('pandas/tests/io/sas/data/datetime.csv')['DateTimeHi']
Out[6]:
0 1677-09-21 00:12:43.145226
1 1960-01-01 00:00:00.000000
2 2016-02-29 23:59:59.123456
3 2262-04-11 23:47:16.854774
Name: DateTimeHi, dtype: object
I am not familiar with read_sas or with how the sas file was created, but I am fairly certain the mismatch is due to floating point errors.
can you round these instead of specifying exact values
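One way to implement that suggestion, assuming microsecond precision is what the reference CSV carries (a sketch, not the PR's actual diff):

```python
import pandas as pd

# rounding to microseconds absorbs the sub-microsecond float noise,
# making the comparison against the CSV values robust
s = pd.Series(pd.to_datetime(['2016-02-29 23:59:59.123456001']))
rounded = s.dt.round('us')
assert rounded.iloc[0] == pd.Timestamp('2016-02-29 23:59:59.123456')
```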
@mroeschke, to my last comment: is there an explicit test from the OP?
Added an explicit test from the original issue (which was from you, @jreback) and parametrized an existing test.
small comments, otherwise lgtm.
doc/source/whatsnew/v0.23.1.txt
Outdated
Timedelta
^^^^^^^^^
- Bug in :class:`Timedelta`: where a numerical value with a unit would round values (:issue: `12690`)
can you be a little more precise here, e.g. I think a reader might not get what this change is
Clarified the whatsnew entry and rounded the datetimes from the imported sas file. All green.
doc/source/whatsnew/v0.23.1.txt
Outdated
Timedelta
^^^^^^^^^
- Bug in :class:`Timedelta`: where passing a float with a unit would prematurely round the float precision (:issue: `12690`)
shouldn't this be 14156?
Thanks @jreback. I had referenced the older issue that was superseded by the OP's.
Thanks @mroeschke!
As this is, strictly speaking, an API change for those who used a float with less precision and now get higher precision with floating point errors, I would rather keep this for 0.24.0?
…s-dev#12690) (pandas-dev#19732) (cherry picked from commit 81358e8)
git diff upstream/master -u -- "*.py" | flake8 --diff