BUG: Fix nanosecond timedeltas #45108


Merged · 7 commits · Dec 31, 2021
Changes from 4 commits
2 changes: 2 additions & 0 deletions doc/source/whatsnew/v1.4.0.rst
@@ -236,6 +236,8 @@ Other enhancements
- :meth:`is_list_like` now identifies duck-arrays as list-like unless ``.ndim == 0`` (:issue:`35131`)
- :class:`ExtensionDtype` and :class:`ExtensionArray` are now (de)serialized when exporting a :class:`DataFrame` with :meth:`DataFrame.to_json` using ``orient='table'`` (:issue:`20612`, :issue:`44705`).
- Add support for `Zstandard <http://facebook.github.io/zstd/>`_ compression to :meth:`DataFrame.to_pickle`/:meth:`read_pickle` and friends (:issue:`43925`)
- :class:`Timedelta` now properly taking into account any nanoseconds contribution (:issue: `43764`)
Member

Suggested change
- :class:`Timedelta` now properly taking into account any nanoseconds contribution (:issue: `43764`)
- :class:`Timedelta` now properly taking into account any nanoseconds contribution (:issue:`43764`)

- :meth:`Timedelta.total_seconds()` now properly taking into account any nanoseconds contribution (:issue: `40946`)
Member

Suggested change
- :meth:`Timedelta.total_seconds()` now properly taking into account any nanoseconds contribution (:issue: `40946`)
- :meth:`Timedelta.total_seconds()` now properly taking into account any nanoseconds contribution (:issue:`40946`)

Contributor Author

Done

-


15 changes: 14 additions & 1 deletion pandas/_libs/tslibs/timedeltas.pyx
@@ -1268,7 +1268,16 @@ class Timedelta(_Timedelta):

kwargs = {key: _to_py_int_float(kwargs[key]) for key in kwargs}

nano = convert_to_timedelta64(kwargs.pop('nanoseconds', 0), 'ns')
# GH43764, making sure any nanosecond contributions from any kwarg are taken into consideration
Contributor

umm this is just duplicating code from below

Contributor Author

I don't know exactly which code you're referring to, but when a Timedelta instance is constructed with any of the "microseconds", "milliseconds" and "seconds" kwargs (or any combination of them), the potential nanosecond contributions from each of those kwargs must be caught here; otherwise that information is lost, e.g. Timedelta(seconds=1e-9, milliseconds=1e-5, microseconds=1e-1).
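For context, a quick sanity check in plain Python (illustrative only, not part of the PR) of where the nanoseconds in that example come from:

seconds_ns = 1e-9 * 1_000_000_000        # seconds=1e-9 contributes 1 ns
milliseconds_ns = 1e-5 * 1_000_000       # milliseconds=1e-5 contributes 10 ns
microseconds_ns = 1e-1 * 1_000           # microseconds=1e-1 contributes 100 ns
print(seconds_ns + milliseconds_ns + microseconds_ns)   # 111.0 ns in total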

nano = convert_to_timedelta64(
Contributor

you are entirely duplicating L1283

Contributor Author

My modifications removed "microseconds" through "seconds" from kwargs. Any other potential kwarg, such as year etc., is handled the same way as before, namely via L1283. Do you have other suggestions?

Contributor

I actually don't have a problem with doing what you are doing, but then you need to entirely remove L1282-1284 and handle all the kwargs

(well you still need the try/except).

Contributor

This will also slow this function down a lot. Rather than popping, be explicit about the argument handling.

Contributor Author

Could you check the latest changes? I profiled instantiation time and it takes about 50% of the time it does on master.

import pandas as pd

def test():
    pd.Timedelta(seconds=float(106751 * 24 * 3600), nanoseconds=1)
    pd.Timedelta(minutes=-7)
    pd.Timedelta(seconds=1234e-9)
    pd.Timedelta(seconds=1e-9, milliseconds=1e-5, microseconds=1e-1)
    pd.Timedelta(days=1, seconds=1e-9, milliseconds=1e-5, microseconds=1e-1)
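For reference, one way such a timing comparison could be run (a sketch using the standard-library timeit on top of the test() snippet above; the ~50% figure comes from my own profiling, not from this snippet):

import timeit
print(timeit.timeit(test, number=100_000))  # run once on this branch and once on master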

(
kwargs.pop('nanoseconds', 0)
+ kwargs.pop('microseconds', 0) * 1000
+ kwargs.pop('milliseconds', 0) * 1000000
+ kwargs.pop('seconds', 0) * 1000000000
), 'ns'
Member

Because these are floats, I assume that this will lose precision near the limits, i.e. seconds = 106751 (days) * 24 * 3600 with a nanosecond component specified too?

Contributor Author

Not really sure, but I would expect that when aiming for precision the inputs should instead be passed as integers by the caller? My motivation here is only to make the instantiation consistent: if float inputs are allowed, then any nanosecond contributions from the supplied kwargs must be taken into consideration.

Member

indeed, if any of these components is a float.

so pd.Timedelta(seconds=float(106751 * 24 * 3600), nanoseconds=1)

gives Timedelta('106751 days 00:00:00') whereas on master we get Timedelta('106751 days 00:00:00.000000001')
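A back-of-the-envelope illustration of why the float sum drops that nanosecond (plain Python, not part of the PR):

seconds = float(106751 * 24 * 3600)           # ~9.22e9 seconds, i.e. ~9.22e18 ns
total_ns = 1 + seconds * 1_000_000_000        # nanoseconds=1 plus the seconds converted to ns
print(total_ns == seconds * 1_000_000_000)    # True: float64 spacing at this magnitude is ~1024 ns,
                                              # so the +1 ns contribution is rounded away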

Contributor Author

Could you check my last commit? It produces the same output as in master.

)

try:
value = nano + convert_to_timedelta64(timedelta(**kwargs),
'ns')
@@ -1520,6 +1529,10 @@ class Timedelta(_Timedelta):
div = other // self
return div, other - div * self

# GH40946
Contributor

doc string and types pls

Contributor Author

Done for now. (Depends on the decision whether to split the issues into two PRs or not.)

def total_seconds(self):
    return self.value / 1e9
Member

Internally, ns precision is achieved by storing the value as a 64-bit integer.

once this is converted to a float, we cannot maintain ns precision. However, including the nanosecond component is probably more accurate than ignoring it.

I think we need more discussion on the issue before re-instating the nanosecond "precision".

from #31380 (comment)

... from "dubiously nanosecond-precision" to "microsecond-precision" ...

The change to microsecond precision was an intentional one but was not documented.

Changing it back would probably require a specific mention in the release note.

Also, the tests that were added when the precision of total_seconds changed are failing for these cases, so this needs to be fixed as well.

I also think that the two issues should be addressed independently.
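A rough sketch of the precision point (plain Python; the numbers are hypothetical and chosen only to show the effect):

ns_value = 200 * 24 * 3600 * 10**9 + 1           # 200 days plus 1 ns, as an exact Python int
print(ns_value > 2**53)                          # True: beyond the range where float64 is exact
print(ns_value / 1e9 == (ns_value - 1) / 1e9)    # True: the trailing nanosecond is lost by value / 1e9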

Contributor Author

Before I go and refactor/enhance the tests, it would be nice to have a categorical decision on this topic. Should the total_seconds part be a separate PR? I am not really sure why the nanosecond precision is a problem. The precise epoch in ns can be obtained by the value property.

Member

Should the total_seconds part be a separate PR?

the two fixes are orthogonal? The override of the total_seconds method to fix #40946 and the change to __new__ to fix the constructor issue #43764 (comment)?

If so, I think they should be two separate PRs.

Contributor

yes pls split into a separate PR



cdef bint is_any_td_scalar(object obj):
"""
9 changes: 9 additions & 0 deletions pandas/tests/tslibs/test_timedeltas.py
@@ -15,6 +15,15 @@
(np.timedelta64(14, "D"), 14 * 24 * 3600 * 1e9),
(Timedelta(minutes=-7), -7 * 60 * 1e9),
(Timedelta(minutes=-7).to_pytimedelta(), -7 * 60 * 1e9),
(Timedelta(seconds=1234e-9), 1234), # GH43764, GH40946
(
Timedelta(seconds=1e-9, milliseconds=1e-5, microseconds=1e-1),
111,
), # GH43764, GH40946
(
Timedelta(days=1, seconds=1e-9, milliseconds=1e-5, microseconds=1e-1),
24 * 3600e9 + 111,
), # GH43764, GH40946
(offsets.Nano(125), 125),
(1, 1),
(np.int64(2), 2),