Taking the mean of a long-spanning np.datetime64 array produces the wrong value #10019
Comments
This is a bit of a tricky problem. Pandas handles it essentially by converting the datetime values to floats, which avoids the overflow but, due to precision loss, fails on some other examples. For instance, taking the mean of the minimum representable datetime value, or taking the mean of a time distant from the Unix epoch, can change its value.
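A minimal sketch of that precision-loss failure mode (using an assumed timestamp, not one of the examples from this comment): float64 has a 53-bit mantissa, so nanosecond values far from the epoch cannot round-trip through it.

```python
import numpy as np

# Near 7.26e18 the spacing between adjacent float64 values is 1024, so the
# trailing nanosecond is silently dropped when routed through floating point.
t = np.datetime64("2200-01-01T00:00:00.000000001", "ns")
i = t.astype("int64")   # 7258118400000000001 ns since the Unix epoch
f = np.float64(i)       # rounds to the nearest multiple of 1024 ns
print(np.int64(f) - i)  # -1: the value changed
```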
Somewhat inspired by numpy/numpy#12901 (comment), a way around the precision loss issue would be to convert to np.longdouble, replacing the relevant logic in xarray/xarray/core/duck_array_ops.py (lines 710 to 722 at c252152) with:

```python
array_as_longdouble = where(isnat(array), np.nan, array.astype(np.longdouble))
return _mean(array_as_longdouble, axis=axis, skipna=skipna, **kwargs).astype(
    array.dtype
)
```

This produces nice results on some systems, though np.longdouble offers no extra precision over float64 on others.
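Note that whether the longdouble approach gains anything is platform-dependent: np.longdouble is 80-bit extended precision on many x86-64 builds, but is merely an alias for float64 on Windows and some ARM platforms. A quick way to check (a sketch, not part of the patch above):

```python
import numpy as np

# nmant is the mantissa width in bits: 52 means np.longdouble is effectively
# float64 (no extra precision); 63 indicates 80-bit extended precision.
print(np.finfo(np.longdouble).nmant)
```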
We could calculate the mean year upfront and use that as the offset value instead of the minimum value. I've drafted up a PR for further discussion: #10035
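A rough sketch of that idea (hypothetical code; the actual implementation lives in #10035): choosing an offset near the middle of the data keeps every delta within the int64 timedelta range.

```python
import numpy as np

times = np.array(["1678-01-01", "2260-01-01"], dtype="datetime64[ns]")

# Compute an approximate central offset at year resolution, where overflow
# is impossible, then take the per-element deltas relative to it.
mean_year = int(times.astype("datetime64[Y]").astype("int64").mean())
offset = np.datetime64(mean_year, "Y").astype(times.dtype)
deltas = times - offset        # each delta is ~291 years: well within int64
print(offset + deltas.mean())  # 1969-01-01, the true midpoint
```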
What happened?
As noted in #9977 (comment), taking the mean of a np.datetime64 array with values that span more than half of the resolution-dependent range produces an incorrect result. This is due to overflow when computing the timedeltas relative to the minimum datetime value of the array (code); see the sketch below.
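The overflow can be seen with plain NumPy (illustrative values chosen to span more than half of the datetime64[ns] range):

```python
import numpy as np

# The values are ~582 years (~1.8e19 ns) apart, more than int64 can hold,
# so subtracting the minimum wraps around to a negative timedelta.
times = np.array(["1678-01-01", "2260-01-01"], dtype="datetime64[ns]")
print(times - times.min())  # the second delta is wrapped and negative
```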
What did you expect to happen?
This is the outcome we would expect to see in this example: the true midpoint of the datetime values, computed without overflow.
Minimal Complete Verifiable Example
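The example from the original report is not preserved in this copy; the following sketch (with assumed values) reproduces the described behavior:

```python
import numpy as np
import xarray as xr

# Two datetimes spanning most of the datetime64[ns] range; their true mean
# is 1969-01-01, but the min-relative timedeltas overflow internally.
da = xr.DataArray(
    np.array(["1678-01-01", "2260-01-01"], dtype="datetime64[ns]"), dims="time"
)
print(da.mean())
```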
MVCE confirmation
Relevant log output
Anything else we need to know?
No response
Environment