REF/ENH: Constructors for DatetimeArray/TimedeltaArray #23493
```diff
@@ -119,7 +119,8 @@ def wrapper(self, other):
         if isinstance(other, list):
             # FIXME: This can break for object-dtype with mixed types
             other = type(self)(other)
-        elif not isinstance(other, (np.ndarray, ABCIndexClass, ABCSeries)):
+        elif not isinstance(other, (np.ndarray, ABCIndexClass, ABCSeries,
+                                    DatetimeArrayMixin)):
             # Following Timestamp convention, __eq__ is all-False
             # and __ne__ is all True, others raise TypeError.
             return ops.invalid_comparison(self, other, op)
```
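Adding `DatetimeArrayMixin` to the `isinstance` check means a bare datetime array on the other side of the comparison is now dispatched to the element-wise path instead of falling through to `ops.invalid_comparison`. A minimal sketch of the intended behaviour, assuming the class still lives at `pandas.core.arrays.datetimes` as in the pandas source of this era:

```python
# Hedged sketch: before this change, a DatetimeArrayMixin operand was not
# recognized, so __eq__ fell through to ops.invalid_comparison (all-False);
# afterwards the comparison is performed element-wise.
import numpy as np
from pandas.core.arrays.datetimes import DatetimeArrayMixin  # assumed path

arr1 = DatetimeArrayMixin(np.array(["2018-01-01", "2018-01-02"], dtype="M8[ns]"))
arr2 = DatetimeArrayMixin(np.array(["2018-01-01", "2018-01-03"], dtype="M8[ns]"))

print(arr1 == arr2)  # expected: [True, False] rather than all-False
```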
```diff
@@ -170,6 +171,8 @@ class DatetimeArrayMixin(dtl.DatetimeLikeArrayMixin):
     # Constructors

     _attributes = ["freq", "tz"]
     _freq = None
     _tz = None

     @classmethod
     def _simple_new(cls, values, freq=None, tz=None, **kwargs):
```
```diff
@@ -193,11 +196,16 @@ def _simple_new(cls, values, freq=None, tz=None, **kwargs):
         result._tz = timezones.tz_standardize(tz)
         return result

-    def __new__(cls, values, freq=None, tz=None, dtype=None):
+    def __new__(cls, values, freq=None, tz=None, dtype=None, copy=False):
```

BTW, if you want to simplify this already a bit more, you could rename the current …

```diff
+        if isinstance(values, (list, tuple)) or is_object_dtype(values):
+            values = cls._from_sequence(values, copy=copy)
+            # TODO: Can we set copy=False here to avoid re-copying?
```

IIUC, then yes, you're OK setting …

Further question: is it not (yet) possible to simply remove this case? (Eventually we should not call the DatetimeArray constructor with an array-like of scalars.)

Not if we want to share the extant arithmetic tests (which we do).

I don't share this opinion; I would prefer to delay this discussion until it is absolutely necessary.

Then please raise this in the appropriate issue, as we have been discussing this before (I think it is #23212, although there is probably some more scattered discussion on other related PRs). It is here that you are redesigning the constructors for the array refactor, IIUC, so if there is a time we should discuss it, it is now, I think? Can you clarify this a little bit? At what point do the arithmetic tests need to deal with an array of objects?

The pertinent word here is "extant". Many of the tests in tests/arithmetic pass a list into …

Ignoring the tests for a moment, I thought we were all on board with the goal of the … Back to the tests, it looks like you could add an entry to …

My comment to Joris below about mothballing this conversation applies. But short answer is no: I did not get on board with that.
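For context, a hedged sketch of what the new list/object-dtype dispatch is meant to support (behaviour inferred from the diff, not from a released pandas; the import path is assumed):

```python
# Hedged sketch: with the branch above, list/tuple/object-dtype input is
# routed through _from_sequence (and ultimately to_datetime), while
# datetime64 ndarray input keeps taking the existing fast path.
import numpy as np
from pandas.core.arrays.datetimes import DatetimeArrayMixin  # assumed path

# goes through _from_sequence -> to_datetime -> back into the constructor
arr_from_list = DatetimeArrayMixin(["2018-01-01", "2018-01-02"])

# already datetime64[ns]; skips the _from_sequence round-trip
arr_from_m8 = DatetimeArrayMixin(
    np.array(["2018-01-01", "2018-01-02"], dtype="M8[ns]"))
```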
```diff
         if tz is None and hasattr(values, 'tz'):
-            # e.g. DatetimeIndex
+            # e.g. DatetimeArray, DatetimeIndex
             tz = values.tz

+        # TODO: what about if freq == 'infer'?
```

then we should also get the …

That's what I'm thinking, yah.

```diff
         if freq is None and hasattr(values, "freq"):
+            # i.e. DatetimeArray, DatetimeIndex
             freq = values.freq
```
```diff
@@ -207,26 +215,46 @@ def __new__(cls, values, freq=None, tz=None, dtype=None):
         # if dtype has an embedded tz, capture it
         tz = dtl.validate_tz_from_dtype(dtype, tz)

-        if isinstance(values, DatetimeArrayMixin):
+        if lib.is_scalar(values):
+            raise TypeError(dtl.scalar_data_error(values, cls))
+        elif isinstance(values, ABCSeries):
```

I would get out the …

I'll see if there is a graceful way to do this in the next pass (if I ever manage to catch up with all these comments!).

```diff
+            # extract nanosecond unix timestamps
+            if tz is None:
+                # TODO: Try to do this in just one place
+                tz = values.dt.tz
+            values = np.array(values.view('i8'))
+        elif isinstance(values, DatetimeArrayMixin):
```

And you don't need to get the …

No. For the moment we are still using inheritance, so this would mess up for DatetimeIndex == DatetimeArray. When we change to composition this check will have to become …

```diff
             # extract nanosecond unix timestamps
             values = values.asi8

         if values.dtype == 'i8':
             values = values.view('M8[ns]')

         assert isinstance(values, np.ndarray), type(values)
         assert is_datetime64_dtype(values)  # not yet assured nanosecond
-        values = conversion.ensure_datetime64ns(values, copy=False)
+        values = conversion.ensure_datetime64ns(values, copy=copy)

         result = cls._simple_new(values, freq=freq, tz=tz)
-        if freq_infer:
-            inferred = result.inferred_freq
-            if inferred:
-                result.freq = to_offset(inferred)
+        dtl.maybe_define_freq(freq_infer, result)

+        # NB: Among other things not yet ported from the DatetimeIndex
+        # constructor, this does not call _deepcopy_if_needed
         return result
```
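The inline freq-inference block is replaced here by a shared helper. A sketch of what `dtl.maybe_define_freq` plausibly does, reconstructed from the inline code it replaces (the actual implementation in `datetimelike.py` may differ):

```python
# Hedged reconstruction of dtl.maybe_define_freq, based on the removed lines.
from pandas.tseries.frequencies import to_offset

def maybe_define_freq(freq_infer, result):
    """Attach an inferred freq to `result` when freq='infer' was requested."""
    if freq_infer:
        inferred = result.inferred_freq
        if inferred:
            result.freq = to_offset(inferred)
```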
```diff
+    @classmethod
+    def _from_sequence(cls, scalars, dtype=None, copy=False):
+        # list, tuple, or object-dtype ndarray/Index
```

why do you need to turn into an object array here? to_datetime handles all of these cases

You're right, we could make do without it. I like doing this explicitly because to_datetime is already overloaded and circular.

this is horribly inefficient and unnecessary

If we don't do it here, to_datetime is going to do this. It may be unnecessary, but it is not horribly inefficient. What is a code smell is the circularity involved in calling to_datetime.

then just call array_to_datetime and don't force the conversion to array

So is the root problem (referenced in your "circularity" comment, and down below in …) … Could we have the public …

```python
array = _to_datetime(...)
return DatetimeIndex(array)
```

so the internal …

It's not the fact that it's an Index so much as that it is a circular dependency. I think I can resolve this in an upcoming commit.

_convert_listlike_datetimes calls …

Not sure what you're referring to. As implemented, _from_sequence is specifically for list, tuple, or object-dtype ndarray/Index. datetime64-dtype goes through a different path.

That's after an …

```python
# these are shortcutable
if is_datetime64tz_dtype(arg):
    if not isinstance(arg, DatetimeIndex):
        return DatetimeIndex(arg, tz=tz, name=name)
    if tz == 'utc':
        arg = arg.tz_convert(None).tz_localize(tz)
    return arg
elif is_datetime64_ns_dtype(arg):
    if box and not isinstance(arg, DatetimeIndex):
        try:
            return DatetimeIndex(arg, tz=tz, name=name)
        except ValueError:
            pass
    return arg
```

So those both avoid conversion to object.

@TomAugspurger thank you for clarifying; I was under the mistaken impression that it was specifically list/tuple/object-dtype. Are there any restrictions on kwargs that can be added to it? In particular I'm thinking of …

```diff
+        values = np.array(scalars, dtype=np.object_, copy=copy)
+        if values.ndim != 1:
+            raise TypeError("Values must be 1-dimensional")
+
+        # TODO: See if we can decrease circularity
+        from pandas.core.tools.datetimes import to_datetime
+        values = to_datetime(values)
+
+        # pass dtype to constructor in order to convert timezone if necessary
+        return cls(values, dtype=dtype)
+
     @classmethod
     def _generate_range(cls, start, end, periods, freq, tz=None,
                         normalize=False, ambiguous='raise', closed=None):
```
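A hedged usage sketch of the `_from_sequence` path above (internal API; behaviour inferred from the diff rather than from a released pandas, import path assumed):

```python
# Hedged sketch: _from_sequence accepts list/tuple/object-dtype input,
# funnels it through to_datetime, and rejects anything not 1-dimensional.
import numpy as np
from pandas.core.arrays.datetimes import DatetimeArrayMixin  # assumed path

arr = DatetimeArrayMixin._from_sequence(["2018-01-01", None, "2018-01-03"])

try:
    DatetimeArrayMixin._from_sequence(
        np.array([["2018-01-01", "2018-01-02"]], dtype=object))  # ndim == 2
except TypeError as err:
    print(err)  # "Values must be 1-dimensional"
```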
```diff
@@ -3,23 +3,22 @@

 import numpy as np

-from pandas._libs import tslibs
+from pandas._libs import tslibs, lib, algos
 from pandas._libs.tslibs import Timedelta, Timestamp, NaT
 from pandas._libs.tslibs.fields import get_timedelta_field
 from pandas._libs.tslibs.timedeltas import array_to_timedelta64

 from pandas import compat

 from pandas.core.dtypes.common import (
-    _TD_DTYPE, is_list_like)
+    _TD_DTYPE, is_list_like, is_object_dtype, is_timedelta64_dtype)
 from pandas.core.dtypes.generic import ABCSeries
 from pandas.core.dtypes.missing import isna

 import pandas.core.common as com
 from pandas.core.algorithms import checked_add_with_arr

 from pandas.tseries.offsets import Tick
 from pandas.tseries.frequencies import to_offset

 from . import datetimelike as dtl
```
```diff
@@ -112,9 +111,7 @@ def dtype(self):

     @classmethod
     def _simple_new(cls, values, freq=None, dtype=_TD_DTYPE):
-        # `dtype` is passed by _shallow_copy in corner cases, should always
-        # be timedelta64[ns] if present
-        assert dtype == _TD_DTYPE
+        _require_m8ns_dtype(dtype)
         assert isinstance(values, np.ndarray), type(values)

         if values.dtype == 'i8':
```
```diff
@@ -127,22 +124,48 @@ def _simple_new(cls, values, freq=None, dtype=_TD_DTYPE):
         result._freq = freq
         return result

-    def __new__(cls, values, freq=None):
+    def __new__(cls, values, freq=None, dtype=_TD_DTYPE, copy=False):
+        _require_m8ns_dtype(dtype)
+
+        if isinstance(values, (list, tuple)) or is_object_dtype(values):
+            values = cls._from_sequence(values, copy=copy)._data
```

You cannot return here directly?

```diff
+            # TODO: can we set copy=False to avoid re-copying?

         freq, freq_infer = dtl.maybe_infer_freq(freq)

-        values = np.array(values, copy=False)
-        if values.dtype == np.object_:
-            values = array_to_timedelta64(values)
+        if lib.is_scalar(values):
+            raise TypeError(dtl.scalar_data_error(values, cls))
+        elif isinstance(values, TimedeltaArrayMixin):
+            if freq is None and values.freq is not None:
+                freq = values.freq
+                freq_infer = False
+            values = values._data

-        result = cls._simple_new(values, freq=freq)
-        if freq_infer:
-            inferred = result.inferred_freq
-            if inferred:
-                result.freq = to_offset(inferred)
+        values = np.array(values, copy=copy)
+
+        if values.dtype == 'i8':
+            pass
+        elif not is_timedelta64_dtype(values):
+            raise TypeError(values.dtype)
+        elif values.dtype != _TD_DTYPE:
+            # i.e. non-nano unit
+            # TODO: use tslibs.conversion func? watch out for overflows
+            values = values.astype(_TD_DTYPE)
+
+        result = cls._simple_new(values, freq=freq)
+        dtl.maybe_define_freq(freq_infer, result)
         return result
```
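A hedged illustration of the non-nano branch above: timedelta64 data in a coarser unit is cast to `timedelta64[ns]` before being handed to `_simple_new`. The `_TD_DTYPE` constant is redefined locally so the snippet stands alone (in pandas it comes from `pandas.core.dtypes.common`):

```python
# Hedged sketch of the unit normalization performed in __new__ above.
import numpy as np

_TD_DTYPE = np.dtype("m8[ns]")  # mirrors the pandas constant

values = np.array([1, 2, 3], dtype="m8[s]")  # second-resolution input

if values.dtype == "i8":
    pass
elif values.dtype != _TD_DTYPE:
    # non-nano unit: each value is rescaled, e.g. 1 second -> 10**9 ns
    values = values.astype(_TD_DTYPE)

print(values.dtype)  # timedelta64[ns]
```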
```diff
+    @classmethod
+    def _from_sequence(cls, scalars, dtype=_TD_DTYPE, copy=False):
+        # list, tuple, or object-dtype ndarray/Index
+        values = np.array(scalars, dtype=np.object_, copy=copy)
```

same

This doesn't call to_timedelta, so this does require that we pass an object array.

then it should

No, it shouldn't.

Besides, this is what …

```diff
+        if values.ndim != 1:
+            raise TypeError("Values must be 1-dimensional")
+
+        result = array_to_timedelta64(values)
+        return cls(result, dtype=dtype)
+
     @classmethod
     def _generate_range(cls, start, end, periods, freq, closed=None):
```
```diff
@@ -180,6 +203,23 @@ def _generate_range(cls, start, end, periods, freq, closed=None):

         return cls._simple_new(index, freq=freq)

+    # ----------------------------------------------------------------
+    # Array-Like Methods
+    # NB: these are appreciably less efficient than the TimedeltaIndex versions
```

Because of (lack of) caching? This comment makes it seem like it's slower in general, when (if it's caching) it's just slower on repeated use.

BTW (as mentioned elsewhere), I am not sure we should add them as public methods. If we do so, we should add them to all our EAs, or actually even to the EA interface, and not only to TimedeltaArray (or datetimelike arrays).

I'm not necessarily opposed to this, but this isn't obvious to me.

Because the Index version defines monotonic_increasing, monotonic_decreasing, and is_unique in a single call via _engine.

```diff
+    @property
+    def is_monotonic_increasing(self):
+        return algos.is_monotonic(self.asi8, timelike=True)[0]
+
+    @property
+    def is_monotonic_decreasing(self):
+        return algos.is_monotonic(self.asi8, timelike=True)[1]
+
+    @property
+    def is_unique(self):
+        from pandas.core.algorithms import unique1d
+        return len(unique1d(self.asi8)) == len(self)
+
     # ----------------------------------------------------------------
     # Arithmetic Methods
```
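A hedged sketch of what these properties compute, applied to raw int64 ordinals directly (the `algos.is_monotonic` call and `unique1d` import are taken from the diff above):

```python
# Hedged sketch: the array-backed properties above boil down to these calls.
import numpy as np
from pandas._libs import algos
from pandas.core.algorithms import unique1d

asi8 = np.array([1, 2, 2, 5], dtype=np.int64)  # stand-in for self.asi8

print(algos.is_monotonic(asi8, timelike=True)[0])  # True  (non-decreasing)
print(algos.is_monotonic(asi8, timelike=True)[1])  # False (not non-increasing)
print(len(unique1d(asi8)) == len(asi8))            # False (2 is repeated)
```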
```diff
@@ -413,3 +453,21 @@ def _generate_regular_range(start, end, periods, offset):

     data = np.arange(b, e, stride, dtype=np.int64)
     return data
+
+
+def _require_m8ns_dtype(dtype):
+    """
+    `dtype` is included in the constructor signature for consistency with
+    DatetimeArray and PeriodArray, but only timedelta64[ns] is considered
+    valid. Raise if anything else is passed.
+
+    Parameters
+    ----------
+    dtype : dtype
+
+    Raises
+    ------
+    ValueError
+    """
+    if dtype != _TD_DTYPE:
+        raise ValueError("Only timedelta64[ns] dtype is valid.", dtype)
```

AssertionError, no?

When called from _simple_new this is internal, so AssertionError would make sense, but it is also called from … Either way, I need to add tests for this.

well this should never happen; all conversions should be before this, so it should assert

no, my point is there should be, and I think currently there is already conversion; if it's wrong at this point it's not a user error but an incorrect path taken

My point is that this check function is called two times, one of which is the very first thing in …

Apart from the discussion above, is it worth having a 15-line function (including docstrings :-)) for a 2-liner used in two places?

Reasonable. But hey, it's a nice docstring.
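A hedged usage sketch of the validator above, with the function and the `_TD_DTYPE` constant copied locally so the snippet stands alone:

```python
# Hedged sketch: only timedelta64[ns] passes; any other dtype raises.
import numpy as np

_TD_DTYPE = np.dtype("m8[ns]")  # mirrors pandas.core.dtypes.common._TD_DTYPE

def _require_m8ns_dtype(dtype):
    if dtype != _TD_DTYPE:
        raise ValueError("Only timedelta64[ns] dtype is valid.", dtype)

_require_m8ns_dtype(np.dtype("m8[ns]"))      # passes silently

try:
    _require_m8ns_dtype(np.dtype("m8[s]"))   # non-nano unit is rejected
except ValueError as err:
    print(err)
```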