BUG: TimedeltaIndex.intersection #17433
Codecov Report
@@ Coverage Diff @@
## master #17433 +/- ##
==========================================
- Coverage 91.16% 91.13% -0.03%
==========================================
Files 163 163
Lines 49581 49603 +22
==========================================
+ Hits 45199 45208 +9
- Misses 4382 4395 +13
Codecov Report
@@ Coverage Diff @@
## master #17433 +/- ##
==========================================
- Coverage 91.18% 91.13% -0.06%
==========================================
Files 163 163
Lines 49545 49555 +10
==========================================
- Hits 45179 45163 -16
- Misses 4366 4392 +26
Thanks for the PR. It should have the similar logic as Also,
def test_intersection_bug_17391():
    idx1 = pd.to_timedelta(range(3), unit='s')
    idx2 = pd.to_timedelta(range(2, -1, -1), unit='s')
    print(idx1)
please remove.
def test_intersection_bug_17391():
    idx1 = pd.to_timedelta(range(3), unit='s')
Can you also add tests with an unsorted index?
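A minimal sketch of the kind of unsorted-index test being asked for, assuming current pandas. The test name and values are illustrative, not the ones added in this PR. It passes `sort=None`, which the pandas docs describe as sorting the result when possible, so the expected index does not depend on the input order.

```python
import pandas as pd
from pandas.testing import assert_index_equal


def test_intersection_unsorted():
    # Neither index is sorted; the values are illustrative only.
    idx1 = pd.to_timedelta([2, 0, 1], unit="s")
    idx2 = pd.to_timedelta([5, 1, 2], unit="s")

    # sort=None asks pandas to sort the intersection when it can.
    result = idx1.intersection(idx2, sort=None)
    expected = pd.to_timedelta([1, 2], unit="s")
    assert_index_equal(result, expected)
```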
pandas/core/indexes/timedeltas.py
@@ -443,6 +443,15 @@ def f(x):
result = result.astype('int64')
return result

@property
Just use is_monotonic_increasing.
Looks like you may want the (private) _is_strictly_monotonic_increasing property.
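To illustrate the distinction between the two properties, here is a small sketch. Because `_is_strictly_monotonic_increasing` is private, it recomputes the idea from public attributes, on the assumption that "strictly increasing" means "monotonic increasing with no repeated values". `is_strictly_increasing` is a hypothetical helper name.

```python
import pandas as pd


def is_strictly_increasing(index: pd.Index) -> bool:
    # Hypothetical helper: strictly increasing means monotonic increasing
    # and free of duplicate values.
    return index.is_monotonic_increasing and index.is_unique


tied = pd.to_timedelta([0, 1, 1, 2], unit="s")   # contains a repeated 1s
strict = pd.to_timedelta([0, 1, 2], unit="s")
```

A tie such as `1s, 1s` passes `is_monotonic_increasing` but fails the strict check, which matters if the intersection code assumes each value appears only once.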
pandas/core/indexes/timedeltas.py
@@ -596,6 +605,38 @@ def _wrap_union_result(self, other, result):
name = self.name if self.name == other.name else None
return self._simple_new(result, name=name, freq=None)

def _slice_by_value(self, start, end):
All of these are only used once; please put them inline in the function.
pandas/core/indexes/timedeltas.py
@@ -607,7 +648,8 @@ def intersection(self, other):

Returns
-------
y : Index or TimedeltaIndex
Index or TimedeltaIndex
A shallow copied intersection between the two things passed in
Index
def test_intersection_bug_17391():
    idx1 = pd.to_timedelta(range(3), unit='s')
Put the GH issue number in a comment.
@@ -74,3 +74,66 @@ def test_intersection_bug_1708(self):
        result = index_1 & index_2
        expected = timedelta_range('1 day 01:00:00', periods=3, freq='h')
        tm.assert_index_equal(result, expected)


def test_intersection_bug_17391():
Please just name this test_intersection; the bug* naming scheme is old and no longer used (you can fix the other one as well). The main intersection tests should be in a test named test_intersection.
    idx2 = pd.to_timedelta(range(2, -1, -1), unit='s')
    print(idx1)
    print(idx2)
    assert len(idx1.intersection(idx2)) == 3
Don't compare like this. Construct the expected index, since ordering is important here, and use tm.assert_index_equal.
    idx1 = pd.to_timedelta(range(2, 6), unit='s')
    idx2 = pd.to_timedelta(range(3), unit='s')
    intersection = idx1.intersection(idx2)
    assert intersection.equals(TimedeltaIndex(['00:00:02']))
Use a pattern of
result = ...
expected = ...
tm.assert_index_equal(result, expected)
    idx1 = pd.to_timedelta(range(6, 4, -1), unit='s')
    idx2 = pd.to_timedelta(range(4, 1, -1), unit='s')
    intersection = idx1.intersection(idx2)
    print(idx1)
remove all prints
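Taken together, the review comments on this test ask for several changes: the GH number as a comment, the test named `test_intersection`, no prints, and a `result` / `expected` / `assert_index_equal` pattern. A sketch applying all of them might look like the following. It uses the public `pandas.testing.assert_index_equal` in place of the `tm` alias, and it passes `sort=None` in the reversed-order case. Both are choices made for this sketch against current pandas, not what the PR committed.

```python
import pandas as pd
from pandas.testing import assert_index_equal


def test_intersection():
    # GH 17391
    idx1 = pd.to_timedelta(range(3), unit="s")
    idx2 = pd.to_timedelta(range(2, -1, -1), unit="s")

    # Build the expected index explicitly instead of checking len();
    # sort=None requests a sorted result, so the order is well defined.
    result = idx1.intersection(idx2, sort=None)
    expected = pd.to_timedelta([0, 1, 2], unit="s")
    assert_index_equal(result, expected)

    # Partial overlap between two ascending indexes.
    idx1 = pd.to_timedelta(range(2, 6), unit="s")
    idx2 = pd.to_timedelta(range(3), unit="s")
    result = idx1.intersection(idx2)
    expected = pd.to_timedelta([2], unit="s")
    assert_index_equal(result, expected)
```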
Force-pushed from fd3434a to 1eb8918.
Hello @kirkhansen! Thanks for updating the PR. Cheers! There are no PEP8 issues in this Pull Request. 🍻 Comment last updated on September 14, 2017 at 02:39 Hours UTC
pandas/core/indexes/base.py
@@ -1201,6 +1201,12 @@ def is_monotonic(self):
return self.is_monotonic_increasing

@property
def is_strictly_monotonic(self):
Make this private (prefix it with an underscore).
Some background: when we added _is_strictly_monotonic_increasing, we went back and forth on whether or not to make it public. For now we're making it internal only until people actually ask for it.
pandas/core/indexes/datetimelike.py
return type(self)(data=[], **empty_params)
return left._slice_by_value(start, end)

def _offsets_equal(self, other):
This shouldn't be here and is pretty clunky. If you can't find something useful in pandas.tseries.offsets, then make a function there. But is there a reason you cannot do getattr(self, 'offset', None) != getattr(other, 'offset', None)? You would have to demonstrate how .isAnchored actually matters here.
Looks like 3 tests fail when doing just getattr(self, 'offset', None) != getattr(self, 'offset', None), for slightly different reasons:
- pandas/tests/indexes/datetimes/test_setops.py:87 TestDatetimeIndex.test_intersection: result.freq != expected.freq
- pandas/tests/indexes/datetimes/test_setops.py:151 TestDatetimeIndex.test_intersection_bug_1708: assert 3 == 0
- pandas/tests/indexes/datetimes/test_setops.py:283 TestBusinessDatetimeIndex.test_intersection: assert None == <Minute>
The tests that run locally seem to be fine with isAnchored being gone.
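The reviewer's suggestion compares the frequencies of the two indexes directly, with `getattr` supplying `None` for objects that lack the attribute. A minimal sketch of that idea follows. It reads `freq`, the attribute name in current pandas, rather than `offset`, and `freqs_differ` is a hypothetical helper. Whether a plain comparison is enough for the intersection logic is exactly what the failing tests listed above call into question.

```python
import pandas as pd


def freqs_differ(left, right) -> bool:
    # Hypothetical helper: compare frequencies as the reviewer suggests,
    # treating a missing attribute as None.
    return getattr(left, "freq", None) != getattr(right, "freq", None)


secondly = pd.timedelta_range("0s", periods=3, freq="s")
also_secondly = pd.timedelta_range("10s", periods=5, freq="s")
hourly = pd.timedelta_range("0s", periods=3, freq="h")
plain = pd.Index([1, 2, 3])  # a plain Index carries no frequency
```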
pandas/core/indexes/datetimelike.py
result.offset = frequencies.to_offset(result.inferred_freq)
return result

lenghts = len(self), len(other)
lengths
pandas/core/indexes/datetimelike.py
# handle intersecting things like this
# idx1 = pd.to_timedelta((1, 2, 3, 4, 5, 6, 7, 8), unit='s')
# idx2 = pd.to_timedelta((2, 3, 4, 8), unit='s')
if lenghts[0] != lenghts[1] and (
why do you need this special case?
The way fast indexing works produces a wrong result when intersecting those two examples: (2, 3, 4, 5, 6, 7, 8) instead of (2, 3, 4, 8).
Interestingly enough, when this special case is removed, the error that led me to add None to assert kind in ['ix', 'loc', 'getitem', None] shows up again in these tests:
- pandas/tests/indexes/period/test_setops.py:133 TestPeriodIndex.test_intersection: AssertionError
- pandas/tests/indexes/period/test_setops.py:155 TestPeriodIndex.test_intersection_cases: AssertionError
I added None because None is already among the allowed kinds in both the base _maybe_cast_slice_bound and the timedeltas _maybe_cast_slice_bound.
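The wrong result described in this comment can be reproduced outside the index internals. The sketch below uses the two indexes from the code comment and contrasts two approaches. The first is a range-based shortcut that slices one index between the overlapping bounds; it stands in for the PR's fast path and is not its actual code. The second is a value-by-value intersection.

```python
import pandas as pd

idx1 = pd.to_timedelta((1, 2, 3, 4, 5, 6, 7, 8), unit="s")
idx2 = pd.to_timedelta((2, 3, 4, 8), unit="s")

# Range-based shortcut: keep everything in idx1 between the later start and
# the earlier end. This is only valid when both indexes share one regular
# spacing, which idx2 (with its gap between 4s and 8s) does not.
start = max(idx1[0], idx2[0])
end = min(idx1[-1], idx2[-1])
naive = idx1[(idx1 >= start) & (idx1 <= end)]

# Element-wise intersection: membership is checked value by value.
exact = idx1.intersection(idx2)
```

`naive` holds 2s through 8s, the bad result quoted above, while `exact` holds only 2s, 3s, 4s, and 8s.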
pandas/core/indexes/datetimelike.py
if self_ascending != other.is_monotonic_increasing:
    other = other.sort_values(ascending=self_ascending)

# Thanks, PeriodIndex
huh?
This can go away now that intersect_ascending and intersect_descending no longer need to create an empty index of type(self).
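The `if self_ascending != other.is_monotonic_increasing` snippet in this thread re-sorts `other` only when the two indexes differ in being increasing. Suppose `self_ascending` is `self.is_monotonic_increasing` (the excerpt doesn't show its definition). Then an unsorted `other` is left untouched whenever `self` is descending, because both sides of the comparison are `False`. The sketch below avoids that by checking the target direction explicitly. `align_direction` is a hypothetical helper, not code from the PR.

```python
import pandas as pd


def align_direction(left: pd.Index, right: pd.Index) -> pd.Index:
    """Return ``right`` sorted in the same direction as ``left``.

    Hypothetical helper; assumes ``left`` itself is monotonic.
    """
    left_ascending = left.is_monotonic_increasing
    # Check whether right already runs in left's direction, rather than
    # only comparing the two "is increasing" flags.
    if left_ascending:
        already_aligned = right.is_monotonic_increasing
    else:
        already_aligned = right.is_monotonic_decreasing
    if not already_aligned:
        right = right.sort_values(ascending=left_ascending)
    return right


desc = pd.to_timedelta([4, 3, 2], unit="s")
unsorted = pd.to_timedelta([1, 0, 2], unit="s")
asc = pd.to_timedelta([0, 1, 2], unit="s")
```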
pandas/core/indexes/datetimelike.py
end = right[-1]

if end > start:
    return type(self)(data=[], **empty_params)
This is very clunky. Again, just return an indexer (or values are OK) and wrap at a higher level; then you don't need all of these params.
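Here is a rough sketch of the "return an indexer, wrap at a higher level" suggestion. The helper names are hypothetical, and the sketch swaps in `Index.get_indexer` for the PR's slicing logic. The point is only the shape of the code. The inner step returns positions, and the outer step builds the result with `take`, which preserves the index type, so no `empty_params` need to be threaded through. The name rule is the one `_wrap_union_result` uses in the diff earlier in this thread.

```python
import numpy as np
import pandas as pd


def intersection_indexer(left: pd.Index, right: pd.Index) -> np.ndarray:
    # Hypothetical helper: positions in ``left`` whose values also occur in
    # ``right``, kept in ``left``'s order. Assumes ``left`` is unique,
    # which get_indexer requires.
    positions = left.get_indexer(right)
    return np.sort(positions[positions != -1])


def intersect(left: pd.Index, right: pd.Index) -> pd.Index:
    # Wrap at the higher level: take() returns an index of left's type.
    result = left.take(intersection_indexer(left, right))
    # Keep the name only when both inputs agree on it.
    name = left.name if left.name == right.name else None
    return result.rename(name)
```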
pandas/core/indexes/datetimelike.py
else:
    intersected = self._intersect_descending(other, **empty_params)

name = self.name
_get_consensus_name
pandas/core/indexes/period.py
@@ -843,7 +843,7 @@ def _maybe_cast_slice_bound(self, label, side, kind):
Value of `side` parameter should be validated in caller.

"""
assert kind in ['ix', 'loc', 'getitem']
assert kind in ['ix', 'loc', 'getitem', None]
huh?
@@ -74,3 +75,99 @@ def test_intersection_bug_1708(self):
        result = index_1 & index_2
        expected = timedelta_range('1 day 01:00:00', periods=3, freq='h')
        tm.assert_index_equal(result, expected)


def test_intersection_intersects_ascending():
parametrize this
    assert result.equals(TimedeltaIndex(['00:00:02']))


def test_intersection_intersects_descending():
parametrize
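Both new tests were asked to be parametrized. Below is a sketch of a single parametrized test covering ascending and descending inputs. It reuses the ranges from the earlier revisions of these tests and adds one extra descending case. The test name is an assumption of this sketch. So is the choice of cases whose results have one element or none, which keeps ordering from affecting the comparison.

```python
import pandas as pd
import pytest
from pandas.testing import assert_index_equal


@pytest.mark.parametrize(
    "left, right, expected",
    [
        # both ascending, partial overlap
        (range(2, 6), range(3), [2]),
        # both descending, overlapping at one value
        (range(4, 1, -1), range(2, -1, -1), [2]),
        # both descending, no overlap
        (range(6, 4, -1), range(4, 1, -1), []),
    ],
)
def test_intersection_monotonic(left, right, expected):
    idx1 = pd.to_timedelta(left, unit="s")
    idx2 = pd.to_timedelta(right, unit="s")
    result = idx1.intersection(idx2)
    assert_index_equal(result, pd.to_timedelta(expected, unit="s"))
```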
@jreback Most of the logic here comes from merging the old timedeltas.py intersection and the datetimes.py intersection. I assumed the logic in both was there for a good reason. I'm fine with cleaning those kinds of things up per your review, but just know it was mostly copy-paste: I took the most restrictive version of each piece I could find across the intersection functions, without fully understanding why each was there.
@kirkhansen Yep, I see that. When we refactor, we always like to clean up what we can. Nice job combining the disparate subclass routines.
Force-pushed from 02c4bc6 to 918276c.
@kirkhansen some test failures in
@TomAugspurger I haven't had much time for coding lately. I haven't been able to reproduce this error locally (all the tests pass with my versions of things), but I'll try to dig into this tonight or tomorrow.
K. I just pulled your branch and wasn't able to reproduce it either. I should have some time tomorrow to debug if you're busy.
@TomAugspurger I hate to say it, but if you're attempting to release at the end of the week, you may want to pick this up. I was unable to spend time on it yesterday and doubt I'll be able to look at this tonight.
Can you rebase / update?
Closing as stale, but if you'd like to keep working, ping and we can reopen.
@jreback I've been poking around at this again, getting things back up to date in hopes of resolving the build failure.
FWIW, the pytables tests pass now, but I get errors with
FYI @kirkhansen, you'll need to open a new PR. GitHub doesn't support closing a PR, pushing to the branch, and then reopening.
git diff upstream/master -u -- "*.py" | flake8 --diff