Skip to content

BUG: TimedeltaIndex.intersection #17433

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

kirkhansen
Copy link
Contributor

@codecov
Copy link

codecov bot commented Sep 5, 2017

Codecov Report

Merging #17433 into master will decrease coverage by 0.02%.
The diff coverage is 84.37%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #17433      +/-   ##
==========================================
- Coverage   91.16%   91.13%   -0.03%     
==========================================
  Files         163      163              
  Lines       49581    49603      +22     
==========================================
+ Hits        45199    45208       +9     
- Misses       4382     4395      +13
Flag Coverage Δ
#multiple 88.92% <84.37%> (-0.01%) ⬇️
#single 40.24% <12.5%> (-0.08%) ⬇️
Impacted Files Coverage Δ
pandas/core/indexes/timedeltas.py 90.22% <84.37%> (-0.37%) ⬇️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.72% <0%> (-0.1%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c2d0481...98bbc51. Read the comment docs.

@codecov
Copy link

codecov bot commented Sep 5, 2017

Codecov Report

Merging #17433 into master will decrease coverage by 0.05%.
The diff coverage is 80.88%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #17433      +/-   ##
==========================================
- Coverage   91.18%   91.13%   -0.06%     
==========================================
  Files         163      163              
  Lines       49545    49555      +10     
==========================================
- Hits        45179    45163      -16     
- Misses       4366     4392      +26
Flag Coverage Δ
#multiple 88.92% <80.88%> (-0.04%) ⬇️
#single 40.26% <55.88%> (-0.02%) ⬇️
Impacted Files Coverage Δ
pandas/core/indexes/timedeltas.py 91.17% <ø> (+0.58%) ⬆️
pandas/core/indexes/datetimes.py 95.68% <ø> (+0.15%) ⬆️
pandas/core/indexes/base.py 96.28% <100%> (ø) ⬆️
pandas/core/indexes/period.py 92.74% <100%> (ø) ⬆️
pandas/core/indexes/datetimelike.py 94.75% <80%> (-2.14%) ⬇️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/plotting/_converter.py 63.23% <0%> (-1.82%) ⬇️
pandas/core/frame.py 97.77% <0%> (-0.1%) ⬇️
pandas/core/categorical.py 95.51% <0%> (-0.02%) ⬇️
pandas/core/base.py 96.01% <0%> (ø) ⬆️
... and 6 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update fa557f7...918276c. Read the comment docs.

@sinhrks sinhrks added Bug Timedelta Timedelta data type labels Sep 5, 2017
@sinhrks
Copy link
Member

sinhrks commented Sep 5, 2017

Thanks for the PR. It should have the similar logic as DatetimeIndex (may better to move to datetimelike.py).
https://github.com/pandas-dev/pandas/blob/master/pandas/core/indexes/datetimes.py#L1217

Also, PeriodIndex doesn't have the efficient logic ATM, but should be the same.

def test_intersection_bug_17391():
idx1 = pd.to_timedelta(range(3), unit='s')
idx2 = pd.to_timedelta(range(2, -1, -1), unit='s')
print(idx1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please remove.



def test_intersection_bug_17391():
idx1 = pd.to_timedelta(range(3), unit='s')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can u also add tests with not-sorted index?

@@ -443,6 +443,15 @@ def f(x):
result = result.astype('int64')
return result

@property
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just use is_monotonic_ascending

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like you may want the (private) _is_strictly_monotonic_increasing property.

@@ -596,6 +605,38 @@ def _wrap_union_result(self, other, result):
name = self.name if self.name == other.name else None
return self._simple_new(result, name=name, freq=None)

def _slice_by_value(self, start, end):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all of these are just used once. pls put them in-line to the function.

@@ -607,7 +648,8 @@ def intersection(self, other):

Returns
-------
y : Index or TimedeltaIndex
Index or TimedeltaIndex
A shallow copied intersection between the two things passed in
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Index



def test_intersection_bug_17391():
idx1 = pd.to_timedelta(range(3), unit='s')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

put the gh issue number as a comment.

@@ -74,3 +74,66 @@ def test_intersection_bug_1708(self):
result = index_1 & index_2
expected = timedelta_range('1 day 01:00:00', periods=3, freq='h')
tm.assert_index_equal(result, expected)


def test_intersection_bug_17391():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls just name this test_intersection; this bug* scheme is old and not used anymore (you can fix the other one as well). the 'main' intersection tests should be in a test named test_intersection.

idx2 = pd.to_timedelta(range(2, -1, -1), unit='s')
print(idx1)
print(idx2)
assert len(idx1.intersection(idx2)) == 3
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't compare like this. construct the expected index as ordering is important here. use tm.assert_index_equal

idx1 = pd.to_timedelta(range(2, 6), unit='s')
idx2 = pd.to_timedelta(range(3), unit='s')
intersection = idx1.intersection(idx2)
assert intersection.equals(TimedeltaIndex(['00:00:02']))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use a pattern of

result = ....
expected = ...
tm.assert_index_equals(result, expected)

idx1 = pd.to_timedelta(range(6, 4, -1), unit='s')
idx2 = pd.to_timedelta(range(4, 1, -1), unit='s')
intersection = idx1.intersection(idx2)
print(idx1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove all prints

@kirkhansen kirkhansen force-pushed the fix-descending-timedeltaindex-intersect branch 5 times, most recently from fd3434a to 1eb8918 Compare September 13, 2017 00:49
@pep8speaks
Copy link

pep8speaks commented Sep 13, 2017

Hello @kirkhansen! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on September 14, 2017 at 02:39 Hours UTC

@@ -1201,6 +1201,12 @@ def is_monotonic(self):
return self.is_monotonic_increasing

@property
def is_strictly_monotonic(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make this private (lead with the underscore)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some background: when we added _is_strictly_monotonic_increasing, we went back and forth on whether or not to make it public. For now we're making it internal only until people actually ask for it.

return type(self)(data=[], **empty_params)
return left._slice_by_value(start, end)

def _offsets_equal(self, other):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this shouldn't be here and is pretty clunky, if you can't find something useful in pandas.tseries.offsets, then make a function there. but is there a reason you cannot do

getattr(self, 'offset, None) != getattr(other, 'offset', None)?

you would have to demonstrate how the .isAnchored actually matters here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like 3 tests fail when doing just getattr(self, 'offset', None) != getattr(self, 'offset', None) for slightly different reasons.

  • pandas/tests/indexes/datetimes/test_setops.py:87 TestDatetimeIndex.test_intersection : result.freq != expected.freq
  • pandas/tests/indexes/datetimes/test_setops.py:151 TestDatetimeIndex.test_intersection_bug_1708: assert 3 == 0
  • pandas/tests/indexes/datetimes/test_setops.py:283 TestBusinessDatetimeIndex.test_intersection: assert None == <Minute>

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tests that run locally seem to be fine with isAnchored being gone.

result.offset = frequencies.to_offset(result.inferred_freq)
return result

lenghts = len(self), len(other)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lengths

# handle intersecting things like this
# idx1 = pd.to_timedelta((1, 2, 3, 4, 5, 6, 7, 8), unit='s')
# idx2 = pd.to_timedelta((2, 3, 4, 8), unit='s')
if lenghts[0] != lenghts[1] and (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do you need this special case?

Copy link
Contributor Author

@kirkhansen kirkhansen Sep 13, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way that fast indexing works will produce a wrong result when running an intersection with those two examples. The result will be (2, 3, 4, 5, 6, 7, 8) instead of (2, 3, 4, 8).

Interestingly enough, when this case is gone, the reason I added the assert kind in ['ix', 'loc', 'getitem', None] shows up as an error in one of the tests.

  • pandas/tests/indexes/period/test_setops.py:133 TestPeriodIndex.test_intersection: AssertionError
  • pandas/tests/indexes/period/test_setops.py:155 TestPeriodIndex.test_intersection_cases: AssertionError

I added None as None is in both the base _maybe_cast_slice_bound and the timedeltas _maybe_cast_slice_bound.

if self_ascending != other.is_monotonic_increasing:
other = other.sort_values(ascending=self_ascending)

# Thanks, PeriodIndex
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

huh?

Copy link
Contributor Author

@kirkhansen kirkhansen Sep 13, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can go away now with the changes to intersect_ascending and intersect_descending not needing to create an empty index of type(self).

end = right[-1]

if end > start:
return type(self)(data=[], **empty_params)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is very clunky. again just return an indexer (or values are ok), and wrap at a higher level, then you don't need all of these params.

else:
intersected = self._intersect_descending(other, **empty_params)

name = self.name
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_get_consensus_name

@@ -843,7 +843,7 @@ def _maybe_cast_slice_bound(self, label, side, kind):
Value of `side` parameter should be validated in caller.

"""
assert kind in ['ix', 'loc', 'getitem']
assert kind in ['ix', 'loc', 'getitem', None]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

huh?

@@ -74,3 +75,99 @@ def test_intersection_bug_1708(self):
result = index_1 & index_2
expected = timedelta_range('1 day 01:00:00', periods=3, freq='h')
tm.assert_index_equal(result, expected)


def test_intersection_intersects_ascending():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parametrize this

assert result.equals(TimedeltaIndex(['00:00:02']))


def test_intersection_intersects_descending():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parametrize

@kirkhansen
Copy link
Contributor Author

@jreback most of the logic here is from a merge between the old timedeltas.py intersection, and the datetimes.py intersection. I assumed the logic in both of these were there for good reason. I'm fine with cleaning those types of things up per your merge request, but just know it was mostly copy paste, and I took the most restrictive version of each piece between the intersection functions I could find without fully understanding their reason for being there.

@jreback
Copy link
Contributor

jreback commented Sep 13, 2017

@jreback most of the logic here is from a merge between the old timedeltas.py intersection, and the datetimes.py intersection. I assumed the logic in both of these were there for good reason. I'm fine with cleaning those types of things up per your merge request, but just know it was mostly copy paste, and I took the most restrictive version of each piece between the intersection functions I could find without fully understanding their reason for being there.

@kirkhansen yep I see that. When we refactor always like to clean up what is possible. For sure nice job on combining the disparate sub-class routines.

@kirkhansen kirkhansen force-pushed the fix-descending-timedeltaindex-intersect branch from 02c4bc6 to 918276c Compare September 14, 2017 02:39
@TomAugspurger
Copy link
Contributor

@kirkhansen some tests failures in pandas/tests/io/test_pytables.py::TestHDFStore, that I think are relevant Can you take a look (doing a release at the end of the week).

@kirkhansen
Copy link
Contributor Author

some tests failures in pandas/tests/io/test_pytables.py::TestHDFStore, that I think are relevant Can you take a look (doing a release at the end of the week).

@TomAugspurger I haven't had much time for coding as of late. I haven't been able to reproduce this error locally (all the tests pass with my versions of things), but I'll try and dig into this tonight or tomorrow.

@TomAugspurger
Copy link
Contributor

K. I just pulled your branch and wasn't able to reproduce either. I should have some time tomorrow to debug if you're busy.

@kirkhansen
Copy link
Contributor Author

@TomAugspurger I hate to say it, but if you're attempting to release at the end of the week, you may want to pick this up. I was unable to spend time on it yesterday and doubt I'll be able to look at this tonight.

@jreback
Copy link
Contributor

jreback commented Nov 10, 2017

can you rebase / update

@jreback
Copy link
Contributor

jreback commented Dec 10, 2017

closing as stale, but if you'd like to keep working, ping and we can reopen

@jreback jreback closed this Dec 10, 2017
@kirkhansen
Copy link
Contributor Author

@jreback I've been poking around at this again, getting stuff back up to date in hopes of resolving the build failure.

@kirkhansen
Copy link
Contributor Author

FWIW, the pytables tests pass now, but i get errors with LooseVersion. Those same errors show up in the master branch of my builds, so I'm not sure what its problem is since it's a copy of your master, and your master is passing.

@TomAugspurger
Copy link
Contributor

FYI @kirkhansen you'll need to open a new PR. Github doesn't support closing a PR, bushing to the branch, and then reopening.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Timedelta Timedelta data type
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: TimedeltaIndex.intersection fails for decreasing TimedeltaIndex
5 participants