Skip to content

BUG: TimedeltaIndex.intersection #22114

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

kirkhansen
Copy link
Contributor

@kirkhansen kirkhansen commented Jul 29, 2018

@kirkhansen kirkhansen changed the title BUG: TimedeltaIndex.intersection #17433 BUG: TimedeltaIndex.intersection Jul 29, 2018
@kirkhansen kirkhansen force-pushed the fix-descending-timedeltaindex-intersect branch from 62d9e3e to 5af3c53 Compare July 29, 2018 17:08
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls add a whatsnew note in bug fixes / timedeltas

@@ -1412,6 +1412,12 @@ def is_monotonic(self):
""" alias for is_monotonic_increasing (deprecated) """
return self.is_monotonic_increasing

@property
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are we actually using this? (it seems you in-lined it below), I actually prefer that to making a new method

Copy link
Contributor Author

@kirkhansen kirkhansen Sep 14, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The names are similar, but I'm calling this function twice in the intersection function to help me deal with the boolean logic (already hard for me to follow, even with this addition). I can remove this, and put this inline if you really want it that way.

end = right[-1]

if end > start:
return Index()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this not []?

start = right[0]

if end < start:
return []
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

worth to consolidate these into 1 routine? as you can call with the ascending arg

result.offset = frequencies.to_offset(result.inferred_freq)
return result

# handle intersecting things like this
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a tiny bit more of expl

@@ -112,6 +112,41 @@ def wrapper(self, other):
return wrapper


def apply_index_wraps(func):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is going on here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Been awhile. I don't remember writing this code. I'll guess an artifact of my rebase.

@jreback jreback added Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode Timedelta Timedelta data type labels Jul 30, 2018
@kirkhansen
Copy link
Contributor Author

Having come back at this, I'm struggling with how to deal with all the edge cases that need to be accounted for before trying to apply this quick method of intersecting, and this is making me question whether or not to even include it.
The quick method cannot allow any 'gaps' between values, unless both indexes have the same gaps, and having to check for that before intersecting seems a lot harder than just using Index.intersection, unless that method is worse than something we can write to check if if values are contiguous based on the frequency passed in or something.

I'm having a hard time describing it clearly, but hopefully this is making sense. If not, I can commit my broken code so you all can see what I'm talking about.

@jorisvandenbossche
Copy link
Member

A possibility to at least solve the bug would be to fall back to the Index.intersection implementation for the cases where monotonicity does not match?

Basically similar like DatetimeIndex.intersection also falls back to it when eg freq does not match:

elif (other.freq is None or self.freq is None or
other.freq != self.freq or
not other.freq.isAnchored() or
(not self.is_monotonic or not other.is_monotonic)):
result = Index.intersection(self, other)
result = self._shallow_copy(result._values, name=result.name,
tz=result.tz, freq=None)
if result.freq is None:
result.freq = to_offset(result.inferred_freq)
return result

Will not be the most performant option, but at least could fix the bug

@kirkhansen
Copy link
Contributor Author

@jorisvandenbossche I thought I had been doing that, but it seems my check was backwards for that case :/. Correcting that appears to solve the issues. Thanks!

@kirkhansen kirkhansen force-pushed the fix-descending-timedeltaindex-intersect branch from 5af3c53 to 808ce4e Compare September 14, 2018 14:02
@pep8speaks
Copy link

pep8speaks commented Sep 14, 2018

Hello @kirkhansen! Thanks for updating the PR.

Comment last updated on December 12, 2018 at 21:29 Hours UTC

@kirkhansen kirkhansen force-pushed the fix-descending-timedeltaindex-intersect branch from 808ce4e to 5703681 Compare November 9, 2018 05:50
@codecov
Copy link

codecov bot commented Nov 9, 2018

Codecov Report

Merging #22114 into master will increase coverage by <.01%.
The diff coverage is 86.79%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #22114      +/-   ##
==========================================
+ Coverage   92.21%   92.22%   +<.01%     
==========================================
  Files         162      162              
  Lines       51768    51761       -7     
==========================================
- Hits        47739    47734       -5     
+ Misses       4029     4027       -2
Flag Coverage Δ
#multiple 90.62% <86.79%> (ø) ⬆️
#single 43.14% <47.16%> (+0.14%) ⬆️
Impacted Files Coverage Δ
pandas/core/indexes/datetimes.py 96.8% <ø> (+0.46%) ⬆️
pandas/core/indexes/period.py 93.06% <100%> (ø) ⬆️
pandas/tseries/offsets.py 96.86% <100%> (+0.01%) ⬆️
pandas/core/indexes/base.py 96.27% <100%> (ø) ⬆️
pandas/core/indexes/timedeltas.py 90.69% <33.33%> (+0.26%) ⬆️
pandas/core/indexes/datetimelike.py 96.24% <87.5%> (-1.06%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9bc42f9...4aac7e8. Read the comment docs.

@jreback
Copy link
Contributor

jreback commented Dec 9, 2018

if you can merge master

@kirkhansen kirkhansen force-pushed the fix-descending-timedeltaindex-intersect branch from 16965c5 to 2f5513c Compare December 12, 2018 19:57
@mroeschke
Copy link
Member

You'll need to run isort on this file:

2018-12-12T21:36:13.4930067Z ERROR: /home/vsts/work/1/s/pandas/core/indexes/datetimelike.py Imports are incorrectly sorted.

first = getattr(first, 'freq', None)
second = getattr(second, 'freq', None)
are_offsets_equal = True
if first is None or second is None or first != second:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just return this statement; no need for the intermediate variable

@@ -4116,6 +4116,7 @@ def test_append_to_multiple_dropna(self):
df2 = tm.makeTimeDataFrame().rename(columns=lambda x: "%s_2" % x)
df1.iloc[1, df1.columns.get_indexer(['A', 'B'])] = np.nan
df = concat([df1, df2], axis=1)
print(df)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This print should be removed.

"""
Speedy intersection that works only if certain assumptions are met.
See intersection for details.
Parameters
Copy link
Member

@mroeschke mroeschke Dec 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you give a bit more color to Parameters and Returns

@WillAyd
Copy link
Member

WillAyd commented Feb 27, 2019

Closing as stale ping if you'd like to continue

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode Timedelta Timedelta data type
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: TimedeltaIndex.intersection fails for decreasing TimedeltaIndex
6 participants