Skip to content

WIP: NaTD #24645

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 5 commits into from
Closed

WIP: NaTD #24645

wants to merge 5 commits into from

Conversation

jbrockmendel
Copy link
Member

There are a bunch of places where we do something like:

    def method(self, other):
        [...]
        if isinstance(other, (np.timedelta64, timedelta, Tick)):
            other = Timedelta(other)
       [...]

But in the case where we start with np.timedelta64('NaT'), we end up with NaT which is datetime-like instead of timedelta-like. In some of the places where this occurs, we check for this case and special-case it. In others we miss it completely.

I am not proposing to change the behavior of Timedelta or make NaTD public. The idea is that since we need the arithmetic/comparison methods anyway, we might as well put them into one place and handle them systematically.

Tests are a mess at the moment.

@@ -442,7 +443,7 @@ def __truediv__(self, other):

if isinstance(other, (timedelta, np.timedelta64, Tick)):
other = Timedelta(other)
if other is NaT:
if other is NaT: # TODO: use NaTD
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

once we actually do this (which is easy to implement, but I didn't because we don't have test coverage yet) then a bunch of this code can be templated/de-duplicated.

@codecov
Copy link

codecov bot commented Jan 5, 2019

Codecov Report

Merging #24645 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #24645      +/-   ##
==========================================
+ Coverage   92.37%   92.38%   +<.01%     
==========================================
  Files         166      166              
  Lines       52379    52381       +2     
==========================================
+ Hits        48387    48392       +5     
+ Misses       3992     3989       -3
Flag Coverage Δ
#multiple 90.8% <100%> (ø) ⬆️
#single 43.01% <10%> (ø) ⬆️
Impacted Files Coverage Δ
pandas/core/arrays/timedeltas.py 88.73% <100%> (+0.54%) ⬆️
pandas/core/ops.py 94.28% <100%> (ø) ⬆️
pandas/core/indexes/base.py 96.36% <100%> (+0.06%) ⬆️
pandas/util/testing.py 88% <0%> (-0.1%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ebd598a...d0efc5f. Read the comment docs.

@codecov
Copy link

codecov bot commented Jan 5, 2019

Codecov Report

Merging #24645 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #24645      +/-   ##
==========================================
+ Coverage   92.37%   92.38%   +<.01%     
==========================================
  Files         166      166              
  Lines       52384    52386       +2     
==========================================
+ Hits        48390    48396       +6     
+ Misses       3994     3990       -4
Flag Coverage Δ
#multiple 90.8% <100%> (ø) ⬆️
#single 43.02% <10%> (ø) ⬆️
Impacted Files Coverage Δ
pandas/core/arrays/timedeltas.py 88.64% <100%> (+0.55%) ⬆️
pandas/core/ops.py 94.28% <100%> (ø) ⬆️
pandas/core/indexes/base.py 96.36% <100%> (+0.06%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 393d0aa...1b533ff. Read the comment docs.

@jreback
Copy link
Contributor

jreback commented Jan 5, 2019

-1 on this. I would rather have a purpose-built missing value object that we can use in place of NaT, np.nan; this is very tricky though with the current implementation. If everything is EA's then its possible.

Would rather you just patch NaT if you need.

@jbrockmendel
Copy link
Member Author

Would rather you just patch NaT if you need.

NaT not knowing whether it is a datetime or a timedelta is part of the problem; overloading it further would make things worse.

The alternative is to define these methods as regular functions

def nat_td_div(left, right):

and call the appropriate one in the appropriate place. By making them into methods we let python figure out which one to call and de-duplicate a lot of code elsewhere.

@jreback
Copy link
Contributor

jreback commented Jan 5, 2019

The alternative is to define these methods as regular functions

maybe that's better. I agree this is a thorny problem. But we need a comprehensive solution, not more missing data types.

@jbrockmendel
Copy link
Member Author

Another option (that involves significantly less code) would be to subclass Timedelta and override __new__ to allow us to get an internal NaTD object with self.value == NPY_NAT. The tweaks needed in Timedelta to catch the if self.value == NPY_NAT cases would be small compared to the current diff.

@jreback
Copy link
Contributor

jreback commented Jan 6, 2019

how would you reconcile an external pd.NaT with NaTD? is there a way to internally make a NaT_datetime and NaT_timedelta objects so that we always unbox the external object (pd.NaT) for all datetimelike ops ?

@jbrockmendel
Copy link
Member Author

how would you reconcile an external pd.NaT with NaTD?

For the foreseeable future (i.e. until pandas2) I wouldn't expose NaTD to users at all. So if a user passes pd.NaT, it behaves exactly like it does now.

is there a way to internally make a NaT_datetime and NaT_timedelta objects so that we always unbox the external object (pd.NaT) for all datetimelike ops ?

The "always" part of that is hard. Or more specifically, it is hard to do that without risking returning NaTD to users. The approach in this PR is to identify all the places internally where we would/should use NaTD and swap it in where appropriate.

@gfyoung gfyoung added Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Timedelta Timedelta data type labels Jan 7, 2019
@jreback
Copy link
Contributor

jreback commented Mar 20, 2019

closing. I think we need a comprehensive soln for this.

@jreback jreback closed this Mar 20, 2019
@jbrockmendel jbrockmendel deleted the tdnat branch November 20, 2021 23:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Timedelta Timedelta data type
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants