WIP: NaTD #24645
@@ -442,7 +443,7 @@ def __truediv__(self, other):

         if isinstance(other, (timedelta, np.timedelta64, Tick)):
             other = Timedelta(other)
-            if other is NaT:
+            if other is NaT:  # TODO: use NaTD
Once we actually do this (it is easy to implement, but I held off because we don't have test coverage yet), a bunch of this code can be templated/de-duplicated.
Codecov Report
@@ Coverage Diff @@
## master #24645 +/- ##
==========================================
+ Coverage 92.37% 92.38% +<.01%
==========================================
Files 166 166
Lines 52379 52381 +2
==========================================
+ Hits 48387 48392 +5
+ Misses 3992 3989 -3
Codecov Report
@@ Coverage Diff @@
## master #24645 +/- ##
==========================================
+ Coverage 92.37% 92.38% +<.01%
==========================================
Files 166 166
Lines 52384 52386 +2
==========================================
+ Hits 48390 48396 +6
+ Misses 3994 3990 -4
-1 on this. I would rather have a purpose-built missing-value object that we can use in place of NaT and np.nan; this is very tricky with the current implementation, though. If everything is an EA then it's possible. I would rather you just patch NaT if you need to.
NaT not knowing whether it is a datetime or a timedelta is part of the problem; overloading it further would make things worse. The alternative is to define these methods as regular functions and call the appropriate one in the appropriate place. By making them into methods we let Python figure out which one to call and de-duplicate a lot of code elsewhere.
Maybe that's better. I agree this is a thorny problem. But we need a comprehensive solution, not more missing data types.
Another option (that involves significantly less code) would be to subclass |
How would you reconcile an external pd.NaT with NaTD? Is there a way to internally make NaT_datetime and NaT_timedelta objects so that we always unbox the external object (pd.NaT) for all datetimelike ops?
For the foreseeable future (i.e. until pandas2) I wouldn't expose NaTD to users at all. So if a user passes pd.NaT, it behaves exactly like it does now.
The "always" part of that is hard. Or more specifically, it is hard to do that without risking returning NaTD to users. The approach in this PR is to identify all the places internally where we would/should use NaTD and swap it in where appropriate.
Closing. I think we need a comprehensive solution for this.
There are a bunch of places where we do something like:
But in the case where we start with np.timedelta64('NaT'), we end up with NaT, which is datetime-like instead of timedelta-like. In some of the places where this occurs, we check for this case and special-case it. In others we miss it completely.

I am not proposing to change the behavior of Timedelta or make NaTD public. The idea is that since we need the arithmetic/comparison methods anyway, we might as well put them into one place and handle them systematically.

Tests are a mess at the moment.