Skip to content

CLN: Refactor series datetime arithmetic #4613

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Aug 27, 2013

Conversation

jtratner
Copy link
Contributor

This is a prelude to the larger refactoring of arithmetic into core/ops,
etc. It doesn't change any external behavior (and I'm relatively confident that
the test suite covers all the branches).

The complexity of the time arithmetic code was making it hard to tweak,
so I decided to change it up. It's not that complicated, but wanted to
run it by you all in an easy-to-read diff before I change its files.

Specifically it:

  1. Moves Series time arithmetic out of _arith_method into a encapsulated _TimeOp class.
  2. Consolidates the previously repeated type conversions into one.
  3. Separates validations from type conversions, which makes the function
    easier to follow and write.
  4. Changes all obscure character codes to clear ones.
  5. Removes the unnecessary set() calls in convert_to_array.

@jtratner
Copy link
Contributor Author

@cpcloud it probably should be a style convention going forward to avoid using character codes in favor of the clearer, more verbose names.

For example, this previously had two separate conditional branches that did:

if cond1:
    lvalues.astype('i8')
    rvalues.astype('i8')
elif cond2:
    lvalues.astype(np.int64)
    rvalues.astype(np.int64)

Written like the above, it's harder to see they do the same thing and can be combined.

I'd also argue that dtypes like m8[ns] and M8[ns] should be written as timedelta64[ns]and datetime64[ns] respectively. Much clearer that way.

@jreback
Copy link
Contributor

jreback commented Aug 20, 2013

would it make sense to have a class hold the self and other then have the functions as methods of the class? (maybe in core/ops)

something like

t = _TimeOps(self, other)
t.get_result()

this patter is pretty common
eg concat, groupby etc

makes things into nice packages
rther than just collectives of functions

@jreback
Copy link
Contributor

jreback commented Aug 20, 2013

agree about dtype codes (though i8 is very common)

@cpcloud
Copy link
Member

cpcloud commented Aug 20, 2013

i really don't like that m8[unit] business so +1 on the type code stuff

@jtratner
Copy link
Contributor Author

@jreback that's a good idea - it was definitely getting awkward to pass everything around like that.

on type codes, it's not that it's wrong, it's just confusing for those who aren't in the know (or are skimming) 😄

@cpcloud
Copy link
Member

cpcloud commented Aug 20, 2013

OTOH it's extremely cumbersome to type out datetime64[ns] all the time (esp. in ipython)....hopefully pandas takes a lot of the pain away of having to manually cast things like that 😄 however a simple td64[ns] or dt64[ns] would have sufficed. Wonder where the m comes from...

@cpcloud
Copy link
Member

cpcloud commented Aug 20, 2013

let me not derail. i'm also +1 on the _TimeOps encapsulation.

god forbid we started using the ? type code (bool). kind of weird. i guess it's a tip of the hat to its C roots a la a == b ? a : b

@jtratner
Copy link
Contributor Author

I'm slightly mystified by the test failure here - I assume that it's a real failure, because it's failing on the same test (and no output should change from current master). Maybe I introduced a type mismatch? I also don't get why 2.7 and 3.3 would pass but not 2.6 or 3.2.

@cpcloud
Copy link
Member

cpcloud commented Aug 20, 2013

sounds like a numpy 1.6 issue...2.6 and 3.2 both have np 1.6.1 and 1.6.2 respectively while 2.7 and 3.3 have 1.7.1

@jtratner
Copy link
Contributor Author

whelp I'll figure it out...

@jtratner
Copy link
Contributor Author

@jreback @cpcloud finally got this all cleaned up, working and refactored into a class. I'll squash the commits down into 1 big commit after any comments on this are addressed. (much easier to tweak with the split out commits). Plus, the Series arithmetic method is now much cleaner.

After this is finished, will be much easier to do the ops refactor.

@jtratner
Copy link
Contributor Author

okay, I squashed this such that it's ready to go - obviously this will later be moved into core/ops, but, as I said, I wanted to do this separately because it's more complicated than the other refactoring.

* clarify/simplify time arithmetic-specific code into a separate class
* cleanup extraneous `set` and containment checks in branches
* refactor validation checks to separate function
* combine branches and dry out repetitive code
* move convert_to_array out of arith method
* self --> left; other --> right [possibly clearer for future refacotr]
* change handling of DateTimeIndex (no longer need additional return)
* add np1.6 check for integer --> timedelta conversion under the hood
@jreback
Copy link
Contributor

jreback commented Aug 27, 2013

@jtratner looks good to me....your first merge (that you do yourself)!

@jreback
Copy link
Contributor

jreback commented Aug 27, 2013

maye should create a core/types.py where
all type stuff is consolidated ? (pretty much in common.py now)

jtratner added a commit that referenced this pull request Aug 27, 2013
CLN: Refactor series datetime arithmetic
@jtratner jtratner merged commit c472099 into pandas-dev:master Aug 27, 2013
@jtratner
Copy link
Contributor Author

Yeah, that might make sense - I'll keep that in mind. Now on to finishing up the arithmetic refactoring!

@jtratner jtratner deleted the refactor-datetimes branch August 28, 2013 06:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants