REF: de-duplicate groupby_helper code #28934

Merged 7 commits on Oct 16, 2019

jbrockmendel
Member

There's one other piece of de-duplication that I think should be feasible, but Cython is still raising compilation errors for it, so I will do that separately.

Orthogonal to #28931, but I expect it will cause merge conflicts. #28931 should be a higher priority.

@@ -19,6 +19,18 @@ ctypedef fused rank_t:
    object


cdef inline bint _treat_as_na(rank_t val, bint is_datetimelike) nogil:
Member

Is this only applicable in groupby or should it go in util?

Member Author

Only here. As I mention in the comments below, I'm not wild about the fact that is_datetimelike is effectively hard-coded to True in several places. I will try to see whether that causes problems in follow-up(s).

@@ -19,6 +19,18 @@ ctypedef fused rank_t:
    object


cdef inline bint _treat_as_na(rank_t val, bint is_datetimelike) nogil:
    if rank_t is object:
Member

Somewhat counter-intuitive that this works - out of curiosity what was the complaint?

Member Author

AFAICT the val != val check requires calling val.__ne__, which requires the GIL.

Contributor

Shouldn't need the GIL if these are C-level objects (e.g. ints or floats).

Member Author

Right, we just need to keep the object case from reaching the val == val step; that's all this check is doing.
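
To make the point about val != val concrete, here is a small pure-Python sketch (not the Cython code from this PR). The `Tracked` class is hypothetical and exists only to show that comparing a Python object with itself dispatches to its `__ne__` method, which is the kind of call that needs the GIL in Cython, whereas a float NaN is simply the one float value that compares unequal to itself.

```python
class Tracked:
    """Hypothetical object that counts how often __ne__ is invoked."""

    def __init__(self):
        self.ne_calls = 0

    def __ne__(self, other):
        self.ne_calls += 1
        return False


obj = Tracked()
obj_is_na = obj != obj      # dispatches to Tracked.__ne__ (a Python-level call)

nan = float("nan")
nan_is_na = nan != nan      # NaN never compares equal to itself
one_is_na = 1.0 != 1.0      # an ordinary float equals itself
```

For C-level floats the same comparison is a plain machine instruction, which is why only the object specialization has to be kept away from it.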

if val == val:
nobs[lab, j] += 1
resx[lab, j] = val
if val == val:
Member

Is this totally equivalent to what's in place? Looks like we are losing the NPY_NAT check?

Member Author

This is in the rank_t is object case, where we can't use _treat_as_na (see the comment just below here).
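
For readers following the NA handling, the following is a rough pure-Python sketch of the idea, not the Cython implementation in this PR. It assumes that the non-object branches of the helper check for NaN on floats and for the NPY_NAT sentinel (the minimum int64 value) on datetimelike integers. The names `treat_as_na` and `last_per_group` are illustrative only, and the loop is simplified to a single column.

```python
NPY_NAT = -(2 ** 63)  # minimum int64 value, used as the NaT sentinel


def treat_as_na(val, is_datetimelike):
    """Sketch of the NA check for float and integer values."""
    if isinstance(val, float):
        return val != val                      # only NaN is unequal to itself
    if isinstance(val, int):
        return is_datetimelike and val == NPY_NAT
    # Assumption: object values are checked by the caller instead.
    raise NotImplementedError("object values are not handled here")


def last_per_group(values, labels, ngroups, is_datetimelike=False):
    """Return (nobs, resx): count of non-NA values and last non-NA value per group."""
    nobs = [0] * ngroups
    resx = [None] * ngroups
    for val, lab in zip(values, labels):
        if not treat_as_na(val, is_datetimelike):
            nobs[lab] += 1
            resx[lab] = val
    return nobs, resx
```

Note that with `is_datetimelike=False` the sentinel is counted as an ordinary integer, which is why the flag being effectively hard-coded matters.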

cdef inline bint _treat_as_na(rank_t val, bint is_datetimelike) nogil:
    if rank_t is object:
        # Should never be used, but we need to avoid the `val != val` below
        # or else cython will raise about gil acquisition.
Contributor

This is pretty odd. Why don't you return an enum here (true, false, raise) and handle it in the code appropriately?

Member Author

the vicissitudes of cython fused types
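
As a rough illustration of the tri-state suggestion above, here is a pure-Python sketch. It was not adopted in this thread and says nothing about whether the pattern is workable with Cython fused types. `NaCheck`, `classify`, and `count_valid` are hypothetical names, and the NPY_NAT handling is an assumption about what the integer branch would check.

```python
from enum import Enum


class NaCheck(Enum):
    """Hypothetical tri-state result; not part of pandas."""
    IS_NA = "true"
    NOT_NA = "false"
    UNSUPPORTED = "raise"


NPY_NAT = -(2 ** 63)  # minimum int64 value, used as the NaT sentinel


def classify(val, is_datetimelike):
    """Report the NA status of val without raising."""
    if isinstance(val, float):
        return NaCheck.IS_NA if val != val else NaCheck.NOT_NA
    if isinstance(val, int):
        is_na = is_datetimelike and val == NPY_NAT
        return NaCheck.IS_NA if is_na else NaCheck.NOT_NA
    return NaCheck.UNSUPPORTED


def count_valid(values, is_datetimelike=False):
    """Count non-NA values; the caller decides what UNSUPPORTED means."""
    count = 0
    for val in values:
        status = classify(val, is_datetimelike)
        if status is NaCheck.UNSUPPORTED:
            raise NotImplementedError("unsupported value type")
        if status is NaCheck.NOT_NA:
            count += 1
    return count
```

The design difference is that the helper itself never raises; the caller handles the third state explicitly.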

jreback added this to the 1.0 milestone on Oct 16, 2019
@jreback
Contributor

jreback commented Oct 16, 2019

LGTM. Any perf implications?

@jbrockmendel
Member Author

Rebased + green.

WillAyd merged commit bff90a3 into pandas-dev:master on Oct 16, 2019
@WillAyd
Member

WillAyd commented Oct 16, 2019

Thanks @jbrockmendel

jbrockmendel deleted the gbuint branch on October 16, 2019 19:15
proost pushed a commit to proost/pandas that referenced this pull request Dec 19, 2019
bongolegend pushed a commit to bongolegend/pandas that referenced this pull request Jan 1, 2020
3 participants