-
-
Notifications
You must be signed in to change notification settings - Fork 18.6k
REF: de-duplicate groupby_helper code #28934
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -19,6 +19,18 @@ ctypedef fused rank_t: | |||
object | |||
|
|||
|
|||
cdef inline bint _treat_as_na(rank_t val, bint is_datetimelike) nogil: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this only applicable in groupby or should it go in util?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only here. As I mention in comments below, I'm not wild about the fact that is_datetimelike is effectively hard-coded to True in several places. Will try to see if that causes problems in follow-up(s)
@@ -19,6 +19,18 @@ ctypedef fused rank_t: | |||
object | |||
|
|||
|
|||
cdef inline bint _treat_as_na(rank_t val, bint is_datetimelike) nogil: | |||
if rank_t is object: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Somewhat counter-intuitive that this works - out of curiosity what was the complaint?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAICT the val != val
check requires calling val.__ne__
which requires the gil
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn’t need the gil if these are c level object (eg ints or floats)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, we just need to exclude the object case from getting to the val == val
step, thats all this check is doing
if val == val: | ||
nobs[lab, j] += 1 | ||
resx[lab, j] = val | ||
if val == val: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this totally equivalent to what's in place? Looks like we are losing the NPY_NAT
check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is in the rank_t is object
case where we cant use _treat_as_na. (comment just below here)
cdef inline bint _treat_as_na(rank_t val, bint is_datetimelike) nogil: | ||
if rank_t is object: | ||
# Should never be used, but we need to avoid the `val != val` below | ||
# or else cython will raise about gil acquisition. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is pretty odd, why dont you return an enum here (true, false, raise) and handle in code appropriately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the vicissitudes of cython fused types
lgtm any perf implications? |
Rebased + green. |
Thanks @jbrockmendel |
There's one other piece of de-duplication I think should be feasible but cython is still raising compilation errors for, so will do separately.
Orthogonal to #28931, but I expect it will cause merge conflicts. 28931 should be a higher priority.