PERF: Implement groupby idxmax/idxmin in Cython #54234
@@ -68,7 +68,10 @@
     deprecate_nonkeyword_arguments,
     doc,
 )
-from pandas.util._exceptions import find_stack_level
+from pandas.util._exceptions import (
+    find_stack_level,
+    rewrite_warning,
+)
 from pandas.util._validators import (
     validate_ascending,
     validate_bool_kwarg,
@@ -11371,7 +11374,20 @@ def _get_data() -> DataFrame:
     row_index = np.tile(np.arange(nrows), ncols)
     col_index = np.repeat(np.arange(ncols), nrows)
     ser = Series(arr, index=col_index, copy=False)
-    result = ser.groupby(row_index).agg(name, **kwds)
+    # GroupBy will raise a warning with SeriesGroupBy as the object,
+    # likely confusing users
+    with rewrite_warning(
+        target_message=(
+            f"The behavior of SeriesGroupBy.{name} with all-NA values"
+        ),
+        target_category=FutureWarning,
+        new_message=(
+            f"The behavior of {type(self).__name__}.{name} with all-NA "
+            "values, or any-NA and skipna=False, is deprecated. In "
+            "a future version this will raise ValueError"
+        ),
+    ):
+        result = ser.groupby(row_index).agg(name, **kwds)
     result.index = df.index
     if not skipna and name not in ("any", "all"):
         mask = df.isna().to_numpy(dtype=np.bool_).any(axis=1)

Review thread on the `with rewrite_warning(` line:

is this context expensive? if so might make sense to only do it in idxmin/idxmax instead of in _reduce? (though this is only for axis=1 so NBD?)

It seems to be very cheap:

In any case, if we were to try to move this any lower I think we'd need to detect who the caller is? We only want to replace this warning when groupby is called from _reduce.

Ah - I see, use e.g. a nullcontext for other ops. I think this is cheap enough but can implement if you'd prefer.

what i had in mind was not having any context here but instead in
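For readers following the thread, here is a minimal, standard-library-only sketch of the two ideas discussed: a context manager that re-emits a matching warning with a different message, and a `nullcontext` fallback for ops that never raise it. This is not pandas' `rewrite_warning` implementation. The names `rewrite_warning_sketch` and `warning_context`, the `owner` parameter, and the plain prefix match on `target_message` are all assumptions made for illustration.

```python
import contextlib
import warnings


@contextlib.contextmanager
def rewrite_warning_sketch(target_message, target_category, new_message):
    """Hypothetical helper: re-emit matching warnings with ``new_message``."""
    try:
        # Record everything raised inside the block instead of showing it.
        with warnings.catch_warnings(record=True) as caught:
            warnings.simplefilter("always")
            yield
    finally:
        # The recording block has exited, so the caller's filters apply again.
        for w in caught:
            matches = issubclass(w.category, target_category) and str(
                w.message
            ).startswith(target_message)  # assumption: prefix match, not regex
            if matches:
                # stacklevel=3 skips this generator and contextlib's wrapper.
                warnings.warn(new_message, w.category, stacklevel=3)
            else:
                # Pass unrelated warnings through with their original location.
                warnings.warn_explicit(w.message, w.category, w.filename, w.lineno)


def warning_context(name, owner="DataFrame"):
    """Hypothetical chooser: only idxmin/idxmax need the rewrite."""
    if name not in ("idxmin", "idxmax"):
        return contextlib.nullcontext()
    return rewrite_warning_sketch(
        target_message=f"The behavior of SeriesGroupBy.{name} with all-NA values",
        target_category=FutureWarning,
        new_message=(
            f"The behavior of {owner}.{name} with all-NA "
            "values, or any-NA and skipna=False, is deprecated. In "
            "a future version this will raise ValueError"
        ),
    )


# Usage: the SeriesGroupBy wording is replaced before the caller sees it.
with warnings.catch_warnings(record=True) as seen:
    warnings.simplefilter("always")
    with warning_context("idxmax"):
        warnings.warn(
            "The behavior of SeriesGroupBy.idxmax with all-NA values is deprecated",
            FutureWarning,
        )
print([str(w.message) for w in seen])
```

The trade-off raised in the thread shows up in `warning_context`: choosing the context per op avoids recording warnings for reductions that cannot emit this message, at the cost of one extra branch in the caller.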