BUG: Groupby ops on empty objects loses index, columns, dtypes #39940
rhshadrach
commented
Feb 20, 2021
- closes Grouping, then resampling empty DataFrame leads to it losing column names and index #26411
- closes Dataframe pivot_table() returning a multi-index for a single value, empty dataframe #13483
- tests added / passed
- Ensure all linting tests pass, see here for how to run them
- whatsnew entry
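For readers skimming the thread, the kind of check at stake can be sketched as below. This is only an illustration, not the reproducer from the linked issues: the column names `A`, `B`, `C` and the choice of `sum` are assumptions made for the example.

```python
import pandas as pd

# Empty frame with explicit dtypes; the column names are illustrative only.
df = pd.DataFrame(
    {
        "A": pd.Series([], dtype="int64"),
        "B": pd.Series([], dtype="float64"),
        "C": pd.Series([], dtype="float64"),
    }
)

# Aggregate on the empty frame (sum is an arbitrary choice here).
result = df.groupby("A").sum()

# Inspect the metadata that the PR title says should survive.
print(result.index.name)         # name of the grouping key
print(list(result.columns))      # the non-key columns
print(result.dtypes.to_dict())   # dtypes of the result columns
```

The point of the sketch is that an empty result should still carry the grouping key as its index and keep the non-key columns, rather than collapsing to a bare empty object.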
…_agg_empty_columns Conflicts: doc/source/whatsnew/v1.3.0.rst
doc/source/whatsnew/v1.3.0.rst
Outdated
@@ -435,6 +435,7 @@ Groupby/resample/rolling
- Bug in :meth:`core.window.rolling.RollingGroupby.corr` and :meth:`core.window.expanding.ExpandingGroupby.corr` where the groupby column would return 0 instead of ``np.nan`` when providing ``other`` that was longer than each group (:issue:`39591`)
- Bug in :meth:`core.window.expanding.ExpandingGroupby.corr` and :meth:`core.window.expanding.ExpandingGroupby.cov` where 1 would be returned instead of ``np.nan`` when providing ``other`` that was longer than each group (:issue:`39591`)
- Bug in :meth:`.GroupBy.mean`, :meth:`.GroupBy.median` and :meth:`DataFrame.pivot_table` not propagating metadata (:issue:`28283`)
- Bug in various Groupby operations on an empty ``Series`` or ``DataFrame`` would lose index, columns, and data types (:issue:`26411`)
can you be more specific here (which groupby ops)?
doc/source/whatsnew/v1.3.0.rst
Outdated
@@ -435,6 +435,7 @@ Groupby/resample/rolling
- Bug in :meth:`core.window.rolling.RollingGroupby.corr` and :meth:`core.window.expanding.ExpandingGroupby.corr` where the groupby column would return 0 instead of ``np.nan`` when providing ``other`` that was longer than each group (:issue:`39591`)
- Bug in :meth:`core.window.expanding.ExpandingGroupby.corr` and :meth:`core.window.expanding.ExpandingGroupby.cov` where 1 would be returned instead of ``np.nan`` when providing ``other`` that was longer than each group (:issue:`39591`)
- Bug in :meth:`.GroupBy.mean`, :meth:`.GroupBy.median` and :meth:`DataFrame.pivot_table` not propagating metadata (:issue:`28283`)
- Bug in various Groupby operations on an empty ``Series`` or ``DataFrame`` would lose index, columns, and data types (:issue:`26411`)
this affects an example resample, right? (from the OP). pls indicate that as well.
"float",
{"A": "object", "B": "int", "C": "float"},
{"A": "int", "B": "float", "C": "object"},
],
can you add some other datatypes to make sure they are preserved (categorical, datetime, datetime w/tz, Int)? if some still don't work, just xfail them and create an issue.
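One way to explore the dtypes mentioned above is sketched here. It is an assumption-laden probe, not the test added in this PR: the helper name `empty_groupby_dtypes`, the column names, and the default op `first` are all hypothetical, and the sketch makes no claim about which dtypes are actually preserved in any given pandas version.

```python
import pandas as pd


def empty_groupby_dtypes(op="first"):
    """Report (dtype before, dtype after) per column for an empty groupby.

    Hypothetical helper for illustration; not part of pandas or this PR.
    """
    df = pd.DataFrame(
        {
            "key": pd.Series([], dtype="int64"),
            "cat": pd.Series([], dtype="category"),
            "dt": pd.Series([], dtype="datetime64[ns]"),
            "dt_tz": pd.Series([], dtype="datetime64[ns, UTC]"),
            "Int": pd.Series([], dtype="Int64"),
        }
    )
    report = {}
    for col in df.columns.drop("key"):
        before = df[col].dtype
        try:
            # Call the op by name on the empty SeriesGroupBy.
            after = getattr(df.groupby("key")[col], op)().dtype
        except Exception as err:
            # Record the failure instead of guessing at the outcome.
            after = type(err).__name__
        report[col] = (str(before), str(after))
    return report


for col, (before, after) in empty_groupby_dtypes().items():
    print(f"{col}: {before} -> {after}")
```

Any column whose "after" entry differs from its "before" entry (or shows an exception name) would be a candidate for the xfail-and-open-an-issue route suggested in the comment.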
…_agg_empty_columns
…_agg_empty_columns
@jreback - thanks, whatsnew and test have been updated.
minor comment about the astype (the test suggestion is for followup). ping on greenish
pandas/core/groupby/generic.py
Outdated
result = self.obj._constructor(
    index=self.grouper.result_index, columns=data.columns
)
result = result.astype(data.dtypes.to_dict())
suppose could add copy=False here
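The pattern in the hunk above (build an empty result from an index and columns, then restore dtypes with `astype`) can be sketched in isolation as follows. This uses only the public `pd.DataFrame` constructor rather than the internal `self.obj._constructor` / `self.grouper.result_index`, and the index name and dtype mapping are made up for the example.

```python
import pandas as pd

# Stand-ins for the grouper's result index and the input dtypes
# (both are assumptions for this sketch).
index = pd.Index([], name="A", dtype="int64")
dtypes = {"B": "int64", "C": "float64"}

# Constructing from index and columns alone carries no dtype information.
result = pd.DataFrame(index=index, columns=list(dtypes))
print(result.dtypes.to_dict())

# Restore the per-column dtypes from a {column: dtype} mapping.
result = result.astype(dtypes)
print(result.dtypes.to_dict())
```

In the pandas versions current at the time of this PR, `DataFrame.astype` also accepts a `copy` keyword, which is what the review comment refers to; the sketch leaves it at its default.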
gb = df.groupby(keys)[columns]
if method == "attr":
    result = getattr(gb, op)()
else:
OT: might be worthwhile to split up this file, as it's getting kind of long.
Makes sense - TBH I've never fully understood what tests were meant to be in here. I've always thought of it as tests of the *GroupBy attributes themselves, rather than the computation methods (e.g. sum, apply, etc). If that's the case, then maybe just move any tests that rely on calling computation methods out?
right, test_groupby.py is basically tests that we correctly construct a groupby object, and the other tests are about actually executing it. over the years these have slowly been separated out. i think it's time to rename this and be clear about it.
…_agg_empty_columns Conflicts: doc/source/whatsnew/v1.3.0.rst