Skip to content

BUG: Groupby ops on empty objects loses index, columns, dtypes #39940

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 14 commits into from
Feb 24, 2021

Conversation

rhshadrach
Copy link
Member

@rhshadrach rhshadrach added Bug Groupby Reshaping Concat, Merge/Join, Stack/Unstack, Explode Dtype Conversions Unexpected or buggy dtype conversions labels Feb 20, 2021
…_agg_empty_columns

� Conflicts:
�	doc/source/whatsnew/v1.3.0.rst
@rhshadrach rhshadrach marked this pull request as draft February 21, 2021 13:46
@rhshadrach rhshadrach marked this pull request as ready for review February 21, 2021 14:07
@@ -435,6 +435,7 @@ Groupby/resample/rolling
- Bug in :meth:`core.window.rolling.RollingGroupby.corr` and :meth:`core.window.expanding.ExpandingGroupby.corr` where the groupby column would return 0 instead of ``np.nan`` when providing ``other`` that was longer than each group (:issue:`39591`)
- Bug in :meth:`core.window.expanding.ExpandingGroupby.corr` and :meth:`core.window.expanding.ExpandingGroupby.cov` where 1 would be returned instead of ``np.nan`` when providing ``other`` that was longer than each group (:issue:`39591`)
- Bug in :meth:`.GroupBy.mean`, :meth:`.GroupBy.median` and :meth:`DataFrame.pivot_table` not propagating metadata (:issue:`28283`)
- Bug in various Groupby operations on an empty ``Series`` or ``DataFrame`` would lose index, columns, and data types (:issue:`26411`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you be more specific here (which groupby ops)

@@ -435,6 +435,7 @@ Groupby/resample/rolling
- Bug in :meth:`core.window.rolling.RollingGroupby.corr` and :meth:`core.window.expanding.ExpandingGroupby.corr` where the groupby column would return 0 instead of ``np.nan`` when providing ``other`` that was longer than each group (:issue:`39591`)
- Bug in :meth:`core.window.expanding.ExpandingGroupby.corr` and :meth:`core.window.expanding.ExpandingGroupby.cov` where 1 would be returned instead of ``np.nan`` when providing ``other`` that was longer than each group (:issue:`39591`)
- Bug in :meth:`.GroupBy.mean`, :meth:`.GroupBy.median` and :meth:`DataFrame.pivot_table` not propagating metadata (:issue:`28283`)
- Bug in various Groupby operations on an empty ``Series`` or ``DataFrame`` would lose index, columns, and data types (:issue:`26411`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this affects an example resample right? (from the OP). pls indicate that as wel.

"float",
{"A": "object", "B": "int", "C": "float"},
{"A": "int", "B": "float", "C": "object"},
],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add some other datatypes to make sure preserving (categorical, datetime, datetime w/tz, Int). if some still don't work, just xfail them and create an issue.)

@rhshadrach
Copy link
Member Author

@jreback - thanks, whatsnew and test have been updated.

@jreback jreback added this to the 1.3 milestone Feb 24, 2021
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor comment about the astype (the test suggestion is for followup). ping on greenish

result = self.obj._constructor(
index=self.grouper.result_index, columns=data.columns
)
result = result.astype(data.dtypes.to_dict())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suppose could add copy=False here

gb = df.groupby(keys)[columns]
if method == "attr":
result = getattr(gb, op)()
else:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OT: might be worthile to split up this file as getting kind of long.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense - TBH I've never fully understood what tests were meant to be in here. I've always thought of it as tests of the *GroupBy attributes themselves, rather than the computation methods (e.g. sum, apply, etc). If that's the case, then maybe just move any tests that rely on calling computation methods out?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right the test_groupby.py is basically test that we correctly construct a groupby object and other tests are about actually executing it. over the years these have slowly been separated out. i think time to rename this and be clear about it.

…_agg_empty_columns

� Conflicts:
�	doc/source/whatsnew/v1.3.0.rst
@jreback jreback merged commit 3408a61 into pandas-dev:master Feb 24, 2021
@rhshadrach rhshadrach deleted the gb_agg_empty_columns branch February 24, 2021 17:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Dtype Conversions Unexpected or buggy dtype conversions Groupby Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
2 participants