BUG: DataFrame.groupby(., dropna=True, axis=0) incorrectly throws ShapeError #35751
@@ -729,14 +729,28 @@ def _set_result_index_ordered(
         # set the result index on the passed values object and
         # return the new object, xref 8046
 
-        # the values/counts are repeated according to the group index
-        # shortcut if we have an already ordered grouper
-        if not self.grouper.is_monotonic:
-            index = Index(np.concatenate(self._get_indices(self.grouper.result_index)))
-            result.set_axis(index, axis=self.axis, inplace=True)
-            result = result.sort_index(axis=self.axis)
-
-        result.set_axis(self.obj._get_axis(self.axis), axis=self.axis, inplace=True)
+        if self.grouper.is_monotonic:
+            # shortcut if we have an already ordered grouper
+            result.set_axis(self.obj._get_axis(self.axis), axis=self.axis, inplace=True)
+            return result
+
+        # row order is scrambled => sort the rows by position in original index
+        original_positions = Index(
+            np.concatenate(self._get_indices(self.grouper.result_index))
+        )
+        result.set_axis(original_positions, axis=self.axis, inplace=True)
+        result = result.sort_index(axis=self.axis)
+
+        dropped_rows = len(result.index) < len(self.obj.index)
+
+        if dropped_rows:
+            # get index by slicing original index according to original positions
+            # slice drops attrs => use set_axis when no rows were dropped
+            sorted_indexer = result.index
Can we just always use this formulation? E.g. just use the if clause regardless here?

Well, I mean for axis==0; also I think this is incorrect for axis==1 (though it likely won't trigger the if clause). Can you refactor to do this?

We can, but the slice will drop some attributes (for example, the frequency of a datetime index; we have a test case which won't quite round-trip if we do this). Will change as long as you're OK with that.

Right because for FTR we don't fully support/test the

On second thought, I don't think we can. AFAIK, if we don't drop rows then we keep the original index, which isn't an indexer into the original object's index. When we drop rows we reindex with an integer index (and that's a valid argument to
+            result.index = self._selected_obj.index[sorted_indexer]
+        else:
+            result.set_axis(self.obj._get_axis(self.axis), axis=self.axis, inplace=True)
arw2019 marked this conversation as resolved.
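For reference, the attribute loss mentioned in this thread can be reproduced outside groupby. A minimal sketch, assuming a daily `DatetimeIndex` (the dates and the surviving positions are made up for illustration): selecting by integer positions returns an index without `freq`, which is why the diff keeps the `set_axis` branch when no rows were dropped.

```python
import pandas as pd

# Hypothetical daily index standing in for the original object's index.
idx = pd.date_range("2020-01-01", periods=4, freq="D")

# Positions of the rows that survive (position 1 is dropped here).
sorted_indexer = [0, 2, 3]

# Selecting by integer positions does not carry the frequency over.
sliced = idx[sorted_indexer]

print(idx.freq)     # the daily offset
print(sliced.freq)  # None
```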
+
         return result
 
     @final
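The reordering in this hunk can be illustrated without pandas internals. This is a rough sketch, not the pandas implementation: the helper name `restore_original_order`, the group positions, and the data are all hypothetical. It mirrors the three steps of the diff: label rows by original position, sort, then pick the surviving labels when rows were dropped.

```python
import numpy as np
import pandas as pd


def restore_original_order(result, group_positions, original_index):
    """Reorder group-ordered rows back to their original order.

    `group_positions` holds, per group, the integer positions of its rows
    in the original object. Rows with a dropped (NaN) key appear in no group.
    """
    result = result.copy()
    # Label each result row with its position in the original object.
    result.index = pd.Index(np.concatenate(group_positions))
    # Sorting by those positions undoes the group-wise ordering.
    result = result.sort_index()
    if len(result) < len(original_index):
        # Rows were dropped: pick the surviving labels by position.
        result.index = original_index[result.index.to_numpy()]
    else:
        # Nothing dropped: reuse the original index object unchanged.
        result.index = original_index
    return result


original_index = pd.Index(["a", "b", "c", "d", "e"])
# Group "x" has rows 0 and 3, group "y" has rows 1 and 4; row 2 had a NaN key.
group_positions = [np.array([0, 3]), np.array([1, 4])]
grouped_result = pd.Series([10, 13, 11, 14])

restored = restore_original_order(grouped_result, group_positions, original_index)
print(restored)
```

With row 2 dropped, the result has four rows labelled `a`, `b`, `d`, `e`, which is the case where assigning the full five-element original index would fail with a length mismatch.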