Skip to content

BUG: fixed buggy behavior in pivot_table with margins=True and numeric columns #46778

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 11 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.5.0.rst
Original file line number Diff line number Diff line change
Expand Up @@ -607,6 +607,7 @@ Reshaping
- Bug in :func:`get_dummies` that selected object and categorical dtypes but not string (:issue:`44965`)
- Bug in :meth:`DataFrame.align` when aligning a :class:`MultiIndex` to a :class:`Series` with another :class:`MultiIndex` (:issue:`46001`)
- Bug in concanenation with ``IntegerDtype``, or ``FloatingDtype`` arrays where the resulting dtype did not mirror the behavior of the non-nullable dtypes (:issue:`46379`)
- Bug in :func:`_generate_marginal_results` when passing in integer key values to :func:`reorder_levels` referenced level by both number and key instead of only one (:issue:`26568`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should describe the user facing impact of the change not the internal change i.e. the bug fix of pivot_table

- Bug in :func:`concat` with identical key leads to error when indexing :class:`MultiIndex` (:issue:`46519`)
- Bug in :meth:`DataFrame.join` with a list when using suffixes to join DataFrames with duplicate column names (:issue:`46396`)
-
Expand Down
12 changes: 12 additions & 0 deletions pandas/core/reshape/pivot.py
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,7 @@

from pandas.core.dtypes.cast import maybe_downcast_to_dtype
from pandas.core.dtypes.common import (
is_integer,
is_integer_dtype,
is_list_like,
is_nested_list_like,
Expand Down Expand Up @@ -409,6 +410,17 @@ def _all_key(key):

# slight hack
new_order = [len(cols)] + list(range(len(cols)))

# GH26568 reorder_levels improperly uses a key that is also a number
# If a number in new_order is also a key,
# convert new_order to their respective keys in row_margin.index.names
for key in row_margin.index.names:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't this just be a take(new_order) in the index names?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mroeschke I tried using take() to do what we're doing here but it doesn't seem like its doing the same thing. Could you elaborate on what take(new_order) should be doing here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I am interpreting your for loop correctly, you're converting new_order which originally represented positions to labels? This should be equivalent to a take operation if so: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Index.take.html

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm converting positions in the MultiIndex.names attribute into its corresponding labels. However, using take on a MultiIndex (rather than Index) doesn't replicate the same functionality:

(Pdb) p row_margin.index
MultiIndex([('x', 2),
            ('y', 2),
            ('z', 2)],
           names=[1, None])
(Pdb) p row_margin.index.take([1,0])
MultiIndex([('y', 2),
            ('x', 2)],
           names=[1, None])
(Pdb) p row_margin.index.take([1,0], axis=1)
MultiIndex([('y', 2),
            ('x', 2)],
           names=[1, None])

What I want is something like this:

MultiIndex([(2, 'x'),
            (2, 'y'),
            (2, 'z')],
           names=[None, 1])

Are we thinking about it the same way?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on your for loop though, it appears you want to take on the names not the actual values?

if is_integer(key) and key <= len(cols):
# change new_order to use keys instead, then break loop
for new_order_index, val in enumerate(new_order):
new_order[new_order_index] = row_margin.index.names[val]
break

row_margin.index = row_margin.index.reorder_levels(new_order)
else:
row_margin = Series(np.nan, index=result.columns)
Expand Down
16 changes: 16 additions & 0 deletions pandas/tests/reshape/test_pivot.py
Original file line number Diff line number Diff line change
Expand Up @@ -2184,6 +2184,22 @@ def test_pivot_table_with_margins_and_numeric_columns(self):

tm.assert_frame_equal(result, expected)

def test_pivot_table_with_margins_and_columns_equals_one(self):
# GH 26568
df = DataFrame([["a", "x", 1], ["a", "y", 2], ["b", "y", 3], ["b", "z", 4]])

result = df.pivot_table(
index=0, columns=1, values=2, aggfunc="sum", fill_value=0, margins=True
)

expected = DataFrame([[1, 2, 0, 3], [0, 3, 4, 7], [1, 5, 4, 10]])
expected.columns = ["x", "y", "z", "All"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These attributes can be included in the DataFrame constructor e.g. DataFrame(..., columns=Index(..., name=1))

expected.index = ["a", "b", "All"]
expected.columns.name = 1
expected.index.name = 0

tm.assert_frame_equal(result, expected)


class TestPivot:
def test_pivot(self):
Expand Down