Skip to content

BUG: fixed buggy behavior in pivot_table with margins=True and numeric columns #46778

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 11 commits into from

Conversation

andjhall
Copy link
Contributor

Found a error in the way reorder_levels was being called in _generate_marginal_values in pivot.py. Added a change that would modify the representation of the new order to use keys instead of position in cases where valid key was also a valid position. Included a test from original bug report that exposed the buggy behavior.

@pep8speaks
Copy link

pep8speaks commented Apr 14, 2022

Hello @andjhall! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2022-04-14 19:46:41 UTC

# GH26568 reorder_levels improperly uses a key that is also a number
# If a number in new_order is also a key,
# convert new_order to their respective keys in row_margin.index.names
for key in row_margin.index.names:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't this just be a take(new_order) in the index names?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mroeschke I tried using take() to do what we're doing here but it doesn't seem like its doing the same thing. Could you elaborate on what take(new_order) should be doing here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I am interpreting your for loop correctly, you're converting new_order which originally represented positions to labels? This should be equivalent to a take operation if so: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Index.take.html

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm converting positions in the MultiIndex.names attribute into its corresponding labels. However, using take on a MultiIndex (rather than Index) doesn't replicate the same functionality:

(Pdb) p row_margin.index
MultiIndex([('x', 2),
            ('y', 2),
            ('z', 2)],
           names=[1, None])
(Pdb) p row_margin.index.take([1,0])
MultiIndex([('y', 2),
            ('x', 2)],
           names=[1, None])
(Pdb) p row_margin.index.take([1,0], axis=1)
MultiIndex([('y', 2),
            ('x', 2)],
           names=[1, None])

What I want is something like this:

MultiIndex([(2, 'x'),
            (2, 'y'),
            (2, 'z')],
           names=[None, 1])

Are we thinking about it the same way?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on your for loop though, it appears you want to take on the names not the actual values?

@@ -607,6 +607,7 @@ Reshaping
- Bug in :func:`get_dummies` that selected object and categorical dtypes but not string (:issue:`44965`)
- Bug in :meth:`DataFrame.align` when aligning a :class:`MultiIndex` to a :class:`Series` with another :class:`MultiIndex` (:issue:`46001`)
- Bug in concanenation with ``IntegerDtype``, or ``FloatingDtype`` arrays where the resulting dtype did not mirror the behavior of the non-nullable dtypes (:issue:`46379`)
- Bug in :func:`_generate_marginal_results` when passing in integer key values to :func:`reorder_levels` referenced level by both number and key instead of only one (:issue:`26568`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should describe the user facing impact of the change not the internal change i.e. the bug fix of pivot_table

@jreback jreback added Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Apr 17, 2022
)

expected = DataFrame([[1, 2, 0, 3], [0, 3, 4, 7], [1, 5, 4, 10]])
expected.columns = ["x", "y", "z", "All"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These attributes can be included in the DataFrame constructor e.g. DataFrame(..., columns=Index(..., name=1))

@github-actions
Copy link
Contributor

This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label May 19, 2022
@mroeschke
Copy link
Member

Thanks for the pull request, but it appears to have gone stale. If interested in contributing, please merge in the main branch, address any review comments and/or failing tests, and we can reopen.

@mroeschke mroeschke closed this Jun 3, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode Stale
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: pivot_table with margins=True and numeric columns is buggy
5 participants