DataFrame.reset_index deletes index, does not allow for ints as level arg #16263
Comments
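This is related to passing the `level` argument when the index is not a MultiIndex. Previously an invalid level was silently ignored and the index was reset anyway:
In [3]: pd.DataFrame(np.random.randn(4,4), columns=['A', 'B', 'C', 'D']).reset_index(level='not present')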
Out[3]:
index A B C D
0 0 0.057457 0.065932 0.276079 0.305390
1 1 -0.562195 -0.385750 -0.228925 -0.426511
2 2 0.377559 -0.837031 -0.384840 -0.305262
3 3 -0.670057 -0.737446 0.561989 0.528754
In [4]: pd.DataFrame(np.random.randn(4,4), columns=['A', 'B', 'C', 'D']).reset_index(level=['not present'])
Out[4]:
index A B C D
0 0 0.613373 -0.169316 -0.592379 1.050764
1 1 0.069762 0.995308 0.030434 -0.361300
2 2 -0.526487 0.165054 0.015452 0.954447
3 3 0.585677 -1.435712 -0.298280 -0.581473
but a7a0574 changed the behaviour so that now, vice versa, even valid level names/indices are not considered. I can provide a PR; the question is what we want to do with non-existent level names: raise or ignore?
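As a rough illustration of what "valid level names/indices" means for a single (non-MultiIndex) index, here is a minimal sketch. It assumes a pandas release in which this regression has been fixed; the frame, its column names, and the variable names are made up for the example.

```python
import numpy as np
import pandas as pd

# Small deterministic frame with a single, named index (hypothetical data).
df = pd.DataFrame(np.arange(8).reshape(4, 2), columns=['A', 'B']).set_index('A')

# Three ways of naming the only level of the index.
by_name = df.reset_index(level='A')      # level given by name
by_position = df.reset_index(level=0)    # level given by integer position
by_list = df.reset_index(level=['A'])    # level given as a list of names

# On a fixed release, each call should move 'A' back into the columns.
for result in (by_name, by_position, by_list):
    print(result.columns.tolist())
```

If any of these calls drops `A` instead of restoring it as a column, the installed version is likely affected by the behaviour described in this issue.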
Sorry, the question is already answered by the behaviour when there is a MultiIndex:
In [5]: pd.DataFrame(np.random.randn(4,4), columns=['A', 'B', 'C', 'D']).set_index(['A', 'B']).reset_index(level=['A', 'E'])
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
/home/nobackup/repo/pandas/pandas/core/indexes/multi.py in _get_level_number(self, level)
610 'level number' % level)
--> 611 level = self.names.index(level)
612 except ValueError:
ValueError: 'E' is not in list
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
<ipython-input-5-f1686d7d4dfc> in <module>()
----> 1 pd.DataFrame(np.random.randn(4,4), columns=['A', 'B', 'C', 'D']).set_index(['A', 'B']).reset_index(level=['A', 'E'])
/home/nobackup/repo/pandas/pandas/core/frame.py in reset_index(self, level, drop, inplace, col_level, col_fill)
3016 if not isinstance(level, (tuple, list)):
3017 level = [level]
-> 3018 level = [self.index._get_level_number(lev) for lev in level]
3019 if isinstance(self.index, MultiIndex):
3020 if len(level) < self.index.nlevels:
/home/nobackup/repo/pandas/pandas/core/frame.py in <listcomp>(.0)
3016 if not isinstance(level, (tuple, list)):
3017 level = [level]
-> 3018 level = [self.index._get_level_number(lev) for lev in level]
3019 if isinstance(self.index, MultiIndex):
3020 if len(level) < self.index.nlevels:
/home/nobackup/repo/pandas/pandas/core/indexes/multi.py in _get_level_number(self, level)
612 except ValueError:
613 if not isinstance(level, int):
--> 614 raise KeyError('Level %s not found' % str(level))
615 elif level < 0:
616 level += self.nlevels
KeyError: 'Level E not found'
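The lookup visible in the traceback (try the name first, fall back to an integer position, otherwise raise) can be sketched in isolation. This is a stand-alone approximation pieced together from the fragments shown above, not the pandas implementation; `get_level_number`, its `names` parameter, and the final range check are assumptions made for illustration.

```python
def get_level_number(names, level):
    """Resolve a level given by name or by integer position.

    Hypothetical helper: `names` stands in for the index's level names.
    """
    try:
        # A level that matches a name is resolved by name first.
        return names.index(level)
    except ValueError:
        # Not a known name: only integers are accepted as positions.
        if not isinstance(level, int):
            raise KeyError('Level %s not found' % str(level))
        nlevels = len(names)
        if level < 0:
            # Negative positions count from the end, as with lists.
            level += nlevels
        if not 0 <= level < nlevels:
            # Assumed bounds check; the traceback does not show this part.
            raise IndexError('Level position out of range')
        return level


print(get_level_number(['A', 'B'], 'A'))   # resolved by name
print(get_level_number(['A', 'B'], -1))    # resolved by position
```

Under this scheme a non-existent name such as `'E'` raises rather than being ignored, which matches the MultiIndex behaviour shown in the traceback.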
Perhaps it was working by accident before, but the new behavior of completely dropping the index column when reset_index is called seems problematic. Additionally, the docs for reset_index say "For a standard index, the index name will be used...", so the behavior is now out of sync with the documented spec. It also raises an important question: if we do want to keep this behavior going forward, what is the new "correct" way to remove a single-column index from a DataFrame while keeping its data?
@m4g005 To get this working the same in both versions, you can try it without the level name:
`data.reset_index(inplace=True)`
`data`
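The workaround above can be tried on a small frame. This is only a sketch with made-up data; it uses the non-inplace form of `reset_index` so the original frame stays available for comparison, whereas the comment above uses `inplace=True`.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two integer columns, with 'A' promoted to the index.
df = pd.DataFrame(np.arange(8).reshape(4, 2), columns=['A', 'B'])
indexed = df.set_index('A')

# No level argument: the whole index is moved back into the columns.
restored = indexed.reset_index()

print(restored.columns.tolist())
print(restored['A'].tolist())
```

Because no `level` is passed, this call does not go through the level-resolution code path that the issue is about.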
This is going to be fixed, no doubt.
Code Sample, a copy-pastable example if possible
u'0.20.1'
Problem description
Between v0.19.2 and v0.20.1, the behavior of `DataFrame.reset_index` changed.
With a single set index:
Expected Output
v0.19.2 results:
Output of pd.show_versions()