DataFrame MultiIndex column access (and pop) #4145

Closed
hayd opened this issue Jul 6, 2013 · 7 comments · Fixed by #4148
Labels
Bug Indexing Related to indexing on series/frames, not to indexes themselves
Comments

@hayd
Contributor

hayd commented Jul 6, 2013

Suppose I want to access a column in df2 (perhaps there is a neat way, but I also expect these to work):

In [11]: df
Out[11]:
  h1 main  h3 sub  h5
0  a    A   1  A1   1
1  b    B   2  B1   2
2  c    B   3  A1   3
3  d    A   4  B2   4
4  e    A   5  B2   5
5  f    B   6  A2   6

In [12]: df2 = df.set_index(['main', 'sub']).T.sort_index(1)

In [13]: df2
Out[13]:
main  A        B
sub  A1 B2 B2 A1 A2 B1
h1    a  d  e  c  f  b
h3    1  4  5  3  6  2
h5    1  4  5  3  6  2
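
For anyone reproducing this on a recent pandas, the following is a minimal sketch of the setup above. It assumes the positional `sort_index(1)` call has to be written as `sort_index(axis=1)`; the frame contents are copied from the output shown, and the two `print` calls are only there to confirm that the resulting column index is non-unique.

```python
import pandas as pd

# Rebuild the original frame from the Out[11] listing above.
df = pd.DataFrame({
    "h1": list("abcdef"),
    "main": list("ABBAAB"),
    "h3": [1, 2, 3, 4, 5, 6],
    "sub": ["A1", "B1", "A1", "B2", "B2", "A2"],
    "h5": [1, 2, 3, 4, 5, 6],
})

# Move 'main' and 'sub' into the index, transpose so they become column
# levels, then sort the columns lexicographically.
df2 = df.set_index(["main", "sub"]).T.sort_index(axis=1)

print(df2.columns.tolist())
print(df2.columns.is_unique)
```

Note that ('A', 'B2') appears twice in the columns, so the column index is a non-unique MultiIndex even though ('A', 'A1') itself occurs only once.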

I want to access the column ('A', 'A1'):

In [14]: df2.iloc[:, 0]  # cheating with iloc
In [15]: df2.T.loc[('A', 'A1'), :].iloc[0]  # hacky!
In [16]: df2.iloc[:, df2.columns.get_loc(('A', 'A1')).start]  # very hacky!

In [17]: df2[df2.columns[:1]]  # returns DataFrame

I had assumed/hoped this would work:

In [18]: df2[('A', 'A1')]
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-18-c307c7bb3bb8> in <module>()
----> 1 df2[('A', 'A1')]

/Users/234BroadWalk/pandas/pandas/core/frame.pyc in __getitem__(self, key)
   1997             return self._getitem_frame(key)
   1998         elif isinstance(self.columns, MultiIndex):
-> 1999             return self._getitem_multilevel(key)
   2000         else:
   2001             # get column

/Users/234BroadWalk/pandas/pandas/core/frame.pyc in _getitem_multilevel(self, key)
   2036         if isinstance(loc, (slice, np.ndarray)):
   2037             new_columns = self.columns[loc]
-> 2038             result_columns = _maybe_droplevels(new_columns, key)
   2039             if self._is_mixed_type:
   2040                 result = self.reindex(columns=new_columns)

/Users/234BroadWalk/pandas/pandas/core/indexing.pyc in _maybe_droplevels(index, key)
   1103     if isinstance(key, tuple):
   1104         for _ in key:
-> 1105             index = index.droplevel(0)
   1106     else:
   1107         index = index.droplevel(0)

AttributeError: 'Index' object has no attribute 'droplevel'

In [19]: df2[['A', 'A1']]  # interestingly, slightly different error here
KeyError: "['A1'] not in index"

This way is also buggy (it loses the index), which is weird; I've separated that part of the issue as #4146:

In [21]: df2['A']['A1']  # in master but not in 0.11.0
Out[21]:
   0
0  a
1  1
2  1

pop uses this in its implementation, so atm it's not possible to pop a MultiIndex column.
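
As a rough illustration of the intended behaviour, here is a sketch of popping a column by its full tuple key. It uses a small hypothetical frame with unique two-level columns (not the df2 from this report), so it shows what `pop` is expected to do rather than reproducing the failure.

```python
import pandas as pd

# Hypothetical frame with unique two-level columns (an assumption for
# illustration; the df2 in the report has a duplicated ('A', 'B2') key).
columns = pd.MultiIndex.from_tuples(
    [("A", "A1"), ("A", "B2"), ("B", "A1")], names=["main", "sub"]
)
frame = pd.DataFrame(
    [["a", "d", "c"], [1, 4, 3]], index=["h1", "h3"], columns=columns
)

# The full tuple key names exactly one column: pop returns it as a Series
# and removes it from the frame in place.
popped = frame.pop(("A", "A1"))

print(popped)
print(frame.columns.tolist())
```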

@jtratner
Contributor

jtratner commented Jul 6, 2013

I don't think df2[['A', 'A1']] should work with this, right? Because that's selecting for two individual columns 'A' and 'A1'. The tuple version ought to. Have you tried bisecting through earlier commits to see where this behavior started occurring?

@hayd
Contributor Author

hayd commented Jul 6, 2013

@jreback No that shouldn't work (just was surprised it was doing something different to the tuple - as I thought it meant the tuple was being captured somehow, presumably as part of a MultiIndex...).

I haven't tried bisecting (ever); will have a go now.

@jtratner
Contributor

jtratner commented Jul 6, 2013

@hayd Tuple is treated differently than list - tuples are considered a single element for the purposes of indexing, whereas a list is the (only?) way to have your input treated as a list of elements/lookups. So if you want to index into a MultiIndex, you ultimately have to use a tuple to get to it.
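
A small sketch of the tuple-versus-list distinction, assuming a hypothetical frame with unique two-level columns (the variable names here are illustrative and not from the issue):

```python
import pandas as pd

# Hypothetical frame with unique two-level columns.
columns = pd.MultiIndex.from_tuples(
    [("A", "A1"), ("A", "B2"), ("B", "A1")], names=["main", "sub"]
)
frame = pd.DataFrame(
    [["a", "d", "c"], [1, 4, 3]], index=["h1", "h3"], columns=columns
)

single = frame[("A", "A1")]      # tuple: one key spanning both levels -> Series
as_frame = frame[[("A", "A1")]]  # list of one tuple -> one-column DataFrame
level0 = frame[["A"]]            # list of level-0 labels -> every column under 'A'

try:
    frame[["A", "A1"]]           # two separate lookups; 'A1' is not a level-0 label
    list_lookup_failed = False
except KeyError:
    list_lookup_failed = True

print(type(single).__name__, as_frame.shape, level0.shape, list_lookup_failed)
```

The list form is treated as a sequence of independent lookups, which is why `['A', 'A1']` raises a KeyError while `[('A', 'A1')]` selects the single column as a DataFrame.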

@jreback
Contributor

jreback commented Jul 6, 2013

this is all pretty simple, PR coming shortly...basically an oversight

@jtratner
Contributor

jtratner commented Jul 6, 2013

@jreback wow...that's pretty amazing that you can tell what's wrong so quickly.

@jreback
Contributor

jreback commented Jul 6, 2013

nah...it was obvious once the test case hit it that the code was returning an incomplete answer (just the resulting values and not a full block manager); I had written the code so knew where it was (but I guess never explicitly made a test to validate that case)....it's kind of deep...have to have a non-unique column index, and selecting a unique value from it (the original case was to handle selecting the non-unique values)
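
To make the triggering condition concrete, here is a sketch that checks which keys in a column index are repeated. The column tuples are copied from the df2 shown at the top of the issue; the variable names are illustrative only, and this is not the code path touched by the fix.

```python
import pandas as pd

# Column keys as they appear in df2 from the original report.
columns = pd.MultiIndex.from_tuples(
    [("A", "A1"), ("A", "B2"), ("A", "B2"), ("B", "A1"), ("B", "A2"), ("B", "B1")],
    names=["main", "sub"],
)

is_unique = columns.is_unique               # False: ('A', 'B2') occurs twice
repeated = columns.duplicated(keep=False)   # marks every occurrence of a repeated key
unique_keys = columns[~repeated].tolist()   # keys that occur exactly once

print(is_unique)
print(unique_keys)
```

Here ('A', 'A1') is one of the keys that occurs only once inside a non-unique index, which is the combination described above.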

@jreback
Contributor

jreback commented Jul 6, 2013

see pr #4148, df2[['A','A1']] raising is correct, and now df2[[('A','A1')]] works as well

3 participants