
Label slices on Series with MultiIndex #8539


Closed
LeartS opened this issue Oct 11, 2014 · 5 comments

Comments

@LeartS

LeartS commented Oct 11, 2014

I have a Series with a MultiIndex like this:

In [174]: a
Out[174]: 
hour  year  weekday  workingday       
0     2011  0        0           count      4.000000
                                 mean      76.750000
                                 std       53.368998
                                 min       17.000000
                                 25%       44.000000
                                 50%       75.000000
                                 75%      107.750000
                                 max      140.000000
                     1           count     28.000000
                                 mean      29.500000
                                 std       31.792732
                                 min        4.000000
                                 25%       15.000000
                                 50%       24.500000
                                 75%       35.500000
...
23    2012  5        0           mean     152.687500
                                 std       55.711490
                                 min       20.000000
                                 25%      119.000000
                                 50%      148.000000
                                 75%      187.750000
                                 max      239.000000
            6        0           count     34.000000
                                 mean      71.617647
                                 std       32.471913
                                 min       23.000000
                                 25%       46.000000
                                 50%       68.500000
                                 75%      104.250000
                                 max      123.000000
Length: 3456, dtype: float64

And I'd like to get, for example, the rows where hour=1, year=2011, weekday=[1,2,3,4]

Intuitively, I would write a.loc[1,2011,1:4], which would work on a 4-dimensional numpy array. Unfortunately, it raises this error:

/usr/lib/python2.7/dist-packages/pandas/core/index.pyc in get_loc_level(self, key, level, drop_level)
   3214                             continue
   3215                         else:
-> 3216                             raise TypeError(key)
   3217 
   3218                     if indexer is None:

TypeError: (0, 2011, slice(0, 4, None))

Even if I try supplying a list of labels instead of a slice, a.loc[0,2011,[1,2,3,4]], I get an error: unhashable type: 'list'
Is there a reason why this syntax is not supported? How can I select label slices in a MultiIndex?

@jreback
Contributor

jreback commented Oct 11, 2014

docs are here: http://pandas.pydata.org/pandas-docs/dev/indexing.html#multiindexing-using-slicers

this is a 2-d structure so what you are writing is ambiguous at best

what if you had a MultiIndex in 2 dimensions?

using MultiIndex slicers this is easily accomplished though

requires pandas >= 0.14
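As a rough illustration of the slicer approach from the linked docs, here is a minimal sketch. The Series `a` below is a small hypothetical stand-in for the one in the question (same level names, made-up values), and it assumes a pandas version with `pd.IndexSlice` (0.14 or later, per the comment above).

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Series in the question: same level names,
# but a smaller index and made-up values (0.0, 1.0, 2.0, ...).
index = pd.MultiIndex.from_product(
    [range(2), [2011, 2012], range(7), [0, 1]],
    names=["hour", "year", "weekday", "workingday"],
)
a = pd.Series(np.arange(len(index), dtype=float), index=index)

# pd.IndexSlice builds the tuple of per-level selectors. Label slices are
# inclusive, so 1:4 selects weekdays 1, 2, 3 and 4; ":" keeps every workingday.
idx = pd.IndexSlice
selected = a.loc[idx[1, 2011, 1:4, :]]
print(selected)
```

Slicers need a lexsorted index; one built with `from_product` from sorted inputs, as here, already is.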

@LeartS
Author

LeartS commented Oct 11, 2014

Ah! Thanks. Unfortunately I have 0.13, at least for a while.

Given that it works as long as I use single labels (e.g. a.loc[0,2011,1,0] works), it seemed really intuitive that a.loc[0,2011,0:3,0], for example, would work too. I don't see any ambiguity on a Series with a column-only (or row-only) MultiIndex.

@jreback
Contributor

jreback commented Oct 11, 2014

you can use a tuple to specify the various axes and a list of tuples
doing this in 0.13 is a bit non-trivial with slices

there is no ambiguity in a series but the syntax works for multi dimensions and is much more clear

the syntax you are using really shouldn't be allowed IMHO but remains for back compat

best bet is to upgrade
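For readers who cannot upgrade, one possible workaround is a boolean mask built from the index level values. Note that this is a different technique from the tuple / list-of-tuples approach mentioned above, and it has not been verified against pandas 0.13 here. The Series `a` is again a hypothetical stand-in with made-up values.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Series in the question (made-up values).
index = pd.MultiIndex.from_product(
    [range(2), [2011, 2012], range(7), [0, 1]],
    names=["hour", "year", "weekday", "workingday"],
)
a = pd.Series(np.arange(len(index), dtype=float), index=index)

# Pull out one flat Index per level, aligned row by row with the Series.
hour = a.index.get_level_values("hour")
year = a.index.get_level_values("year")
weekday = a.index.get_level_values("weekday")

# Combine per-level conditions into one boolean mask and filter with it.
mask = (hour == 0) & (year == 2011) & weekday.isin([1, 2, 3, 4])
subset = a[mask]
print(subset)
```

The mask does not depend on the index being sorted, at the cost of scanning every row.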

@jreback jreback closed this as completed Oct 11, 2014
@jorisvandenbossche
Member

@jreback this is a Series, not a DataFrame (so 1d structure), so I think what the OP did should work given pandas >= 0.14 (and it does also work) without an index slicer. Why shouldn't this be allowed?

@jreback
Contributor

jreback commented Oct 13, 2014

This is fixed in 0.15.0 by #8132

Yes this is valid syntax for a Series (only)

In [1]: index = pd.MultiIndex.from_product([['one','two'],
   ...:                                     range(3),
   ...:                                     range(3),
   ...:                                     range(3)],names=['first','seconds','third','fourth'],
   ...:                                    )

In [2]: df = pd.DataFrame(np.arange(len(index)).reshape(-1,1),columns=['value'],index=index)

In [3]: df.loc['one',1,1,1]
Out[3]: 
value    13
Name: (one, 1, 1, 1), dtype: int64

In [4]: df.loc['one',1,1,1:3]
Out[4]: 
                            value
first seconds third fourth       
one   1       1     1          13
                    2          14

In [5]: df.loc['one',1,1:3]
Out[5]: 
                            value
first seconds third fourth       
one   1       1     0          12
                    1          13
                    2          14
              2     0          15
                    1          16
                    2          17
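The session above uses a DataFrame, while the original question was about a Series. A minimal sketch of the same selections on a Series follows, assuming a recent pandas release; the name `s` and the `1:2` slice bounds are choices made for this illustration, not taken from the thread.

```python
import numpy as np
import pandas as pd

# Same index as in the session above, but attached to a Series.
index = pd.MultiIndex.from_product(
    [["one", "two"], range(3), range(3), range(3)],
    names=["first", "seconds", "third", "fourth"],
)
s = pd.Series(np.arange(len(index)), index=index, name="value")

# One label per level selects a single value.
scalar = s.loc["one", 1, 1, 1]

# A label slice on the last level; both endpoints are included.
last_level = s.loc["one", 1, 1, 1:2]

# A label slice on the third level; the omitted fourth level is kept whole.
third_level = s.loc["one", 1, 1:2]

print(scalar)
print(last_level)
print(third_level)
```

Because the Series has only one axis, the whole comma-separated key is read as a selector over the index levels.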
