
API: select levels of a MultiIndex #10816

@jorisvandenbossche

Description


Say you have a multi-index:

In [34]: idx = pd.MultiIndex.from_product([['a', 'b', 'c'], [1, 2, 3], ['f', 'g']], names=['lev0', 'lev1', 'lev2'])

In [35]: df = pd.DataFrame(range(len(idx)), index=idx)

In [36]: df
Out[36]:
                 0
lev0 lev1 lev2
a    1    f      0
          g      1
     2    f      2
          g      3
     3    f      4
          g      5
b    1    f      6
          g      7
     2    f      8
          g      9
     3    f     10
          g     11
c    1    f     12
          g     13
     2    f     14
          g     15
     3    f     16
          g     17

and you want to select certain levels of the index. Just as you can select columns of a frame, I want to select levels of an index and get a new index containing only those levels.

At the moment, there are some possibilities:

In [37]: pd.MultiIndex.from_arrays([df.index.get_level_values(0), df.index.get_level_values(1)])
Out[37]:
MultiIndex(levels=[[u'a', u'b', u'c'], [1, 2, 3]],
           labels=[[0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2], [0, 0, 1, 1, 2, 2, 0, 0, 1, 1, 2, 2, 0, 0, 1, 1, 2, 2]],
           names=[u'lev0', u'lev1'])

In [38]: idx.droplevel(-1)    # if you know the ones to drop
Out[38]:
MultiIndex(levels=[[u'a', u'b', u'c'], [1, 2, 3]],
           labels=[[0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2], [0, 0, 1, 1, 2, 2, 0, 0, 1, 1, 2, 2, 0, 0, 1, 1, 2, 2]],
           names=[u'lev0', u'lev1'])

In [39]: df.reset_index().set_index(['lev0','lev1']).index
Out[39]:
MultiIndex(levels=[[u'a', u'b', u'c'], [1, 2, 3]],
           labels=[[0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2], [0, 0, 1, 1, 2, 2, 0, 0, 1, 1, 2, 2, 0, 0, 1, 1, 2, 2]],
           names=[u'lev0', u'lev1'])

In [40]: df.reset_index(-1).index       # if you know the ones to drop
Out[40]:
MultiIndex(levels=[[u'a', u'b', u'c'], [1, 2, 3]],
           labels=[[0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2], [0, 0, 1, 1, 2, 2, 0, 0, 1, 1, 2, 2, 0, 0, 1, 1, 2, 2]],
           names=[u'lev0', u'lev1'])
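For comparison, a sketch of one more route that treats the levels literally like frame columns. It assumes a pandas version where `MultiIndex.to_frame(index=False)` and `MultiIndex.from_frame` both exist (`from_frame` arrived in 0.24, well after this issue was filed), so it is not something that was available at the time:

```python
import pandas as pd

# Rebuild the example index from this issue
idx = pd.MultiIndex.from_product(
    [["a", "b", "c"], [1, 2, 3], ["f", "g"]], names=["lev0", "lev1", "lev2"]
)

# Turn the levels into ordinary columns, select the wanted ones by name,
# then convert the resulting frame back into a MultiIndex
levels_frame = idx.to_frame(index=False)
subset = pd.MultiIndex.from_frame(levels_frame[["lev0", "lev1"]])

print(list(subset.names))  # ['lev0', 'lev1']
```

It still takes a round trip through a DataFrame, much like the `reset_index().set_index(...)` variant.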

Am I missing an easy way to do this?

And if not, I think we should have a better way to do this.
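A minimal sketch of what such a "select levels" operation could do, written as a hypothetical standalone function `select_levels` (not an existing pandas API). It generalizes the `get_level_values` + `MultiIndex.from_arrays` approach from `In [37]` to any list of level names or positions, and keeps the levels in the requested order:

```python
import pandas as pd


def select_levels(idx, levels):
    """Hypothetical helper: return a MultiIndex with only the given levels.

    `levels` may contain level names or integer positions; the result
    keeps the levels in the order they are requested.
    """
    arrays = [idx.get_level_values(lev) for lev in levels]
    # get_level_values returns an Index that carries the level name,
    # so reuse those names for the new MultiIndex
    return pd.MultiIndex.from_arrays(arrays, names=[arr.name for arr in arrays])


idx = pd.MultiIndex.from_product(
    [["a", "b", "c"], [1, 2, 3], ["f", "g"]], names=["lev0", "lev1", "lev2"]
)

print(list(select_levels(idx, ["lev0", "lev1"]).names))  # ['lev0', 'lev1']
print(list(select_levels(idx, [2, 0]).names))  # ['lev2', 'lev0']
```

Unlike `droplevel`, this only needs the levels you want to keep, and selecting a single level still returns a (one-level) MultiIndex.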

Note: this was triggered by this SO question: http://stackoverflow.com/questions/31991388/combinations-of-multiindex-levels-which-occur-in-a-dataframe (but I had already run into this several times before).
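For the use case in that SO question (which combinations of some levels actually occur), a sketch using only `droplevel` and `unique`, assuming level `lev2` is the one to ignore. With the full product index used here every combination occurs; on a real frame only the observed combinations would remain:

```python
import pandas as pd

idx = pd.MultiIndex.from_product(
    [["a", "b", "c"], [1, 2, 3], ["f", "g"]], names=["lev0", "lev1", "lev2"]
)

# Drop the level we are not interested in, then keep each remaining
# (lev0, lev1) combination once, in order of first occurrence
combos = idx.droplevel("lev2").unique()

print(len(combos))  # 9
print(combos[0])  # ('a', 1)
```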
