    :suppress:

    import numpy as np
-   import random
-   np.random.seed(123456)
-   from pandas import *
-   options.display.max_rows=15
    import pandas as pd
-   randn = np.random.randn
-   randint = np.random.randint
+   np.random.seed(123456)
    np.set_printoptions(precision=4, suppress=True)
-   from pandas.compat import range, zip
+   pd.options.display.max_rows = 15

 ******************************
 MultiIndex / Advanced Indexing
@@ -80,10 +75,10 @@ demo different ways to initialize MultiIndexes.
    tuples = list(zip(*arrays))
    tuples

-   index = MultiIndex.from_tuples(tuples, names=['first', 'second'])
+   index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
    index

-   s = Series(randn(8), index=index)
+   s = pd.Series(np.random.randn(8), index=index)
    s

 When you want every pairing of the elements in two iterables, it can be easier
@@ -92,7 +87,7 @@ to use the ``MultiIndex.from_product`` function:
 .. ipython:: python

    iterables = [['bar', 'baz', 'foo', 'qux'], ['one', 'two']]
-   MultiIndex.from_product(iterables, names=['first', 'second'])
+   pd.MultiIndex.from_product(iterables, names=['first', 'second'])

 As a convenience, you can pass a list of arrays directly into Series or
 DataFrame to construct a MultiIndex automatically:
@@ -101,9 +96,9 @@ DataFrame to construct a MultiIndex automatically:

    arrays = [np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']),
              np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'])]
-   s = Series(randn(8), index=arrays)
+   s = pd.Series(np.random.randn(8), index=arrays)
    s
-   df = DataFrame(randn(8, 4), index=arrays)
+   df = pd.DataFrame(np.random.randn(8, 4), index=arrays)
    df

 All of the ``MultiIndex`` constructors accept a ``names`` argument which stores
@@ -119,9 +114,9 @@ of the index is up to you:

 .. ipython:: python

-   df = DataFrame(randn(3, 8), index=['A', 'B', 'C'], columns=index)
+   df = pd.DataFrame(np.random.randn(3, 8), index=['A', 'B', 'C'], columns=index)
    df
-   DataFrame(randn(6, 6), index=index[:6], columns=index[:6])
+   pd.DataFrame(np.random.randn(6, 6), index=index[:6], columns=index[:6])

 We've "sparsified" the higher levels of the indexes to make the console output a
 bit easier on the eyes.
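
Note on the "sparsified" display mentioned above: the sketch below contrasts the default rendering with a non-sparsified one. It assumes the ``display.multi_sparse`` option controls this behaviour; the frame and its labels are illustrative and not part of this commit.

```python
import numpy as np
import pandas as pd

# Illustrative two-level index; the names mirror the ones used in the doc.
arrays = [['bar', 'bar', 'baz', 'baz'], ['one', 'two', 'one', 'two']]
index = pd.MultiIndex.from_arrays(arrays, names=['first', 'second'])
df = pd.DataFrame({'x': np.arange(4)}, index=index)

# Default rendering: a repeated outer label is printed only once.
sparse_text = df.to_string()

# With the option switched off, every row repeats its outer label.
with pd.option_context('display.multi_sparse', False):
    dense_text = df.to_string()

print(sparse_text)
print(dense_text)
```

Only the printed representation changes; the index itself stores every label pair either way.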
@@ -131,7 +126,7 @@ tuples as atomic labels on an axis:

 .. ipython:: python

-   Series(randn(8), index=tuples)
+   pd.Series(np.random.randn(8), index=tuples)

 The reason that the ``MultiIndex`` matters is that it can allow you to do
 grouping, selection, and reshaping operations as we will describe below and in
@@ -282,16 +277,16 @@ As usual, **both sides** of the slicers are included as this is label indexing.
    def mklbl(prefix,n):
        return ["%s%s" % (prefix,i) for i in range(n)]

-   miindex = MultiIndex.from_product([mklbl('A',4),
-                                      mklbl('B',2),
-                                      mklbl('C',4),
-                                      mklbl('D',2)])
-   micolumns = MultiIndex.from_tuples([('a','foo'),('a','bar'),
-                                       ('b','foo'),('b','bah')],
-                                      names=['lvl0', 'lvl1'])
-   dfmi = DataFrame(np.arange(len(miindex)*len(micolumns)).reshape((len(miindex),len(micolumns))),
-                    index=miindex,
-                    columns=micolumns).sortlevel().sortlevel(axis=1)
+   miindex = pd.MultiIndex.from_product([mklbl('A',4),
+                                         mklbl('B',2),
+                                         mklbl('C',4),
+                                         mklbl('D',2)])
+   micolumns = pd.MultiIndex.from_tuples([('a','foo'),('a','bar'),
+                                          ('b','foo'),('b','bah')],
+                                         names=['lvl0', 'lvl1'])
+   dfmi = pd.DataFrame(np.arange(len(miindex)*len(micolumns)).reshape((len(miindex),len(micolumns))),
+                       index=miindex,
+                       columns=micolumns).sortlevel().sortlevel(axis=1)
    dfmi

 Basic multi-index slicing using slices, lists, and labels.
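
Note on the slicing described above: a minimal sketch of selecting with slices, lists, and labels on a frame shaped like ``dfmi``. It uses ``sort_index`` rather than the ``sortlevel`` call in the diff, on the assumption that the reader's pandas no longer ships ``sortlevel``; the variable names ``subset`` and ``foo_only`` are illustrative.

```python
import numpy as np
import pandas as pd

def mklbl(prefix, n):
    # Build labels such as 'A0', 'A1', ...
    return ["%s%s" % (prefix, i) for i in range(n)]

miindex = pd.MultiIndex.from_product(
    [mklbl('A', 4), mklbl('B', 2), mklbl('C', 4), mklbl('D', 2)])
micolumns = pd.MultiIndex.from_tuples(
    [('a', 'foo'), ('a', 'bar'), ('b', 'foo'), ('b', 'bah')],
    names=['lvl0', 'lvl1'])
values = np.arange(len(miindex) * len(micolumns)).reshape(
    (len(miindex), len(micolumns)))

# Both axes are sorted so that label slicing is well defined.
dfmi = pd.DataFrame(values, index=miindex, columns=micolumns)
dfmi = dfmi.sort_index().sort_index(axis=1)

# Rows: level 0 from 'A1' through 'A3' (both ends included), every value
# of level 1, and only 'C1' and 'C3' on level 2. Columns: everything.
subset = dfmi.loc[(slice('A1', 'A3'), slice(None), ['C1', 'C3']), :]

# Columns whose second level is 'foo', for all rows.
foo_only = dfmi.loc[:, (slice(None), 'foo')]

print(subset.shape, foo_only.shape)
```

``slice(None)`` stands for "everything on this level"; both axes must be given to ``.loc`` so the tuple is not mistaken for a row/column pair.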
@@ -418,9 +413,9 @@ instance:

 .. ipython:: python

-   midx = MultiIndex(levels=[['zero', 'one'], ['x','y']],
-                     labels=[[1,1,0,0],[1,0,1,0]])
-   df = DataFrame(randn(4,2), index=midx)
+   midx = pd.MultiIndex(levels=[['zero', 'one'], ['x','y']],
+                        labels=[[1,1,0,0],[1,0,1,0]])
+   df = pd.DataFrame(np.random.randn(4,2), index=midx)
    df
    df2 = df.mean(level=0)
    df2
@@ -471,7 +466,7 @@ labels will be sorted lexicographically!
 .. ipython:: python

    import random; random.shuffle(tuples)
-   s = Series(randn(8), index=MultiIndex.from_tuples(tuples))
+   s = pd.Series(np.random.randn(8), index=pd.MultiIndex.from_tuples(tuples))
    s
    s.sortlevel(0)
    s.sortlevel(1)
@@ -509,13 +504,13 @@ an exception. Here is a concrete example to illustrate this:
 .. ipython:: python

    tuples = [('a', 'a'), ('a', 'b'), ('b', 'a'), ('b', 'b')]
-   idx = MultiIndex.from_tuples(tuples)
+   idx = pd.MultiIndex.from_tuples(tuples)
    idx.lexsort_depth

    reordered = idx[[1, 0, 3, 2]]
    reordered.lexsort_depth

-   s = Series(randn(4), index=reordered)
+   s = pd.Series(np.random.randn(4), index=reordered)
    s.ix['a':'a']

 However:
@@ -540,15 +535,15 @@ index positions. ``take`` will also accept negative integers as relative positio

 .. ipython:: python

-   index = Index(randint(0, 1000, 10))
+   index = pd.Index(np.random.randint(0, 1000, 10))
    index

    positions = [0, 9, 3]

    index[positions]
    index.take(positions)

-   ser = Series(randn(10))
+   ser = pd.Series(np.random.randn(10))

    ser.iloc[positions]
    ser.take(positions)
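
Note on ``take`` with negative integers, which the surrounding text mentions but the block above does not exercise: a small sketch with fixed, illustrative values (not from this commit) so the results are easy to check by eye.

```python
import pandas as pd

index = pd.Index([10, 20, 30, 40, 50])
positions = [0, 4, 2]

# Positional selection: take and [] agree for a list of integer positions.
taken = index.take(positions)
fancy = index[positions]

# Negative integers count from the end of the index.
from_end = index.take([-1, -2])

ser = pd.Series([1.5, 2.5, 3.5, 4.5, 5.5])
ser_from_end = ser.take([-1, 0])

print(list(taken), list(from_end), ser_from_end.tolist())
```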
@@ -558,7 +553,7 @@ row or column positions.

 .. ipython:: python

-   frm = DataFrame(randn(5, 3))
+   frm = pd.DataFrame(np.random.randn(5, 3))

    frm.take([1, 4, 3])

@@ -569,11 +564,11 @@ intended to work on boolean indices and may return unexpected results.

 .. ipython:: python

-   arr = randn(10)
+   arr = np.random.randn(10)
    arr.take([False, False, True, True])
    arr[[0, 1]]

-   ser = Series(randn(10))
+   ser = pd.Series(np.random.randn(10))
    ser.take([False, False, True, True])
    ser.ix[[0, 1]]

@@ -583,14 +578,14 @@ faster than fancy indexing.

 .. ipython::

-   arr = randn(10000, 5)
+   arr = np.random.randn(10000, 5)
    indexer = np.arange(10000)
    random.shuffle(indexer)

    timeit arr[indexer]
    timeit arr.take(indexer, axis=0)

-   ser = Series(arr[:, 0])
+   ser = pd.Series(arr[:, 0])
    timeit ser.ix[indexer]
    timeit ser.take(indexer)

@@ -608,10 +603,9 @@ setting the index of a ``DataFrame/Series`` with a ``category`` dtype would conv

 .. ipython:: python

-   df = DataFrame({'A': np.arange(6),
-                   'B': Series(list('aabbca')).astype('category',
-                                                     categories=list('cab'))
-                   })
+   df = pd.DataFrame({'A': np.arange(6),
+                      'B': list('aabbca')})
+   df['B'] = df['B'].astype('category', categories=list('cab'))
    df
    df.dtypes
    df.B.cat.categories
@@ -669,15 +663,15 @@ values NOT in the categories, similarly to how you can reindex ANY pandas index.

    .. code-block:: python

-      In [10]: df3 = DataFrame({'A': np.arange(6),
-                                'B': Series(list('aabbca')).astype('category',
-                                                                  categories=list('abc'))
-                               }).set_index('B')
+      In [9]: df3 = pd.DataFrame({'A': np.arange(6),
+                                  'B': pd.Series(list('aabbca')).astype('category')})
+
+      In [11]: df3 = df3.set_index('B')

       In [11]: df3.index
       Out[11]: CategoricalIndex([u'a', u'a', u'b', u'b', u'c', u'a'], categories=[u'a', u'b', u'c'], ordered=False, name=u'B', dtype='category')

-      In [12]: pd.concat([df2,df3]
+      In [12]: pd.concat([df2, df3]
       TypeError: categories must match existing categories when appending

 .. _indexing.float64index:
@@ -702,9 +696,9 @@ same.

 .. ipython:: python

-   indexf = Index([1.5, 2, 3, 4.5, 5])
+   indexf = pd.Index([1.5, 2, 3, 4.5, 5])
    indexf
-   sf = Series(range(5),index=indexf)
+   sf = pd.Series(range(5), index=indexf)
    sf

 Scalar selection for ``[],.ix,.loc`` will always be label based. An integer will match an equal float index (e.g. ``3`` is equivalent to ``3.0``)
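
Note on the label-based scalar selection described above: a short sketch on the same ``sf`` series. It uses only ``.loc`` and ``.iloc``; ``.ix`` is left out on the assumption that the reader's pandas may no longer provide it.

```python
import pandas as pd

indexf = pd.Index([1.5, 2, 3, 4.5, 5])
sf = pd.Series(range(5), index=indexf)

# Label-based: the integer 3 matches the float label 3.0.
by_int_label = sf.loc[3]
by_float_label = sf.loc[3.0]

# Position-based: the element at offset 3, whose label is 4.5.
by_position = sf.iloc[3]

print(by_int_label, by_float_label, by_position)
```

The label lookup and the positional lookup return different elements here, which is why the distinction matters on a float index.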
@@ -746,17 +740,17 @@ In non-float indexes, slicing using floats will raise a ``TypeError``

 .. code-block:: python

-   In [1]: Series(range(5))[3.5]
+   In [1]: pd.Series(range(5))[3.5]
    TypeError: the label [3.5] is not a proper indexer for this index type (Int64Index)

-   In [1]: Series(range(5))[3.5:4.5]
+   In [1]: pd.Series(range(5))[3.5:4.5]
    TypeError: the slice start [3.5] is not a proper indexer for this index type (Int64Index)

 Using a scalar float indexer will be deprecated in a future version, but is allowed for now.

 .. code-block:: python

-   In [3]: Series(range(5))[3.0]
+   In [3]: pd.Series(range(5))[3.0]
    Out[3]: 3

 Here is a typical use-case for using this type of indexing. Imagine that you have a somewhat
@@ -765,12 +759,12 @@ example be millisecond offsets.

 .. ipython:: python

-   dfir = concat([DataFrame(randn(5,2),
-                            index=np.arange(5) * 250.0,
-                            columns=list('AB')),
-                  DataFrame(randn(6,2),
-                            index=np.arange(4,10) * 250.1,
-                            columns=list('AB'))])
+   dfir = pd.concat([pd.DataFrame(np.random.randn(5,2),
+                                  index=np.arange(5) * 250.0,
+                                  columns=list('AB')),
+                     pd.DataFrame(np.random.randn(6,2),
+                                  index=np.arange(4,10) * 250.1,
+                                  columns=list('AB'))])
    dfir

 Selection operations then will always work on a value basis, for all selection operators.
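
Note on value-based selection: a sketch on a frame with the same index as ``dfir``. The data values are replaced with ``np.arange`` for reproducibility, and the slice bounds (1001 and 5) are illustrative choices that deliberately avoid exact float labels.

```python
import numpy as np
import pandas as pd

# Same index construction as dfir; values are deterministic placeholders.
first = pd.DataFrame(np.arange(10).reshape(5, 2),
                     index=np.arange(5) * 250.0,
                     columns=list('AB'))
second = pd.DataFrame(np.arange(12).reshape(6, 2),
                      index=np.arange(4, 10) * 250.1,
                      columns=list('AB'))
dfir = pd.concat([first, second])

# Label slice: every row whose offset lies between 0 and 1001 ms,
# whether or not the bounds themselves are labels in the index.
first_second = dfir.loc[0:1001, 'A']

# Positional slice for comparison: simply the first five rows.
first_five = dfir.iloc[0:5]

print(first_second)
print(first_five)
```

The label slice picks up six rows (0 through roughly 1000.4 ms), while the positional slice always returns five regardless of the offsets.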