@@ -44,11 +44,26 @@ The categorical data type is useful in the following cases:
 * As a signal to other Python libraries that this column should be treated as a categorical
   variable (e.g. to use suitable statistical methods or plot types).

+.. note::
+
+    In contrast to R's `factor` function, categorical data does not convert input values to
+    strings, and categories will end up with the same data type as the original values.
+
+.. note::
+
+    In contrast to R's `factor` function, there is currently no way to assign/change labels at
+    creation time. Use `categories` to change the categories after creation time.
+
 See also the :ref:`API docs on categoricals<api.categorical>`.

+.. _categorical.objectcreation:
+
 Object Creation
 ---------------

+Series Creation
+~~~~~~~~~~~~~~~
+
 Categorical ``Series`` or columns in a ``DataFrame`` can be created in several ways:

 By specifying ``dtype="category"`` when constructing a ``Series``:
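The two notes added in this hunk contrast pandas categoricals with R's `factor`. A minimal sketch of both points, assuming a recent pandas release: integer input keeps integer categories rather than being converted to strings, and labels are changed after creation. The note speaks of setting `categories`; this sketch instead uses `Series.cat.rename_categories`, which returns a relabelled copy, so treat that choice as an editorial assumption rather than the wording of the docs.

```python
import pandas as pd

# Integer input: the categories keep an integer dtype instead of being
# converted to strings (the contrast with R's factor in the first note).
s = pd.Series([1, 2, 3, 1], dtype="category")
print(s.cat.categories.dtype)  # an integer dtype such as int64

# Relabelling after creation. rename_categories returns a new Series and is
# used here in place of assigning to .cat.categories (an editorial choice).
s2 = s.cat.rename_categories(["one", "two", "three"])
print(list(s2))  # ['one', 'two', 'three', 'one']
```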
@@ -77,7 +92,7 @@ discrete bins. See the :ref:`example on tiling <reshaping.tile.cut>` in the docs
     df['group'] = pd.cut(df.value, range(0, 105, 10), right=False, labels=labels)
     df.head(10)

-By passing a :class:`pandas.Categorical` object to a `Series` or assigning it to a `DataFrame`.
+By passing a :class:`pandas.Categorical` object to a ``Series`` or assigning it to a ``DataFrame``.

 .. ipython:: python

@@ -89,6 +104,56 @@ By passing a :class:`pandas.Categorical` object to a `Series` or assigning it to
     df["B"] = raw_cat
     df

+Categorical data has a specific ``category`` :ref:`dtype <basics.dtypes>`:
+
+.. ipython:: python
+
+    df.dtypes
+
+DataFrame Creation
+~~~~~~~~~~~~~~~~~~
+
+Columns in a ``DataFrame`` can be batch converted to categorical, either at the time of construction
+or after construction. The conversion to categorical is done on a column-by-column basis; labels present
+in one column will not be carried over and used as categories in another column.
+
+Columns can be batch converted by specifying ``dtype="category"`` when constructing a ``DataFrame``:
+
+.. ipython:: python
+
+    df = pd.DataFrame({'A': list('abca'), 'B': list('bccd')}, dtype="category")
+    df.dtypes
+
+Note that the categories present in each column differ; since the conversion is done on a column-by-column
+basis, only labels present in a given column are categories:
+
+.. ipython:: python
+
+    df['A']
+    df['B']
+
+
+.. versionadded:: 0.23.0
+
+Similarly, columns in an existing ``DataFrame`` can be batch converted using :meth:`DataFrame.astype`:
+
+.. ipython:: python
+
+    df = pd.DataFrame({'A': list('abca'), 'B': list('bccd')})
+    df_cat = df.astype('category')
+    df_cat.dtypes
+
+This conversion is likewise done on a column-by-column basis:
+
+.. ipython:: python
+
+    df_cat['A']
+    df_cat['B']
+
+
+Controlling Behavior
+~~~~~~~~~~~~~~~~~~~~
+
 In the examples above where we passed ``dtype='category'``, we used the default
 behavior:

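The per-column behavior described under DataFrame Creation can be checked with a short sketch, assuming a recent pandas release; the names `data`, `df` and `df_cat` are illustrative. Conversion at construction time and conversion with `astype('category')` both give each column only the labels it actually contains.

```python
import pandas as pd

data = {"A": list("abca"), "B": list("bccd")}

# Conversion at construction time: each column is factorized separately.
df = pd.DataFrame(data, dtype="category")

# Conversion after construction via astype: same per-column result.
df_cat = pd.DataFrame(data).astype("category")

for col in ["A", "B"]:
    # Column A never sees "d" and column B never sees "a".
    print(col, list(df[col].cat.categories))
# A ['a', 'b', 'c']
# B ['b', 'c', 'd']
```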
@@ -108,21 +173,30 @@ of :class:`~pandas.api.types.CategoricalDtype`.
     s_cat = s.astype(cat_type)
     s_cat

-Categorical data has a specific ``category`` :ref:`dtype <basics.dtypes>`:
+Similarly, a ``CategoricalDtype`` can be used with a ``DataFrame`` to ensure that categories
+are consistent across all columns.

 .. ipython:: python

-    df.dtypes
+    df = pd.DataFrame({'A': list('abca'), 'B': list('bccd')})
+    cat_type = CategoricalDtype(categories=list('abcd'),
+                                ordered=True)
+    df_cat = df.astype(cat_type)
+    df_cat['A']
+    df_cat['B']

-.. note::
+If you already have `codes` and `categories`, you can use the
+:func:`~pandas.Categorical.from_codes` constructor to skip the factorize step
+that the normal constructor performs:

-    In contrast to R's `factor` function, categorical data is not converting input values to
-    strings and categories will end up the same data type as the original values.
+.. ipython:: python

-.. note::
+    splitter = np.random.choice([0,1], 5, p=[0.5,0.5])
+    s = pd.Series(pd.Categorical.from_codes(splitter, categories=["train", "test"]))

-    In contrast to R's `factor` function, there is currently no way to assign/change labels at
-    creation time. Use `categories` to change the categories after creation time.
+
+Regaining Original Data
+~~~~~~~~~~~~~~~~~~~~~~~

 To get back to the original ``Series`` or NumPy array, use
 ``Series.astype(original_dtype)`` or ``np.asarray(categorical)``:
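A sketch of the two techniques in this hunk, assuming a recent pandas release: one `CategoricalDtype` shared by every column, and `Categorical.from_codes`. The fixed `codes` list is an illustrative stand-in for the random `splitter` used in the docs, chosen so the result is predictable.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

df = pd.DataFrame({"A": list("abca"), "B": list("bccd")})

# One dtype object applied to every column, so both columns get the same
# ordered categories a < b < c < d, even though "d" never appears in A.
cat_type = CategoricalDtype(categories=list("abcd"), ordered=True)
df_cat = df.astype(cat_type)
print(df_cat["A"].cat.categories.equals(df_cat["B"].cat.categories))  # True

# from_codes: each integer code indexes directly into the given categories,
# so no factorization of the values is needed.
codes = [0, 1, 1, 0]
s = pd.Series(pd.Categorical.from_codes(codes, categories=["train", "test"]))
print(list(s))  # ['train', 'test', 'test', 'train']
```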
@@ -136,15 +210,6 @@ To get back to the original ``Series`` or NumPy array, use
     s2.astype(str)
     np.asarray(s2)

-If you already have `codes` and `categories`, you can use the
-:func:`~pandas.Categorical.from_codes` constructor to save the factorize step
-during normal constructor mode:
-
-.. ipython:: python
-
-    splitter = np.random.choice([0,1], 5, p=[0.5,0.5])
-    s = pd.Series(pd.Categorical.from_codes(splitter, categories=["train", "test"]))
-
 .. _categorical.categoricaldtype:

 CategoricalDtype
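The round trip described under Regaining Original Data can be sketched as follows, assuming a recent pandas release. The starting Series is a hypothetical stand-in for whatever `s2` was built from in the docs, since its construction falls outside the hunks shown.

```python
import numpy as np
import pandas as pd

# Hypothetical original data with a plain (non-categorical) dtype.
s = pd.Series(["a", "b", "c", "a"])
original_dtype = s.dtype
s2 = s.astype("category")

# Casting back to the original dtype returns the values, not the codes.
restored = s2.astype(original_dtype)
print(restored.equals(s))  # True

# np.asarray on a categorical returns the category values as a NumPy array.
arr = np.asarray(s2)
print(arr.tolist())  # ['a', 'b', 'c', 'a']
```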