@@ -61,7 +61,9 @@ use ``DataFrame`` methods like :py:meth:`~pandas.DataFrame.reset_index`,
61
61
:py:meth:`~pandas.DataFrame.stack` and :py:meth:`~pandas.DataFrame.unstack`.
62
62
63
63
To create a ``Dataset`` from a ``DataFrame``, use the
64
- :py:meth: `~xarray.Dataset.from_dataframe ` class method:
64
+ :py:meth:`~xarray.Dataset.from_dataframe` class method or the equivalent
65
+ :py:meth:`pandas.DataFrame.to_xarray <DataFrame.to_xarray>` method (pandas
66
+ v0.18 or later):
65
67
66
68
.. ipython:: python
67
69
@@ -89,6 +91,7 @@ DataFrames:
89
91
s = ds['foo'].to_series()
90
92
s
91
93
94
+ # or equivalently, with Series.to_xarray()
92
95
xr.DataArray.from_series(s)
93
96
94
97
Both the ``from_series`` and ``from_dataframe`` methods use reindexing, so they
@@ -97,11 +100,14 @@ work even if the hierarchical index is not a full tensor product:
97
100
.. ipython:: python
98
101
99
102
s[::2]
100
- xr.DataArray.from_series( s[::2 ])
103
+ s[::2].to_xarray()
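
The reindexing behaviour described above can be sketched with a small standalone example. The index names ``x`` and ``y`` and the series name ``foo`` are chosen here for illustration only; the point is that a combination missing from the hierarchical index shows up as ``NaN`` in the resulting array.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Build a full 2 x 2 product index, then drop the last row so that
# one combination, ("b", 1), is missing (illustrative names only).
index = pd.MultiIndex.from_product([["a", "b"], [0, 1]], names=["x", "y"])
s = pd.Series(np.arange(4.0), index=index, name="foo")
partial = s.iloc[:3]

# from_series reindexes onto the full x-by-y grid; the missing
# combination is filled with NaN.
arr = xr.DataArray.from_series(partial)

# Series.to_xarray() is the pandas-side spelling of the same conversion.
same = partial.to_xarray()
```

Because missing entries are filled with ``NaN``, integer data would be cast to floating point by this conversion.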
101
104
102
105
Multi-dimensional data
103
106
~~~~~~~~~~~~~~~~~~~~~~
104
107
108
+ Tidy data is great, but sometimes you want to preserve dimensions instead of
109
+ automatically stacking them into a ``MultiIndex``.
110
+
105
111
:py:meth:`DataArray.to_pandas() <xarray.DataArray.to_pandas>` is a shortcut that
106
112
lets you convert a DataArray directly into a pandas object with the same
107
113
dimensionality (i.e., a 1D array is converted to a :py:class:`~pandas.Series`,
@@ -115,89 +121,101 @@ dimensionality (i.e., a 1D array is converted to a :py:class:`~pandas.Series`,
115
121
df
116
122
117
123
To perform the inverse operation of converting any pandas object into a data
118
- array with the same shape, simply use the ``DataArray `` constructor:
124
+ array with the same shape, simply use the :py:class:`~xarray.DataArray`
125
+ constructor:
119
126
120
127
.. ipython:: python
121
128
122
129
xr.DataArray(df)
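
As a rough, self-contained sketch of the round trip described above (the dimension names ``x`` and ``y`` and the coordinate values are made up for illustration): ``to_pandas()`` picks the pandas type from the number of dimensions, and the ``DataArray`` constructor recovers the dimension names from the index and column names.

```python
import numpy as np
import pandas as pd
import xarray as xr

arr = xr.DataArray(
    np.arange(6).reshape(2, 3),
    dims=("x", "y"),
    coords={"x": [10, 20], "y": ["a", "b", "c"]},
)

# 2D array -> DataFrame; the index is named "x" and the columns "y".
df = arr.to_pandas()

# 1D array -> Series (drop=True discards the leftover scalar "y" coordinate).
series = arr.isel(y=0, drop=True).to_pandas()

# Inverse operation: the constructor reads dimension names from the
# names of the DataFrame's index and columns.
back = xr.DataArray(df)
```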
123
130
124
- xarray objects do not yet support hierarchical indexes, so if your data has
125
- a hierarchical index, you will either need to unstack it first or use the
126
- :py:meth: `~xarray.DataArray.from_series ` or
127
- :py:meth: `~xarray.Dataset.from_dataframe ` constructors described above.
131
+ Both the ``DataArray`` and ``Dataset`` constructors directly convert pandas
132
+ objects into xarray objects with the same shape. This means that they
133
+ preserve any multi-indexes:
134
+
135
+ .. ipython:: python
136
+
137
+ index = pd.MultiIndex.from_arrays([['a', 'a', 'b'], [0, 1, 2]],
138
+ names=['one', 'two'])
139
+ df = pd.DataFrame({'x': 1, 'y': 2}, index=index)
140
+ ds = xr.Dataset(df)
141
+ ds
142
+
143
+ However, you will need to set dimension names explicitly, either with the
144
+ ``dims`` argument in the ``DataArray`` constructor or by calling
145
+ :py:meth:`~xarray.Dataset.rename` on the new object.
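
The two ways of naming dimensions mentioned above might look like the following minimal sketch. It uses a plain ``DataFrame`` with unnamed axes, and the names ``row`` and ``col`` are placeholders chosen for this example.

```python
import numpy as np
import pandas as pd
import xarray as xr

# A DataFrame whose index and columns carry no names.
df = pd.DataFrame(
    np.zeros((2, 3)), index=["r0", "r1"], columns=["c0", "c1", "c2"]
)

# Without names, the constructor falls back to dim_0, dim_1, ...
default = xr.DataArray(df)

# Option 1: pass the names through the ``dims`` argument.
named = xr.DataArray(df, dims=("row", "col"))

# Option 2: rename the automatically generated dimensions afterwards.
renamed = default.rename({"dim_0": "row", "dim_1": "col"})
```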
128
146
147
+ .. _panel transition:
129
148
130
149
Transitioning from pandas.Panel to xarray
131
150
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
132
151
133
- :py:class: `~pandas.Panel `, pandas's data structure for 3D arrays, has always been a second class
134
- data structure compared to the Series and DataFrame. To allow pandas developers to focus more on
135
- its core functionality built around the DataFrame, pandas plans to eventually deprecate Panel.
152
+ :py:class:`~pandas.Panel`, pandas's data structure for 3D arrays, has always
153
+ been a second-class data structure compared to the Series and DataFrame. To
154
+ allow pandas developers to focus more on its core functionality built around
155
+ the DataFrame, pandas plans to eventually deprecate Panel.
136
156
137
157
xarray has most of ``Panel``'s features, a more explicit API (particularly around
138
158
indexing), and the ability to scale to >3 dimensions with the same interface.
139
159
140
- As discussed in the xarray docs, there are two primary data structures in xarray:
141
- ``DataArray `` and ``Dataset ``. You can imagine a ``DataArray `` as a n-dimensional pandas
142
- ``Series `` (i.e. a single typed array), and a ``Dataset `` as the `` DataFrame ``-equivalent
143
- (i.e. a dict of aligned ``DataArray``s ).
160
+ As discussed :ref:`elsewhere <data structures>` in the docs, there are two primary data structures in
161
+ xarray: ``DataArray`` and ``Dataset``. You can imagine a ``DataArray`` as an
162
+ n-dimensional pandas ``Series`` (i.e. a single typed array), and a ``Dataset``
163
+ as the ``DataFrame`` equivalent (i.e. a dict of aligned ``DataArray`` objects).
144
164
145
165
So you can represent a Panel in two ways:
146
- - A 3-dimenional ``DataArray ``
147
- - A ``Dataset `` containing a number of 2-dimensional DataArray-s
166
+
167
+ - As a 3-dimensional ``DataArray``,
168
+ - Or as a ``Dataset`` containing a number of 2-dimensional ``DataArray`` objects.
169
+
170
+ Let's take a look:
148
171
149
172
.. ipython:: python
173
+
150
174
panel = pd.Panel(np.random.rand(2, 3, 4), items=list('ab'), major_axis=list('mno'),
151
175
minor_axis=pd.date_range(start='2000', periods=4, name='date'))
152
176
153
177
panel
154
178
155
-
156
179
As a DataArray:
157
180
158
-
159
181
.. ipython:: python
160
182
183
+ # or equivalently, with Panel.to_xarray()
161
184
xr.DataArray(panel)
162
185
163
- Or:
164
-
165
-
166
- .. ipython :: python
167
-
168
- panel.to_xarray()
169
-
170
-
171
- As you can see, there are three dimensions (each is also a coordinate). Two of the
172
- axes of the panel were unnamed, so have been assigned `dim_0 ` & `dim_1 ` respectively,
173
- while the third retains its name `date `.
174
-
186
+ As you can see, there are three dimensions (each is also a coordinate). Two of
187
+ the axes of the panel were unnamed, so have been assigned ``dim_0`` and
188
+ ``dim_1`` respectively, while the third retains its name ``date``.
175
189
176
190
As a Dataset:
177
191
178
192
.. ipython:: python
193
+
179
194
xr.Dataset(panel)
180
195
181
- Here, there are two data variables, each representing a DataFrame on panel's ` items `
182
- axis, and labelled as such. Each variable is a 2D array of the respective values along
183
- the `items ` dimension.
196
+ Here, there are two data variables, each representing a DataFrame on panel's
197
+ ``items`` axis, and labelled as such. Each variable is a 2D array of the
198
+ respective values along the ``items`` dimension.
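
``pd.Panel`` has been removed from recent pandas releases (0.25 and later), so the snippets above only run on older installations. As a hedged sketch of the same two representations without ``Panel``, the 3D array can be built directly and then split along one dimension; the dimension names ``items``, ``major_axis`` and ``date`` are assumptions that mirror the Panel axes used above.

```python
import numpy as np
import pandas as pd
import xarray as xr

data = np.random.rand(2, 3, 4)

# Representation 1: a single 3-dimensional DataArray.
arr = xr.DataArray(
    data,
    dims=("items", "major_axis", "date"),
    coords={
        "items": list("ab"),
        "major_axis": list("mno"),
        "date": pd.date_range(start="2000", periods=4),
    },
)

# Representation 2: a Dataset with one 2-dimensional variable per item.
ds = arr.to_dataset(dim="items")
```

Here ``ds["a"]`` and ``ds["b"]`` each span ``major_axis`` and ``date``, matching the per-item DataFrames of the original Panel.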
184
199
185
200
While the xarray docs are relatively complete, a few items stand out for Panel users:
201
+
186
202
- A DataArray's data is stored as a numpy array, and so can only contain a single
187
- type. As a result, a Panel that contains :py:class: `~pandas.DataFrame`s with
188
- multiple types will be converted to `object ` types. A ``Dataset `` of multiple ``DataArray``s
189
- each with its own dtype will allow original types to be preserved
190
- - Indexing is similar to pandas, but more explicit and leverages xarray's naming
191
- of dimensions
192
- - Because of those features, making much higher dimension-ed data is very practical
193
- - Variables in ``Dataset``s can use a subset of its dimensions. For example, you can
194
- have one dataset with Person x Score x Time, and another with Person x Score
203
+ type. As a result, a Panel that contains :py:class:`~pandas.DataFrame` objects
204
+ with multiple types will be converted to ``dtype=object``. A ``Dataset`` of
205
+ multiple ``DataArray`` objects each with its own dtype will allow original
206
+ types to be preserved.
207
+ - :ref:`Indexing <indexing>` is similar to pandas, but more explicit and
208
+ leverages xarray's naming of dimensions.
209
+ - Because of those features, working with higher-dimensional data is very
210
+ practical.
211
+ - Variables in ``Dataset`` objects can use a subset of the dataset's dimensions. For
212
+ example, you can have one variable with Person x Score x Time, and another with
213
+ Person x Score.
195
214
- Coordinates are used for both dimensions and for variables which
196
- _label_ the data variables, so you could have a coordinate Age, that labelled the
197
- `Person` dimension of a DataSet of Person x Score x Time
198
-
215
+ *label* the data variables, so you could have a coordinate Age that labelled
216
+ the Person dimension of a Dataset of Person x Score x Time.
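
The points in the list above can be illustrated with one small ``Dataset``. This is only a sketch: the variable names ``result`` and ``best``, the people and their ages are invented for the example.

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {
        # Person x Score x Time, stored as floats.
        "result": (("person", "score", "time"), np.zeros((2, 3, 4))),
        # Person x Score only, stored as integers: each variable keeps
        # its own dtype and may use a subset of the dimensions.
        "best": (("person", "score"), np.ones((2, 3), dtype=int)),
    },
    coords={
        "person": ["ann", "bob"],
        # A non-dimension coordinate that labels the person dimension.
        "age": ("person", [31, 45]),
    },
)

# Indexing by dimension name and label rather than by axis position.
bob = ds.sel(person="bob")
```

Selecting a person also carries the matching ``age`` value along, since that coordinate is attached to the ``person`` dimension.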
199
217
200
218
While xarray may take some getting used to, it's worth it! If anything is unclear,
201
219
please post an issue on `GitHub <https://github.com/pydata/xarray>`__ or
202
220
`StackOverflow <http://stackoverflow.com/questions/tagged/python-xarray>`__,
203
- and we'll endeavor to respond to the specific case or improve the general docs.
221
+ and we'll endeavor to respond to the specific case or improve the general docs.