Commit 4423e38

Fixes to the xarray docs on pandas
This includes follow up to fix formatting on GH832 and fixing where we said MultiIndex wasn't supported. CC MaximilianR
1 parent e889b88 commit 4423e38

3 files changed: +72 -51 lines changed

doc/environment.yml

Lines changed: 6 additions & 6 deletions
@@ -1,10 +1,10 @@
-name: xarray
+name: xarray-docs
 dependencies:
 - python=2.7
-- numpy=1.10
-- pandas=0.17.1
+- numpy=1.11
+- pandas=0.18.1
 - numpydoc=0.5
-- seaborn=0.6
-- dask=0.7.5
+- seaborn=0.7.1
+- dask=0.10.1
 - ipython=4.0.1
-- sphinx=1.2.3 # pin to avoid https://github.com/sphinx-doc/sphinx/issues/1822
+- sphinx=1.4.1

doc/pandas.rst

Lines changed: 63 additions & 45 deletions
@@ -61,7 +61,9 @@ use ``DataFrame`` methods like :py:meth:`~pandas.DataFrame.reset_index`,
 :py:meth:`~pandas.DataFrame.stack` and :py:meth:`~pandas.DataFrame.unstack`.
 
 To create a ``Dataset`` from a ``DataFrame``, use the
-:py:meth:`~xarray.Dataset.from_dataframe` class method:
+:py:meth:`~xarray.Dataset.from_dataframe` class method or the equivalent
+:py:meth:`pandas.DataFrame.to_xarray <DataFrame.to_xarray>` method (pandas
+v0.18 or later):
 
 .. ipython:: python
 
@@ -89,6 +91,7 @@ DataFrames:
     s = ds['foo'].to_series()
     s
 
+    # or equivalently, with Series.to_xarray()
     xr.DataArray.from_series(s)
 
 Both the ``from_series`` and ``from_dataframe`` methods use reindexing, so they
@@ -97,11 +100,14 @@ work even if not the hierarchical index is not a full tensor product:
 .. ipython:: python
 
     s[::2]
-    xr.DataArray.from_series(s[::2])
+    s[::2].to_xarray()
 
 Multi-dimensional data
 ~~~~~~~~~~~~~~~~~~~~~~
 
+Tidy data is great, but sometimes you want to preserve dimensions instead of
+automatically stacking them into a ``MultiIndex``.
+
 :py:meth:`DataArray.to_pandas() <xarray.DataArray.to_pandas>` is a shortcut that
 lets you convert a DataArray directly into a pandas object with the same
 dimensionality (i.e., a 1D array is converted to a :py:class:`~pandas.Series`,
@@ -115,89 +121,101 @@ dimensionality (i.e., a 1D array is converted to a :py:class:`~pandas.Series`,
     df
 
 To perform the inverse operation of converting any pandas objects into a data
-array with the same shape, simply use the ``DataArray`` constructor:
+array with the same shape, simply use the :py:class:`~xarray.DataArray`
+constructor:
 
 .. ipython:: python
 
     xr.DataArray(df)
 
-xarray objects do not yet support hierarchical indexes, so if your data has
-a hierarchical index, you will either need to unstack it first or use the
-:py:meth:`~xarray.DataArray.from_series` or
-:py:meth:`~xarray.Dataset.from_dataframe` constructors described above.
+Both the ``DataArray`` and ``Dataset`` constructors directly convert pandas
+objects into xarray objects with the same shape. This means that they
+preserve all use of multi-indexes:
+
+.. ipython:: python
+
+    index = pd.MultiIndex.from_arrays([['a', 'a', 'b'], [0, 1, 2]],
+                                      names=['one', 'two'])
+    df = pd.DataFrame({'x': 1, 'y': 2}, index=index)
+    ds = xr.Dataset(df)
+    ds
+
+However, you will need to set dimension names explicitly, either with the
+``dims`` argument in the ``DataArray`` constructor or by calling
+:py:meth:`~xarray.Dataset.rename` on the new object.
 
+.. _panel transition:
 
 Transitioning from pandas.Panel to xarray
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-:py:class:`~pandas.Panel`, pandas's data structure for 3D arrays, has always been a second class
-data structure compared to the Series and DataFrame. To allow pandas developers to focus more on
-its core functionality built around the DataFrame, pandas plans to eventually deprecate Panel.
+:py:class:`~pandas.Panel`, pandas's data structure for 3D arrays, has always
+been a second class data structure compared to the Series and DataFrame. To
+allow pandas developers to focus more on its core functionality built around
+the DataFrame, pandas plans to eventually deprecate Panel.
 
 xarray has most of ``Panel``'s features, a more explicit API (particularly around
 indexing), and the ability to scale to >3 dimensions with the same interface.
 
-As discussed in the xarray docs, there are two primary data structures in xarray:
-``DataArray`` and ``Dataset``. You can imagine a ``DataArray`` as a n-dimensional pandas
-``Series`` (i.e. a single typed array), and a ``Dataset`` as the ``DataFrame``-equivalent
-(i.e. a dict of aligned ``DataArray``s).
+As discussed :ref:`elsewhere <data structures>` in the docs, there are two primary data structures in
+xarray: ``DataArray`` and ``Dataset``. You can imagine a ``DataArray`` as an
+n-dimensional pandas ``Series`` (i.e. a single typed array), and a ``Dataset``
+as the ``DataFrame`` equivalent (i.e. a dict of aligned ``DataArray`` objects).
 
 So you can represent a Panel, in two ways:
-- A 3-dimenional ``DataArray``
-- A ``Dataset`` containing a number of 2-dimensional DataArray-s
+
+- As a 3-dimensional ``DataArray``,
+- Or as a ``Dataset`` containing a number of 2-dimensional DataArray objects.
+
+Let's take a look:
 
 .. ipython:: python
+
     panel = pd.Panel(np.random.rand(2, 3, 4), items=list('ab'), major_axis=list('mno'),
                      minor_axis=pd.date_range(start='2000', periods=4, name='date'))
 
     panel
 
-
 As a DataArray:
 
-
 .. ipython:: python
 
+    # or equivalently, with Panel.to_xarray()
     xr.DataArray(panel)
 
-Or:
-
-
-.. ipython:: python
-
-    panel.to_xarray()
-
-
-As you can see, there are three dimensions (each is also a coordinate). Two of the
-axes of the panel were unnamed, so have been assigned `dim_0` & `dim_1` respectively,
-while the third retains its name `date`.
-
+As you can see, there are three dimensions (each is also a coordinate). Two of
+the axes of the panel were unnamed, so have been assigned ``dim_0`` and
+``dim_1`` respectively, while the third retains its name ``date``.
 
 As a Dataset:
 
 .. ipython:: python
+
     xr.Dataset(panel)
 
-Here, there are two data variables, each representing a DataFrame on panel's `items`
-axis, and labelled as such. Each variable is a 2D array of the respective values along
-the `items` dimension.
+Here, there are two data variables, each representing a DataFrame on panel's
+``items`` axis, and labelled as such. Each variable is a 2D array of the
+respective values along the ``items`` dimension.
 
 While the xarray docs are relatively complete, a few items stand out for Panel users:
+
 - A DataArray's data is stored as a numpy array, and so can only contain a single
-  type. As a result, a Panel that contains :py:class:`~pandas.DataFrame`s with
-  multiple types will be converted to `object` types. A ``Dataset`` of multiple ``DataArray``s
-  each with its own dtype will allow original types to be preserved
-- Indexing is similar to pandas, but more explicit and leverages xarray's naming
-  of dimensions
-- Because of those features, making much higher dimension-ed data is very practical
-- Variables in ``Dataset``s can use a subset of its dimensions. For example, you can
-  have one dataset with Person x Score x Time, and another with Person x Score
+  type. As a result, a Panel that contains :py:class:`~pandas.DataFrame` objects
+  with multiple types will be converted to ``dtype=object``. A ``Dataset`` of
+  multiple ``DataArray`` objects each with its own dtype will allow original
+  types to be preserved.
+- :ref:`Indexing <indexing>` is similar to pandas, but more explicit and
+  leverages xarray's naming of dimensions.
+- Because of those features, making much higher dimensional data is very
+  practical.
+- Variables in ``Dataset`` objects can use a subset of its dimensions. For
+  example, you can have one dataset with Person x Score x Time, and another with
+  Person x Score.
 - You can use coordinates are used for both dimensions and for variables which
-  _label_ the data variables, so you could have a coordinate Age, that labelled the
-  `Person` dimension of a DataSet of Person x Score x Time
-
+  _label_ the data variables, so you could have a coordinate Age, that labelled
+  the Person dimension of a Dataset of Person x Score x Time.
 
 While xarray may take some getting used to, it's worth it! If anything is unclear,
 please post an issue on `GitHub <https://github.com/pydata/xarray>`__ or
 `StackOverflow <http://stackoverflow.com/questions/tagged/python-xarray>`__,
-and we'll endeavor to respond to the specific case or improve the general docs.
+and we'll endeavor to respond to the specific case or improve the general docs.
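
A note on the single-dtype point in the list in doc/pandas.rst: the constraint comes from numpy, so it can be seen without xarray or Panel installed. The following is a minimal sketch using only pandas and numpy arrays. The column names ``count`` and ``label`` are made up for illustration, and the code does not call xarray; it only shows why one homogeneous array falls back to ``object`` while one array per variable keeps each dtype.

```python
import pandas as pd

# Hypothetical mixed-type table: one integer column, one string column.
df = pd.DataFrame({"count": [1, 2, 3], "label": ["m", "n", "o"]})

# Packing everything into ONE numpy array forces a common dtype,
# which for integers mixed with strings is the generic object dtype.
as_one_array = df.to_numpy()
print(as_one_array.dtype)

# Keeping one array per variable (a Dataset-like layout) lets each
# array keep its own dtype.
per_variable = {name: df[name].to_numpy() for name in df.columns}
print({name: arr.dtype for name, arr in per_variable.items()})
```

The first layout corresponds to the single-array ``DataArray`` case described in the list, the second to the ``Dataset`` case with one array per data variable.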

doc/whats-new.rst

Lines changed: 3 additions & 0 deletions
@@ -21,6 +21,9 @@ v0.8.2 (unreleased)
 Enhancements
 ~~~~~~~~~~~~
 
+- New documentation on :ref:`panel transition`. By
+  `Maximilian Roos <https://github.com/MaximilianR>`_.
+
 Bug fixes
 ~~~~~~~~~
 
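
The doc/pandas.rst text in this commit says that ``from_series`` and ``from_dataframe`` use reindexing, so they still work when the hierarchical index is not a full tensor product. The sketch below illustrates that idea with pandas and numpy only. It is an assumption about the general approach (reindex onto the full product of level values, then reshape), not xarray's actual implementation, and the level names ``x`` and ``y`` are hypothetical.

```python
import numpy as np
import pandas as pd

# A Series on a full 2 x 3 hierarchical index (every combination present).
index = pd.MultiIndex.from_product([["a", "b"], [0, 1, 2]], names=["x", "y"])
s = pd.Series(np.arange(6.0), index=index)

# Keep every other row: the index is no longer a full tensor product.
sparse = s[::2]

# Rebuild the full product of the observed level values and reindex onto it.
# Combinations missing from `sparse` are filled with NaN.
names = list(sparse.index.names)
levels = [sorted(sparse.index.get_level_values(name).unique()) for name in names]
full_index = pd.MultiIndex.from_product(levels, names=names)
dense = sparse.reindex(full_index)

# One array axis per index level.
values = dense.to_numpy().reshape([len(level) for level in levels])
print(values)
```

The resulting array has one axis per index level, with NaN wherever a combination of level values was absent from the input.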
