Commit bff1954

Merge pull request #955 from shoyer/panel-doc-fix
Fixes to the xarray docs on pandas
2 parents e889b88 + 4423e38 commit bff1954

3 files changed: +72 -51 lines changed

doc/environment.yml

Lines changed: 6 additions & 6 deletions
@@ -1,10 +1,10 @@
1-
name: xarray
1+
name: xarray-docs
22
dependencies:
33
- python=2.7
4-
- numpy=1.10
5-
- pandas=0.17.1
4+
- numpy=1.11
5+
- pandas=0.18.1
66
- numpydoc=0.5
7-
- seaborn=0.6
8-
- dask=0.7.5
7+
- seaborn=0.7.1
8+
- dask=0.10.1
99
- ipython=4.0.1
10-
- sphinx=1.2.3 # pin to avoid https://github.com/sphinx-doc/sphinx/issues/1822
10+
- sphinx=1.4.1

doc/pandas.rst

Lines changed: 63 additions & 45 deletions
@@ -61,7 +61,9 @@ use ``DataFrame`` methods like :py:meth:`~pandas.DataFrame.reset_index`,
6161
:py:meth:`~pandas.DataFrame.stack` and :py:meth:`~pandas.DataFrame.unstack`.
6262

6363
To create a ``Dataset`` from a ``DataFrame``, use the
64-
:py:meth:`~xarray.Dataset.from_dataframe` class method:
64+
:py:meth:`~xarray.Dataset.from_dataframe` class method or the equivalent
65+
:py:meth:`pandas.DataFrame.to_xarray <DataFrame.to_xarray>` method (pandas
66+
v0.18 or later):
6567

6668
.. ipython:: python
6769
@@ -89,6 +91,7 @@ DataFrames:
8991
s = ds['foo'].to_series()
9092
s
9193
94+
# or equivalently, with Series.to_xarray()
9295
xr.DataArray.from_series(s)
9396
9497
Both the ``from_series`` and ``from_dataframe`` methods use reindexing, so they
@@ -97,11 +100,14 @@ work even if the hierarchical index is not a full tensor product:
97100
.. ipython:: python
98101
99102
s[::2]
100-
xr.DataArray.from_series(s[::2])
103+
s[::2].to_xarray()
101104
102105
Multi-dimensional data
103106
~~~~~~~~~~~~~~~~~~~~~~
104107

108+
Tidy data is great, but sometimes you want to preserve dimensions instead of
109+
automatically stacking them into a ``MultiIndex``.
110+
105111
:py:meth:`DataArray.to_pandas() <xarray.DataArray.to_pandas>` is a shortcut that
106112
lets you convert a DataArray directly into a pandas object with the same
107113
dimensionality (i.e., a 1D array is converted to a :py:class:`~pandas.Series`,
@@ -115,89 +121,101 @@ dimensionality (i.e., a 1D array is converted to a :py:class:`~pandas.Series`,
115121
df
116122
117123
To perform the inverse operation of converting any pandas objects into a data
118-
array with the same shape, simply use the ``DataArray`` constructor:
124+
array with the same shape, simply use the :py:class:`~xarray.DataArray`
125+
constructor:
119126

120127
.. ipython:: python
121128
122129
xr.DataArray(df)
123130
124-
xarray objects do not yet support hierarchical indexes, so if your data has
125-
a hierarchical index, you will either need to unstack it first or use the
126-
:py:meth:`~xarray.DataArray.from_series` or
127-
:py:meth:`~xarray.Dataset.from_dataframe` constructors described above.
131+
Both the ``DataArray`` and ``Dataset`` constructors directly convert pandas
132+
objects into xarray objects with the same shape. This means that they
133+
preserve all use of multi-indexes:
134+
135+
.. ipython:: python
136+
137+
index = pd.MultiIndex.from_arrays([['a', 'a', 'b'], [0, 1, 2]],
138+
names=['one', 'two'])
139+
df = pd.DataFrame({'x': 1, 'y': 2}, index=index)
140+
ds = xr.Dataset(df)
141+
ds
142+
143+
However, you will need to set dimension names explicitly, either with the
144+
``dims`` argument in the ``DataArray`` constructor or by calling
145+
:py:meth:`~xarray.Dataset.rename` on the new object.
128146

147+
.. _panel transition:
129148

130149
Transitioning from pandas.Panel to xarray
131150
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
132151

133-
:py:class:`~pandas.Panel`, pandas's data structure for 3D arrays, has always been a second class
134-
data structure compared to the Series and DataFrame. To allow pandas developers to focus more on
135-
its core functionality built around the DataFrame, pandas plans to eventually deprecate Panel.
152+
:py:class:`~pandas.Panel`, pandas's data structure for 3D arrays, has always
153+
been a second-class data structure compared to the Series and DataFrame. To
154+
allow pandas developers to focus more on its core functionality built around
155+
the DataFrame, pandas plans to eventually deprecate Panel.
136156

137157
xarray has most of ``Panel``'s features, a more explicit API (particularly around
138158
indexing), and the ability to scale to >3 dimensions with the same interface.
139159

140-
As discussed in the xarray docs, there are two primary data structures in xarray:
141-
``DataArray`` and ``Dataset``. You can imagine a ``DataArray`` as a n-dimensional pandas
142-
``Series`` (i.e. a single typed array), and a ``Dataset`` as the ``DataFrame``-equivalent
143-
(i.e. a dict of aligned ``DataArray``s).
160+
As discussed :ref:`elsewhere <data structures>` in the docs, there are two primary data structures in
161+
xarray: ``DataArray`` and ``Dataset``. You can imagine a ``DataArray`` as an
162+
n-dimensional pandas ``Series`` (i.e. a single typed array), and a ``Dataset``
163+
as the ``DataFrame`` equivalent (i.e. a dict of aligned ``DataArray`` objects).
144164

145165
So you can represent a Panel in two ways:
146-
- A 3-dimenional ``DataArray``
147-
- A ``Dataset`` containing a number of 2-dimensional DataArray-s
166+
167+
- As a 3-dimensional ``DataArray``,
168+
- Or as a ``Dataset`` containing a number of 2-dimensional DataArray objects.
169+
170+
Let's take a look:
148171

149172
.. ipython:: python
173+
150174
panel = pd.Panel(np.random.rand(2, 3, 4), items=list('ab'), major_axis=list('mno'),
151175
minor_axis=pd.date_range(start='2000', periods=4, name='date'))
152176
153177
panel
154178
155-
156179
As a DataArray:
157180

158-
159181
.. ipython:: python
160182
183+
# or equivalently, with Panel.to_xarray()
161184
xr.DataArray(panel)
162185
163-
Or:
164-
165-
166-
.. ipython:: python
167-
168-
panel.to_xarray()
169-
170-
171-
As you can see, there are three dimensions (each is also a coordinate). Two of the
172-
axes of the panel were unnamed, so have been assigned `dim_0` & `dim_1` respectively,
173-
while the third retains its name `date`.
174-
186+
As you can see, there are three dimensions (each is also a coordinate). Two of
187+
the axes of the panel were unnamed, so have been assigned ``dim_0`` and
188+
``dim_1`` respectively, while the third retains its name ``date``.
175189

176190
As a Dataset:
177191

178192
.. ipython:: python
193+
179194
xr.Dataset(panel)
180195
181-
Here, there are two data variables, each representing a DataFrame on panel's `items`
182-
axis, and labelled as such. Each variable is a 2D array of the respective values along
183-
the `items` dimension.
196+
Here, there are two data variables, each representing a DataFrame on panel's
197+
``items`` axis, and labelled as such. Each variable is a 2D array of the
198+
respective values along the ``items`` dimension.
184199

185200
While the xarray docs are relatively complete, a few items stand out for Panel users:
201+
186202
- A DataArray's data is stored as a numpy array, and so can only contain a single
187-
type. As a result, a Panel that contains :py:class:`~pandas.DataFrame`s with
188-
multiple types will be converted to `object` types. A ``Dataset`` of multiple ``DataArray``s
189-
each with its own dtype will allow original types to be preserved
190-
- Indexing is similar to pandas, but more explicit and leverages xarray's naming
191-
of dimensions
192-
- Because of those features, making much higher dimension-ed data is very practical
193-
- Variables in ``Dataset``s can use a subset of its dimensions. For example, you can
194-
have one dataset with Person x Score x Time, and another with Person x Score
203+
type. As a result, a Panel that contains :py:class:`~pandas.DataFrame` objects
204+
with multiple types will be converted to ``dtype=object``. A ``Dataset`` of
205+
multiple ``DataArray`` objects each with its own dtype will allow original
206+
types to be preserved.
207+
- :ref:`Indexing <indexing>` is similar to pandas, but more explicit and
208+
leverages xarray's naming of dimensions.
209+
- Because of those features, working with much higher-dimensional data is very
210+
practical.
211+
- Variables in a ``Dataset`` can use a subset of its dimensions. For
212+
example, you can have one dataset with Person x Score x Time, and another with
213+
Person x Score.
195214
- Coordinates are used for both dimensions and for variables which
196-
_label_ the data variables, so you could have a coordinate Age, that labelled the
197-
`Person` dimension of a DataSet of Person x Score x Time
198-
215+
*label* the data variables, so you could have a coordinate Age that labelled
216+
the Person dimension of a Dataset of Person x Score x Time.
199217

200218
While xarray may take some getting used to, it's worth it! If anything is unclear,
201219
please post an issue on `GitHub <https://github.com/pydata/xarray>`__ or
202220
`StackOverflow <http://stackoverflow.com/questions/tagged/python-xarray>`__,
203-
and we'll endeavor to respond to the specific case or improve the general docs.
221+
and we'll endeavor to respond to the specific case or improve the general docs.
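
As a rough illustration of the two Panel representations described in this change, here is a minimal sketch. It assumes a current pandas and xarray, where ``pd.Panel`` is no longer available (it was removed in pandas 0.25), so the array is built directly from NumPy. The dimension names ``items`` and ``major_axis`` and the sample values are hypothetical stand-ins, not part of this commit.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical data standing in for the 2 x 3 x 4 Panel used in the docs.
values = np.arange(24, dtype=float).reshape(2, 3, 4)
dates = pd.date_range("2000-01-01", periods=4, name="date")

# 1. A 3-dimensional DataArray with explicitly named dimensions.
arr = xr.DataArray(
    values,
    dims=("items", "major_axis", "date"),
    coords={"items": list("ab"), "major_axis": list("mno"), "date": dates},
)

# 2. A Dataset holding one 2-dimensional variable per label along "items".
ds = arr.to_dataset(dim="items")

# Each 2D variable converts back to a DataFrame with the same shape.
df_a = ds["a"].to_pandas()

# from_series reindexes, so a MultiIndex that is not a full product of its
# levels still converts; combinations that are absent become NaN.
s = ds["a"].to_series()
sparse = xr.DataArray.from_series(s.iloc[::5])
```

Naming the dimensions up front avoids the ``dim_0`` and ``dim_1`` placeholders mentioned in the text, and splitting along ``items`` gives each variable its own dtype if the underlying frames differ.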

doc/whats-new.rst

Lines changed: 3 additions & 0 deletions
@@ -21,6 +21,9 @@ v0.8.2 (unreleased)
2121
Enhancements
2222
~~~~~~~~~~~~
2323

24+
- New documentation on :ref:`panel transition`. By
25+
`Maximilian Roos <https://github.com/MaximilianR>`_.
26+
2427
Bug fixes
2528
~~~~~~~~~
2629
