Commit 50f6473

MaximilianR authored and shoyer committed
WIP for transitioning from Panel docs (#832)
1 parent 7d7673c commit 50f6473

1 file changed

+76
-0
lines changed

doc/pandas.rst

Lines changed: 76 additions & 0 deletions
@@ -125,3 +125,79 @@ xarray objects do not yet support hierarchical indexes, so if your data has
a hierarchical index, you will either need to unstack it first or use the
:py:meth:`~xarray.DataArray.from_series` or
:py:meth:`~xarray.Dataset.from_dataframe` constructors described above.
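
As a minimal sketch of the ``from_series`` route mentioned above, the following builds a small ``Series`` with a two-level index and converts it. The level names ``letter`` and ``number`` and the values are made up for illustration.

```python
import numpy as np
import pandas as pd
import xarray as xr

# A Series with a two-level hierarchical index (illustrative level names).
index = pd.MultiIndex.from_product([["a", "b"], [0, 1, 2]], names=["letter", "number"])
series = pd.Series(np.arange(6.0), index=index)

# Each level of the hierarchical index becomes its own dimension.
da = xr.DataArray.from_series(series)
print(da)
```

The result is a 2D array with one dimension per index level, so selections such as ``da.sel(letter="b")`` work on each level by name.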
128+
129+
130+
Transitioning from pandas.Panel to xarray
131+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

:py:class:`~pandas.Panel`, pandas's data structure for 3D arrays, has always been a second-class
data structure compared to the Series and DataFrame. To allow pandas developers to focus more on
its core functionality built around the DataFrame, pandas plans to eventually deprecate Panel.

xarray has most of ``Panel``'s features, a more explicit API (particularly around
indexing), and the ability to scale to more than three dimensions with the same interface.

As discussed in the xarray docs, there are two primary data structures in xarray:
``DataArray`` and ``Dataset``. You can imagine a ``DataArray`` as an n-dimensional pandas
``Series`` (i.e. a single typed array), and a ``Dataset`` as the ``DataFrame``-equivalent
(i.e. a dict of aligned ``DataArray`` objects).
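
To make the two structures concrete, here is a small sketch that builds one of each directly with xarray. The names ``city``, ``day``, ``temperature`` and ``rained`` are hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd
import xarray as xr

# A DataArray: one typed n-dimensional array with named dimensions and labels.
temperature = xr.DataArray(
    np.arange(6.0).reshape(2, 3),
    coords={"city": ["Oslo", "Lima"], "day": pd.date_range("2000-01-01", periods=3)},
    dims=("city", "day"),
)

# A Dataset: a dict-like container of arrays aligned on shared dimensions.
# Each variable keeps its own dtype (float and bool here).
weather = xr.Dataset(
    {
        "temperature": temperature,
        "rained": (("city", "day"), np.array([[True, False, True], [False, False, True]])),
    }
)
print(weather)
```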

So you can represent a Panel in two ways:

- A 3-dimensional ``DataArray``
- A ``Dataset`` containing a number of 2-dimensional ``DataArray`` objects

.. ipython:: python

    panel = pd.Panel(np.random.rand(2, 3, 4), items=list('ab'), major_axis=list('mno'),
                     minor_axis=pd.date_range(start='2000', periods=4, name='date'))

    panel

As a DataArray:

.. ipython:: python

    xr.DataArray(panel)

Or:

.. ipython:: python

    panel.to_xarray()

As you can see, there are three dimensions (each is also a coordinate). Two of the
axes of the panel were unnamed, so they have been assigned ``dim_0`` and ``dim_1`` respectively,
while the third retains its name ``date``.
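
If you would rather not end up with default names, one option is to name the dimensions yourself. The sketch below starts from a plain numpy array rather than a Panel, and reuses the Panel axis names ``items`` and ``major_axis`` purely as an illustrative choice.

```python
import numpy as np
import pandas as pd
import xarray as xr

values = np.random.rand(2, 3, 4)
dates = pd.date_range(start="2000", periods=4, name="date")

# Without dimension names, xarray falls back to dim_0, dim_1, ...
unnamed = xr.DataArray(values)
print(unnamed.dims)

# Passing dims and coords up front gives every axis a name and labels.
named = xr.DataArray(
    values,
    coords={"items": list("ab"), "major_axis": list("mno"), "date": dates},
    dims=("items", "major_axis", "date"),
)

# An existing array can also be renamed after the fact.
renamed = unnamed.rename({"dim_0": "items", "dim_1": "major_axis", "dim_2": "date"})
print(renamed.dims)
```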

As a Dataset:

.. ipython:: python

    xr.Dataset(panel)

Here, there are two data variables, each representing a DataFrame on the panel's ``items``
axis, and labelled as such. Each variable is a 2D array of the respective values along
the ``items`` dimension.
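
The same split can be sketched without a Panel, assuming you already hold a 3D ``DataArray`` whose first dimension is named ``items`` (a hypothetical stand-in for the converted panel). ``to_dataset`` splits along that dimension and ``to_array`` stacks the variables back.

```python
import numpy as np
import xarray as xr

# Hypothetical 3D array standing in for a converted panel.
da = xr.DataArray(
    np.random.rand(2, 3, 4),
    coords={"items": list("ab"), "major_axis": list("mno")},
    dims=("items", "major_axis", "date"),
)

# Split the 3D array along "items": each label becomes its own 2D data variable.
ds = da.to_dataset(dim="items")
print(ds)

# Stack the variables back into a single array along a new dimension.
roundtrip = ds.to_array(dim="items")
```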

While the xarray docs are relatively complete, a few items stand out for Panel users:

- A DataArray's data is stored as a numpy array, and so can only contain a single
  type. As a result, a Panel that contains :py:class:`~pandas.DataFrame` objects with
  multiple types will be converted to ``object`` dtype. A ``Dataset`` of multiple ``DataArray``
  objects, each with its own dtype, allows the original types to be preserved.
- Indexing is similar to pandas, but more explicit, and leverages xarray's naming
  of dimensions.
- Because of those features, working with much higher-dimensional data is very practical.
- Variables in a ``Dataset`` can use a subset of its dimensions. For example, you can
  have one variable with Person x Score x Time, and another with Person x Score.
- Coordinates are used both for dimensions and for variables that *label* the data
  variables, so you could have a coordinate ``Age`` that labels the ``Person``
  dimension of a ``Dataset`` of Person x Score x Time.
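
The points above can be sketched in one small ``Dataset``. All names and values here (``result``, ``passed``, ``age``, the people and scores) are invented for illustration; only the xarray calls themselves are real API.

```python
import numpy as np
import pandas as pd
import xarray as xr

ds = xr.Dataset(
    {
        # Person x Score x Time, stored as floats
        "result": (("person", "score", "time"), np.random.rand(2, 2, 3)),
        # Person x Score only, stored as booleans: each variable keeps its own dtype
        "passed": (("person", "score"), np.array([[True, False], [True, True]])),
    },
    coords={
        "person": ["ann", "bob"],
        "score": ["math", "art"],
        "time": pd.date_range("2000-01-01", periods=3),
        # A non-dimension coordinate that labels the "person" dimension
        "age": ("person", [31, 45]),
    },
)

# Label-based indexing names the dimension explicitly instead of relying on axis order.
ann_math = ds["result"].sel(person="ann", score="math")

# Positional indexing also goes by dimension name.
first_time = ds["result"].isel(time=0)
print(ann_math)
```

Because ``age`` is attached to the ``person`` dimension, it travels along with any selection on ``person``.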


While xarray may take some getting used to, it's worth it! If anything is unclear,
please open an issue on `GitHub <https://github.com/pydata/xarray>`__ or ask on
`StackOverflow <http://stackoverflow.com/questions/tagged/python-xarray>`__,
and we'll endeavor to respond to the specific case or improve the general docs.
