Skip to content

ENH/CLN: add BoxPlot class inheriting MPLPlot #7998

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Aug 25, 2014
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions doc/source/v0.15.0.txt
Original file line number Diff line number Diff line change
Expand Up @@ -216,6 +216,7 @@ API changes
as the ``left`` argument. (:issue:`7737`)

- Histogram from ``DataFrame.plot`` with ``kind='hist'`` (:issue:`7809`), See :ref:`the docs<visualization.hist>`.
- Boxplot from ``DataFrame.plot`` with ``kind='box'`` (:issue:`7998`), See :ref:`the docs<visualization.box>`.
- Consistency when indexing with ``.loc`` and a list-like indexer when no values are found.

.. ipython:: python
Expand Down
77 changes: 68 additions & 9 deletions doc/source/visualization.rst
Original file line number Diff line number Diff line change
Expand Up @@ -124,6 +124,7 @@ These include:

* :ref:`'bar' <visualization.barplot>` or :ref:`'barh' <visualization.barplot>` for bar plots
* :ref:`'hist' <visualization.hist>` for histogram
* :ref:`'box' <visualization.box>` for boxplot
* :ref:`'kde' <visualization.kde>` or ``'density'`` for density plots
* :ref:`'area' <visualization.area_plot>` for area plots
* :ref:`'scatter' <visualization.scatter>` for scatter plots
Expand Down Expand Up @@ -244,7 +245,7 @@ See the :meth:`hist <matplotlib.axes.Axes.hist>` method and the
`matplotlib hist documenation <http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.hist>`__ for more.


The previous interface ``DataFrame.hist`` to plot histogram still can be used.
The existing interface ``DataFrame.hist`` to plot histogram still can be used.

.. ipython:: python

Expand Down Expand Up @@ -288,12 +289,65 @@ The ``by`` keyword can be specified to plot grouped histograms:
Box Plots
~~~~~~~~~

DataFrame has a :meth:`~DataFrame.boxplot` method that allows you to visualize the
distribution of values within each column.
Boxplot can be drawn calling a ``Series`` and ``DataFrame.plot`` with ``kind='box'``,
or ``DataFrame.boxplot`` to visualize the distribution of values within each column.

.. versionadded:: 0.15.0

``plot`` method now supports ``kind='box'`` to draw boxplot.

For instance, here is a boxplot representing five trials of 10 observations of
a uniform random variable on [0,1).

.. ipython:: python
:suppress:

np.random.seed(123456)

.. ipython:: python

df = DataFrame(rand(10, 5), columns=['A', 'B', 'C', 'D', 'E'])

@savefig box_plot_new.png
df.plot(kind='box')

Boxplot can be colorized by passing ``color`` keyword. You can pass a ``dict``
whose keys are ``boxes``, ``whiskers``, ``medians`` and ``caps``.
If some keys are missing in the ``dict``, default colors are used
for the corresponding artists. Also, boxplot has ``sym`` keyword to specify fliers style.

When you pass other type of arguments via ``color`` keyword, it will be directly
passed to matplotlib for all the ``boxes``, ``whiskers``, ``medians`` and ``caps``
colorization.

The colors are applied to every boxes to be drawn. If you want
more complicated colorization, you can get each drawn artists by passing
:ref:`return_type <visualization.box.return>`.

.. ipython:: python

color = dict(boxes='DarkGreen', whiskers='DarkOrange',
medians='DarkBlue', caps='Gray')

@savefig box_new_colorize.png
df.plot(kind='box', color=color, sym='r+')

Also, you can pass other keywords supported by matplotlib ``boxplot``.
For example, horizontal and custom-positioned boxplot can be drawn by
``vert=False`` and ``positions`` keywords.

.. ipython:: python

@savefig box_new_kwargs.png
df.plot(kind='box', vert=False, positions=[1, 4, 5, 6, 8])


See the :meth:`boxplot <matplotlib.axes.Axes.boxplot>` method and the
`matplotlib boxplot documenation <http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.boxplot>`__ for more.


The existing interface ``DataFrame.boxplot`` to plot boxplot still can be used.

.. ipython:: python
:suppress:

Expand Down Expand Up @@ -354,18 +408,23 @@ columns:

.. _visualization.box.return:

The return type of ``boxplot`` depends on two keyword arguments: ``by`` and ``return_type``.
When ``by`` is ``None``:
Basically, plot functions return :class:`matplotlib Axes <matplotlib.axes.Axes>` as a return value.
In ``boxplot``, the return type can be changed by argument ``return_type``, and whether the subplots is enabled (``subplots=True`` in ``plot`` or ``by`` is specified in ``boxplot``).

When ``subplots=False`` / ``by`` is ``None``:

* if ``return_type`` is ``'dict'``, a dictionary containing the :class:`matplotlib Lines <matplotlib.lines.Line2D>` is returned. The keys are "boxes", "caps", "fliers", "medians", and "whiskers".
This is the default.
This is the default of ``boxplot`` in historical reason.
Note that ``plot(kind='box')`` returns ``Axes`` as default as the same as other plots.
* if ``return_type`` is ``'axes'``, a :class:`matplotlib Axes <matplotlib.axes.Axes>` containing the boxplot is returned.
* if ``return_type`` is ``'both'`` a namedtuple containging the :class:`matplotlib Axes <matplotlib.axes.Axes>`
and :class:`matplotlib Lines <matplotlib.lines.Line2D>` is returned

When ``by`` is some column of the DataFrame, a dict of ``return_type`` is returned, where
the keys are the columns of the DataFrame. The plot has a facet for each column of
the DataFrame, with a separate box for each value of ``by``.
When ``subplots=True`` / ``by`` is some column of the DataFrame:

* A dict of ``return_type`` is returned, where the keys are the columns
of the DataFrame. The plot has a facet for each column of
the DataFrame, with a separate box for each value of ``by``.

Finally, when calling boxplot on a :class:`Groupby` object, a dict of ``return_type``
is returned, where the keys are the same as the Groupby object. The plot has a
Expand Down
177 changes: 171 additions & 6 deletions pandas/tests/test_graphics.py
Original file line number Diff line number Diff line change
Expand Up @@ -365,7 +365,8 @@ def _check_has_errorbars(self, axes, xerr=0, yerr=0):
self.assertEqual(xerr, xerr_count)
self.assertEqual(yerr, yerr_count)

def _check_box_return_type(self, returned, return_type, expected_keys=None):
def _check_box_return_type(self, returned, return_type, expected_keys=None,
check_ax_title=True):
"""
Check box returned type is correct

Expand All @@ -377,6 +378,10 @@ def _check_box_return_type(self, returned, return_type, expected_keys=None):
expected_keys : list-like, optional
group labels in subplot case. If not passed,
the function checks assuming boxplot uses single ax
check_ax_title : bool
Whether to check the ax.title is the same as expected_key
Intended to be checked by calling from ``boxplot``.
Normal ``plot`` doesn't attach ``ax.title``, it must be disabled.
"""
from matplotlib.axes import Axes
types = {'dict': dict, 'axes': Axes, 'both': tuple}
Expand All @@ -402,14 +407,17 @@ def _check_box_return_type(self, returned, return_type, expected_keys=None):
self.assertTrue(isinstance(value, types[return_type]))
# check returned dict has correct mapping
if return_type == 'axes':
self.assertEqual(value.get_title(), key)
if check_ax_title:
self.assertEqual(value.get_title(), key)
elif return_type == 'both':
self.assertEqual(value.ax.get_title(), key)
if check_ax_title:
self.assertEqual(value.ax.get_title(), key)
self.assertIsInstance(value.ax, Axes)
self.assertIsInstance(value.lines, dict)
elif return_type == 'dict':
line = value['medians'][0]
self.assertEqual(line.get_axes().get_title(), key)
if check_ax_title:
self.assertEqual(line.get_axes().get_title(), key)
else:
raise AssertionError

Expand Down Expand Up @@ -452,7 +460,7 @@ def test_plot(self):
_check_plot_works(self.ts.plot, kind='area', stacked=False)
_check_plot_works(self.iseries.plot)

for kind in ['line', 'bar', 'barh', 'kde', 'hist']:
for kind in ['line', 'bar', 'barh', 'kde', 'hist', 'box']:
if not _ok_for_gaussian_kde(kind):
continue
_check_plot_works(self.series[:5].plot, kind=kind)
Expand Down Expand Up @@ -767,6 +775,15 @@ def test_hist_kde_color(self):
self.assertEqual(len(lines), 1)
self._check_colors(lines, ['r'])

@slow
def test_boxplot_series(self):
ax = self.ts.plot(kind='box', logy=True)
self._check_ax_scales(ax, yaxis='log')
xlabels = ax.get_xticklabels()
self._check_text_labels(xlabels, [self.ts.name])
ylabels = ax.get_yticklabels()
self._check_text_labels(ylabels, [''] * len(ylabels))

@slow
def test_autocorrelation_plot(self):
from pandas.tools.plotting import autocorrelation_plot
Expand Down Expand Up @@ -1650,6 +1667,99 @@ def test_bar_log_subplots(self):

@slow
def test_boxplot(self):
df = self.hist_df
series = df['height']
numeric_cols = df._get_numeric_data().columns
labels = [com.pprint_thing(c) for c in numeric_cols]

ax = _check_plot_works(df.plot, kind='box')
self._check_text_labels(ax.get_xticklabels(), labels)
assert_array_equal(ax.xaxis.get_ticklocs(), np.arange(1, len(numeric_cols) + 1))
self.assertEqual(len(ax.lines), 8 * len(numeric_cols))

axes = _check_plot_works(df.plot, kind='box', subplots=True, logy=True)
self._check_axes_shape(axes, axes_num=3, layout=(1, 3))
self._check_ax_scales(axes, yaxis='log')
for ax, label in zip(axes, labels):
self._check_text_labels(ax.get_xticklabels(), [label])
self.assertEqual(len(ax.lines), 8)

axes = series.plot(kind='box', rot=40)
self._check_ticks_props(axes, xrot=40, yrot=0)
tm.close()

ax = _check_plot_works(series.plot, kind='box')

positions = np.array([1, 6, 7])
ax = df.plot(kind='box', positions=positions)
numeric_cols = df._get_numeric_data().columns
labels = [com.pprint_thing(c) for c in numeric_cols]
self._check_text_labels(ax.get_xticklabels(), labels)
assert_array_equal(ax.xaxis.get_ticklocs(), positions)
self.assertEqual(len(ax.lines), 8 * len(numeric_cols))

@slow
def test_boxplot_vertical(self):
df = self.hist_df
series = df['height']
numeric_cols = df._get_numeric_data().columns
labels = [com.pprint_thing(c) for c in numeric_cols]

# if horizontal, yticklabels are rotated
ax = df.plot(kind='box', rot=50, fontsize=8, vert=False)
self._check_ticks_props(ax, xrot=0, yrot=50, ylabelsize=8)
self._check_text_labels(ax.get_yticklabels(), labels)
self.assertEqual(len(ax.lines), 8 * len(numeric_cols))

axes = _check_plot_works(df.plot, kind='box', subplots=True,
vert=False, logx=True)
self._check_axes_shape(axes, axes_num=3, layout=(1, 3))
self._check_ax_scales(axes, xaxis='log')
for ax, label in zip(axes, labels):
self._check_text_labels(ax.get_yticklabels(), [label])
self.assertEqual(len(ax.lines), 8)

positions = np.array([3, 2, 8])
ax = df.plot(kind='box', positions=positions, vert=False)
self._check_text_labels(ax.get_yticklabels(), labels)
assert_array_equal(ax.yaxis.get_ticklocs(), positions)
self.assertEqual(len(ax.lines), 8 * len(numeric_cols))

@slow
def test_boxplot_return_type(self):
df = DataFrame(randn(6, 4),
index=list(string.ascii_letters[:6]),
columns=['one', 'two', 'three', 'four'])
with tm.assertRaises(ValueError):
df.plot(kind='box', return_type='NOTATYPE')

result = df.plot(kind='box', return_type='dict')
self._check_box_return_type(result, 'dict')

result = df.plot(kind='box', return_type='axes')
self._check_box_return_type(result, 'axes')

result = df.plot(kind='box', return_type='both')
self._check_box_return_type(result, 'both')

@slow
def test_boxplot_subplots_return_type(self):
df = self.hist_df

# normal style: return_type=None
result = df.plot(kind='box', subplots=True)
self.assertIsInstance(result, np.ndarray)
self._check_box_return_type(result, None,
expected_keys=['height', 'weight', 'category'])

for t in ['dict', 'axes', 'both']:
returned = df.plot(kind='box', return_type=t, subplots=True)
self._check_box_return_type(returned, t,
expected_keys=['height', 'weight', 'category'],
check_ax_title=False)

@slow
def test_boxplot_legacy(self):
df = DataFrame(randn(6, 4),
index=list(string.ascii_letters[:6]),
columns=['one', 'two', 'three', 'four'])
Expand Down Expand Up @@ -1693,7 +1803,7 @@ def test_boxplot(self):
self.assertEqual(len(ax.get_lines()), len(lines))

@slow
def test_boxplot_return_type(self):
def test_boxplot_return_type_legacy(self):
# API change in https://github.com/pydata/pandas/pull/7096
import matplotlib as mpl

Expand Down Expand Up @@ -2315,6 +2425,61 @@ def test_kde_colors(self):
rgba_colors = lmap(cm.jet, np.linspace(0, 1, len(df)))
self._check_colors(ax.get_lines(), linecolors=rgba_colors)

@slow
def test_boxplot_colors(self):

def _check_colors(bp, box_c, whiskers_c, medians_c, caps_c='k', fliers_c='b'):
self._check_colors(bp['boxes'], linecolors=[box_c] * len(bp['boxes']))
self._check_colors(bp['whiskers'], linecolors=[whiskers_c] * len(bp['whiskers']))
self._check_colors(bp['medians'], linecolors=[medians_c] * len(bp['medians']))
self._check_colors(bp['fliers'], linecolors=[fliers_c] * len(bp['fliers']))
self._check_colors(bp['caps'], linecolors=[caps_c] * len(bp['caps']))

default_colors = self.plt.rcParams.get('axes.color_cycle')

df = DataFrame(randn(5, 5))
bp = df.plot(kind='box', return_type='dict')
_check_colors(bp, default_colors[0], default_colors[0], default_colors[2])
tm.close()

dict_colors = dict(boxes='#572923', whiskers='#982042',
medians='#804823', caps='#123456')
bp = df.plot(kind='box', color=dict_colors, sym='r+', return_type='dict')
_check_colors(bp, dict_colors['boxes'], dict_colors['whiskers'],
dict_colors['medians'], dict_colors['caps'], 'r')
tm.close()

# partial colors
dict_colors = dict(whiskers='c', medians='m')
bp = df.plot(kind='box', color=dict_colors, return_type='dict')
_check_colors(bp, default_colors[0], 'c', 'm')
tm.close()

from matplotlib import cm
# Test str -> colormap functionality
bp = df.plot(kind='box', colormap='jet', return_type='dict')
jet_colors = lmap(cm.jet, np.linspace(0, 1, 3))
_check_colors(bp, jet_colors[0], jet_colors[0], jet_colors[2])
tm.close()

# Test colormap functionality
bp = df.plot(kind='box', colormap=cm.jet, return_type='dict')
_check_colors(bp, jet_colors[0], jet_colors[0], jet_colors[2])
tm.close()

# string color is applied to all artists except fliers
bp = df.plot(kind='box', color='DodgerBlue', return_type='dict')
_check_colors(bp, 'DodgerBlue', 'DodgerBlue', 'DodgerBlue',
'DodgerBlue')

# tuple is also applied to all artists except fliers
bp = df.plot(kind='box', color=(0, 1, 0), sym='#123456', return_type='dict')
_check_colors(bp, (0, 1, 0), (0, 1, 0), (0, 1, 0), (0, 1, 0), '#123456')

with tm.assertRaises(ValueError):
# Color contains invalid key results in ValueError
df.plot(kind='box', color=dict(boxes='red', xxxx='blue'))

def test_default_color_cycle(self):
import matplotlib.pyplot as plt
plt.rcParams['axes.color_cycle'] = list('rgbk')
Expand Down
Loading