BUG: categorical/string Series hist() method produces confusing bar plot. #22091

Closed
nmusolino opened this issue Jul 28, 2018 · 2 comments
Labels: Bug, Duplicate Report (duplicate issue or pull request), Visualization (plotting)

Comments

@nmusolino
Contributor

nmusolino commented Jul 28, 2018

Code Sample

In [1]: import pandas

In [5]:  s = pandas.Series(list('AAAABBC'))

In [6]: s
Out[6]: 
0    A
1    A
2    A
3    A
4    B
5    B
6    C
dtype: object

In [7]: ax = s.hist()

In [8]: ax.figure.savefig('actual.png')

Problem description

The hist() method creates a plot with multiple colored bars at each label on the x-axis.

[image: actual]

This kind of plot is not informative and should not be produced.

The same result is obtained using s.astype('category').

Expected Output

Either a plot of value counts (see below) or an exception.

In [9]: ax = s.value_counts().plot.bar(color='darkblue')

In [10]: ax.figure.savefig('expected.png')

[image: expected]

Note that DataFrame.hist() omits object columns, and s.to_frame('i').hist() raises an exception.
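
One way to picture the two options above is a small guard that refuses numeric input and otherwise returns the per-label counts that a bar plot would show. This is only a sketch: `counts_for_bar_plot` is a hypothetical helper name, not part of pandas, and it assumes that ordering the bars by label is the desired behavior.

```python
import pandas
from pandas.api.types import is_numeric_dtype


def counts_for_bar_plot(s):
    """Return per-label counts for a non-numeric Series.

    Hypothetical helper for illustration only; not a pandas API.
    """
    if is_numeric_dtype(s):
        # Numeric data can be binned, so a real histogram is appropriate.
        raise TypeError("numeric data: use Series.hist() instead")
    # sort_index() orders the result by label rather than by frequency.
    return s.value_counts().sort_index()


s = pandas.Series(list("AAAABBC"))
counts = counts_for_bar_plot(s)

# The same counts come back when the data is stored as a categorical.
counts_cat = counts_for_bar_plot(s.astype("category"))
```

Calling `counts.plot.bar()` on the result would then draw one bar per label, along the lines of the expected plot.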

Discussion

This is essentially the same question as discussed in #8712 (November 2014). That issue proposes creating a plot instead of raising an error. This issue makes a different point: producing a misleading plot is a bug.

cc @jreback, reporter of #8712

Output of pd.show_versions()

INSTALLED VERSIONS

commit: 114f415
python: 3.7.0.final.0
python-bits: 64
OS: Darwin
OS-release: 14.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.24.0.dev0+365.g114f41534.dirty
pytest: 3.6.3
pip: 10.0.1
setuptools: 39.0.1
Cython: 0.28.4
numpy: 1.15.0
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 6.4.0
sphinx: 1.7.6
patsy: None
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.2.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 1.0.1
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None

@gfyoung gfyoung added the Visualization plotting label Jul 30, 2018
@gfyoung
Member

gfyoung commented Jul 30, 2018

@nmusolino: So the issue itself is the same as the one in #8712? In that case, it would be better to state your preferred solution in the original issue.

@gfyoung gfyoung added the Bug label Jul 30, 2018
@TomAugspurger TomAugspurger added the Duplicate Report Duplicate issue or pull request label Jul 30, 2018
@TomAugspurger TomAugspurger added this to the No action milestone Jul 30, 2018
@TomAugspurger
Contributor

Yep, let's keep this concentrated in #8712.

3 participants