Wrong setting of histnorm in the method make_hist, FigureFactory.create_distplot #459

`FigureFactory.create_distplot` is intended, among other things, to compare the histogram of a data set with the KDE estimate of its probability density function. A Plotly Histogram, however, can be drawn in five ways, selected via the `histnorm` key (`''` for raw counts, `'percent'`, `'probability'`, `'density'` and `'probability density'`).
The method `make_hist` has a drawback: it does not choose the right value for the `histnorm` key. Its default value is `histnorm='probability'`
([https://github.com/plotly/plotly.py/blob/master/plotly/tools.py#L5084](https://github.com/plotly/plotly.py/blob/master/plotly/tools.py#L5084)), which contradicts the theoretical definition of this kind of histogram and of the probability density function (pdf).

When `histnorm='probability'`, the height of each bar equals the probability that a data point falls within the corresponding bin. Comparing the KDE estimate of the pdf with such a histogram amounts to assuming that the pdf takes values only in [0, 1], which is not true in general: a pdf exceeds 1 wherever its mass is concentrated on a short interval.
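A minimal pure-Python sketch of the mismatch, assuming a Beta(2, 5) sample and 20 equal bins on [0, 1] (both chosen only for illustration; they are not the parameters of the plots below, and this is not Plotly's own code). Bar heights computed the way `histnorm='probability'` defines them never exceed 1, while the Beta(2, 5) pdf reaches about 2.46 at its mode.

```python
import math
import random


def beta_pdf(x, a, b):
    """Density of the Beta(a, b) distribution on (0, 1)."""
    norm = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return norm * x ** (a - 1) * (1 - x) ** (b - 1)


def probability_heights(data, nbins, lo=0.0, hi=1.0):
    """Bar heights as histnorm='probability' defines them: fraction of samples per bin.

    Assumes every value lies in [lo, hi].
    """
    width = (hi - lo) / nbins
    counts = [0] * nbins
    for x in data:
        i = min(int((x - lo) / width), nbins - 1)  # put x == hi into the last bin
        counts[i] += 1
    return [c / len(data) for c in counts]


random.seed(0)
sample = [random.betavariate(2, 5) for _ in range(10_000)]
heights = probability_heights(sample, nbins=20)

print(max(heights))         # each bar is a probability, so it stays below 1
print(beta_pdf(0.2, 2, 5))  # the mode of Beta(2, 5) is x = 0.2, where the pdf is about 2.46
```

Overlaying `beta_pdf` on these bars puts the curve far above them, which is the picture shown in the first figure below.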

In a histogram plotted with `histnorm='probability density'`, the height `h` of a bar satisfies `h * bin_size` = probability that a data point falls in that bin = bar area. The bar area approximates the area under the pdf graph above that bin.
Hence the probability density function (pdf), or its KDE estimate, should be compared with the Plotly Histogram corresponding to `histnorm='probability density'`.
Below I plot the pdf of a Beta distribution over both types of histogram to point out the drawback of the `make_hist` method.
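A minimal pure-Python sketch of the `'probability density'` normalization, again assuming an illustrative Beta(2, 5) sample and 20 bins on [0, 1]. Each height is `count / (n * bin_size)`, so the bar areas sum to 1 like the area under a pdf, and each height lands close to the pdf value at the bin midpoint (the midpoint stands in for the average of the pdf over the bin).

```python
import math
import random


def beta_pdf(x, a, b):
    """Density of the Beta(a, b) distribution on (0, 1)."""
    norm = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return norm * x ** (a - 1) * (1 - x) ** (b - 1)


def density_heights(data, nbins, lo=0.0, hi=1.0):
    """Bar heights as histnorm='probability density' defines them: count / (n * bin width).

    Assumes every value lies in [lo, hi]. Returns the heights and the bin width.
    """
    width = (hi - lo) / nbins
    counts = [0] * nbins
    for x in data:
        i = min(int((x - lo) / width), nbins - 1)  # put x == hi into the last bin
        counts[i] += 1
    return [c / (len(data) * width) for c in counts], width


random.seed(0)
sample = [random.betavariate(2, 5) for _ in range(100_000)]
heights, width = density_heights(sample, nbins=20)

# h * bin_size is the probability of each bin, so the total bar area is 1.
total_area = sum(h * width for h in heights)

# Largest distance between a bar height and the pdf at that bin's midpoint.
midpoints = [(i + 0.5) * width for i in range(len(heights))]
max_gap = max(abs(h - beta_pdf(m, 2, 5)) for h, m in zip(heights, midpoints))

print(total_area, max_gap)
```

Unlike the `'probability'` bars, these heights rise above 1 near the mode and track the pdf, which is what the second figure below shows.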
![beta_probability](https://cloud.githubusercontent.com/assets/3627253/15268520/7694b5a8-19ea-11e6-88cc-8c7f022f13f4.png)
![beta_probability_density](https://cloud.githubusercontent.com/assets/3627253/15268521/7b43e4c0-19ea-11e6-9911-4f6f4eac030f.png)

See also [https://plot.ly/~chelsea_lyn/11601/group-1-group-2-group-3-group-4-group-1-group-2-group-3-group-4-group-1-group-2-/](https://plot.ly/~chelsea_lyn/11601/group-1-group-2-group-3-group-4-group-1-group-2-group-3-group-4-group-1-group-2-/) for how far the last two pdfs are from the corresponding histograms.
The updated plot, with `histnorm='probability density'`, is here: [http://nbviewer.ipython.org/0f42b607de8f0d0c50b0ffb0ccfdff08](http://nbviewer.ipython.org/0f42b607de8f0d0c50b0ffb0ccfdff08).