Wrong setting of histnorm in the method make_hist, FigureFactory.create_distplot #459

@empet

FigureFactory.create_distplot is intended, among other things, to compare the histogram of a data set with the KDE estimate of its probability density function. Plotly's Histogram, however, can produce 5 types of histograms, each selected via the histnorm key.
The make_hist method has a drawback: it does not choose the right value for the histnorm key. Its default is histnorm='probability' (see https://github.com/plotly/plotly.py/blob/master/plotly/tools.py#L5084), which contradicts both the theoretical definition of this kind of histogram and that of the probability density function (pdf).

When histnorm='probability', the height of a bar in the histogram equals the probability that a data point falls within the corresponding bin. Comparing the KDE estimate of the pdf with such a histogram implicitly assumes that the pdf takes values only in [0,1], which is not true.
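
As a quick check that a pdf is not confined to [0,1], here is a minimal sketch using only the Python standard library. The parameters a=2, b=5 are assumed for illustration; the issue does not state which Beta distribution was plotted. For these values the density at the mode is about 2.46.

```python
import math

def beta_pdf(x, a, b):
    """Density of the Beta(a, b) distribution at x, for 0 < x < 1."""
    norm = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return norm * x ** (a - 1) * (1 - x) ** (b - 1)

# a=2, b=5 are illustrative values only, not taken from the issue.
# x = 0.2 is the mode of Beta(2, 5), i.e. (a - 1) / (a + b - 2).
peak = beta_pdf(0.2, 2, 5)
print(peak)  # well above 1, so no bar of a 'probability' histogram can reach it
```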

In a histogram plotted with histnorm='probability density', the height h of a bar satisfies h*bin_size = probability that a data point falls in that bin = bar area. The bar area approximates the area under the pdf graph over that bin.
Hence the probability density function (pdf), or its KDE estimate, should be compared to the Plotly Histogram with histnorm='probability density'.
To point out the drawback of the make_hist method, I plot below the pdf of a Beta distribution over the two types of histograms.
[Image: beta_probability]
[Image: beta_probability_density]
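
The difference between the two normalizations can also be checked numerically. The sketch below is a standalone illustration, not the code in plotly/tools.py: the helper bar_heights, the Beta(2, 5) sample, and the bin size 0.05 are all assumptions made for the example. It computes the bar heights both ways for equal-width bins.

```python
import random

def bar_heights(data, start, bin_size, n_bins):
    """Return (probability, density) bar heights for equal-width bins.

    Hypothetical helper for illustration; it is not part of Plotly.
    """
    counts = [0] * n_bins
    for x in data:
        k = int((x - start) / bin_size)
        k = min(max(k, 0), n_bins - 1)  # clamp edge values into the outer bins
        counts[k] += 1
    n = len(data)
    probability = [c / n for c in counts]              # heights sum to 1
    density = [p / bin_size for p in probability]      # areas sum to 1
    return probability, density

random.seed(0)
# Assumed sample: 10,000 draws from Beta(2, 5), binned on [0, 1].
sample = [random.betavariate(2, 5) for _ in range(10_000)]
bin_size = 0.05
prob, dens = bar_heights(sample, 0.0, bin_size, 20)

print(sum(prob))                        # total of 'probability' heights
print(sum(h * bin_size for h in dens))  # total area of 'probability density' bars
print(max(prob), max(dens))             # tallest bar under each normalization
```

Under the 'probability' normalization the tallest bar stays far below the peak of the pdf, while the 'probability density' bars rise to the same scale as the pdf, which is why only the latter is a fair visual comparison.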

See also https://plot.ly/~chelsea_lyn/11601/group-1-group-2-group-3-group-4-group-1-group-2-group-3-group-4-group-1-group-2-/ for how far the last two pdfs lie from their corresponding histograms.
The updated plot, with histnorm='probability density', is here: http://nbviewer.ipython.org/0f42b607de8f0d0c50b0ffb0ccfdff08
