
Referendum on Histogram format #1776

@jmacd

Description


What are you trying to achieve?

This issue aims to build consensus among OpenTelemetry maintainers and approvers on a choice of histogram format.

There has been a lengthy process to standardize an exponential histogram protocol for OpenTelemetry, culminating in OTEP 149, which arrived at a consensus on exponential histograms.

Contemporaneously, the OpenHistogram code base (formerly the Circonus Log-linear Histogram) was re-licensed, making it an appealing option for both OpenTelemetry and OpenMetrics, but it is not the same as an exponential histogram. The goal of this issue is to summarize these options so that a wider audience can participate in the selection process.

It is important that we choose only one of these options, for two reasons: (1) fidelity is lost whenever we translate between incompatible histogram types, and (2) a large amount of code in many downstream vendor and OSS systems will be dedicated to handling each supported type.

OTEP 149: Exponential histograms

OTEP 149, discussed and debated in open-telemetry/oteps#149, lists considerations around specifying an exponential histogram for OTLP. In an exponential histogram, bucket boundaries are located at integer powers of a specific base value. If the base value is 1.1, then there are boundaries at 1.1, 1.1^2, 1.1^3, 1.1^4, and so on.
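To make this concrete, here is a minimal sketch (a hypothetical helper, not code from the OTEP) of mapping a value to its exponential bucket index by taking a logarithm:

```go
package main

import (
	"fmt"
	"math"
)

// bucketIndex returns i such that base^i <= value < base^(i+1).
// A hypothetical helper for illustration only; value must be > 0.
// Floating-point rounding can be off by one for values that land
// exactly on a boundary.
func bucketIndex(value, base float64) int {
	return int(math.Floor(math.Log(value) / math.Log(base)))
}

func main() {
	// With base 1.1, the value 1.25 lies between 1.1^2 (1.21)
	// and 1.1^3 (1.331), so its bucket index is 2.
	fmt.Println(bucketIndex(1.25, 1.1)) // 2
}
```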

Merging exponential histograms

While this form of histogram is almost trivial in concept, there was an early concern about loss of fidelity when merging arbitrary histograms, due to residual "artifacts". This consideration leads in two directions, both of which have been explored.

The first avenue is to fix the histogram parameters completely: when there are no free parameters, histograms can be merged without loss of fidelity. The second avenue is to choose a parameterization scheme with "perfect subsetting", in which the buckets of a high-resolution histogram map exactly into the buckets of a lower-resolution histogram.

Perfect subsetting for exponential histograms

Following UDDSketch and unpublished work at Google, OTEP 149 lands on the idea of a powers-of-2 exponential histogram, one that ensures perfect subsetting.

| scale | boundaries, e.g. | number of buckets spanning 1..100 |
|------:|:-----------------|----------------------------------:|
| 0 | 1, 2, 4, 8 | 7 |
| 1 | 1, 1.4, 2, 2.8, 4 | 14 |
| 2 | 1, 1.2, 1.4, 1.7, 2 | 27 |
| 3 | 1, 1.09, 1.19, ..., 1.7, 1.83, 2 | 54 |
| 4 | 1, 1.04, 1.09, ..., 1.83, 1.92, 2 | 107 |
| 5 | 1, 1.02, 1.04, ..., 1.92, 1.96, 2 | 213 |
| 6 | 1, 1.01, 1.02, ..., 1.92, 1.96, 2 | 426 |
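To make the perfect-subsetting property concrete, here is a minimal sketch (my reading of the powers-of-2 scheme, not code from the OTEP): with boundaries at 2^(i / 2^scale), each one-step reduction in scale merges adjacent bucket pairs, so a bucket index can be downscaled with a single shift.

```go
package sketch

// downscaleIndex maps a bucket index at a higher scale onto the
// enclosing bucket at a lower scale. With boundaries at 2^(i / 2^scale),
// each one-step scale reduction merges adjacent bucket pairs, so the
// mapping is a shift; Go's arithmetic shift handles negative indexes
// (values below 1) correctly.
func downscaleIndex(index int64, fromScale, toScale int) int64 {
	return index >> uint(fromScale-toScale)
}
```

For example, bucket 5 at scale 2 spans (2^(5/4), 2^(6/4)], which falls entirely inside bucket 5>>2 = 1 at scale 0, i.e. (2, 4].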

OpenHistogram: Log-linear histograms

OpenHistogram uses a base-10 exponential scheme with 90 buckets linearly dividing each "decade": there are 90 buckets between 0.1 and 1, 90 buckets between 1 and 10, 90 buckets between 10 and 100, and so on. This maps well to human intuition about logarithmic scale, and it is also tailored for environments without floating-point processors.
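For illustration, a sketch of the log-linear layout (my own paraphrase, not the OpenHistogram API): a positive value is assigned a decade by its base-10 exponent, then one of 90 linear buckets within that decade.

```go
package main

import (
	"fmt"
	"math"
)

// logLinearBucket returns the decade exponent and the linear bucket
// (0..89) within that decade for a positive value. A log-based sketch
// of the idea, not OpenHistogram's actual representation.
func logLinearBucket(value float64) (decade, bucket int) {
	decade = int(math.Floor(math.Log10(value)))
	mantissa := value / math.Pow(10, float64(decade)) // in [1, 10)
	bucket = int((mantissa - 1.0) / 0.1)              // 90 buckets of width 0.1
	if bucket > 89 {                                  // guard against rounding at the top edge
		bucket = 89
	}
	return decade, bucket
}

func main() {
	// 42 lies in the decade [10, 100); its mantissa 4.2 falls in
	// bucket floor((4.2-1)/0.1) = 32, i.e. the range [4.2, 4.3) x 10^1.
	fmt.Println(logLinearBucket(42)) // 1 32
}
```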

Merging OpenHistogram histograms

OpenHistogram has a fixed parameterization and thus avoids loss of fidelity when merging.

Perfect subsetting for OpenHistogram-like histograms

The goal of perfect subsetting is to select parameters that support merging high- and low-resolution histograms. OpenHistogram makes a strong case for 90 buckets per decade, which is a relatively high resolution; any resolution that is a factor of 9 yields integer boundaries >= 1 in a base-10 scheme.

Resolution factors that are compatible with OpenHistogram and have perfect subsetting:

| resolution | boundaries, e.g. | number of buckets spanning 1..100 |
|-----------:|:-----------------|----------------------------------:|
| 1 | 1, 10 | 2 |
| 3 | 1, 4, 7, 10 | 6 |
| 9 | 1, 2, 3, ..., 8, 9, 10 | 18 |
| 18 | 1, 1.5, 2, ..., 9, 9.5, 10 | 36 |
| 90 | 1, 1.1, 1.2, ..., 9.8, 9.9, 10 | 180 |
| 180 | 1, 1.05, 1.1, ..., 9.9, 9.95, 10 | 360 |

Note that while OpenHistogram fixes the resolution at 90, OpenTelemetry could specify a protocol that supports other resolutions and still adopt OpenHistogram libraries for its SDKs, allowing metrics processing pipelines to automatically lower the resolution of collected metrics, as sketched below.
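A sketch of that resolution-lowering step (a hypothetical helper, not an OpenHistogram function): because each listed resolution at or below 90 divides evenly into 90, every resolution-90 bucket lands inside exactly one coarser bucket.

```go
package sketch

// lowerResolution maps a bucket index within one decade (0..89 at
// resolution 90) onto the enclosing bucket at a coarser resolution
// that divides 90, such as 1, 3, 9, or 18. A hypothetical helper for
// illustration; OpenHistogram itself fixes the resolution at 90.
func lowerResolution(bucket90, resolution int) int {
	return bucket90 * resolution / 90 // integer division floors the result
}
```

For example, resolution-90 bucket 31 (the range [4.1, 4.2) within a decade) maps to bucket 31*18/90 = 6 at resolution 18, which spans [4.0, 4.5).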

Protocol considerations

Regardless of which of the options above is chosen, several choices remain.

Sparse vs. Dense encoding

A dense encoding is optimized for the case where non-empty buckets will be clustered together, making it efficient to encode a single offset and one (if exponential) or two (if log-linear) arrays of counts.

A sparse encoding is optimized for the case where non-empty buckets are not clustered together, making it efficient to encode every bucket index separately.

Sparse histograms can be converted into denser, lower-resolution histograms by perfect subsetting; bucket indexes are sometimes compressed using delta-encoding techniques.
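A rough sketch of the two encodings (illustrative types of my own, not the OTLP schema):

```go
package sketch

// DenseBuckets is efficient when occupied buckets cluster: one offset
// plus a contiguous array of counts, including zeros for any gaps.
type DenseBuckets struct {
	Offset int32    // index of the first bucket in Counts
	Counts []uint64 // Counts[i] is the count for bucket Offset+i
}

// SparseBuckets is efficient when occupied buckets are scattered: each
// occupied bucket is listed individually. Storing index deltas rather
// than absolute indexes keeps the entries small when varint-encoded.
type SparseBuckets struct {
	IndexDeltas []int32  // first entry absolute, the rest deltas
	Counts      []uint64 // parallel to IndexDeltas
}
```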

Zero bucket handling

The zero value must be handled specially in an exponential histogram. There is also a question of how to recognize values that are close to zero, for example whether they "fall into" the zero bucket.
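One possible treatment, as a sketch (the threshold parameter is my assumption, not something the OTEP fixes): count any measurement whose magnitude falls at or below a configured threshold in the zero bucket.

```go
package sketch

import "math"

// histogram sketches zero-bucket handling; zeroThreshold is a
// hypothetical parameter, not a field defined by OTEP 149.
type histogram struct {
	zeroThreshold float64
	zeroCount     uint64
}

// recordValue counts near-zero measurements in the zero bucket rather
// than assigning them an arbitrarily negative exponential bucket index.
func (h *histogram) recordValue(v float64) {
	if math.Abs(v) <= h.zeroThreshold {
		h.zeroCount++
		return
	}
	// ...otherwise compute a bucket index as described elsewhere.
}
```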

Converting from other histogram formats

Both the exponential and log-linear histogram families are expected to improve the resolution per byte of encoded data, relative to the explicit-boundary histogram currently included in OTLP. Metrics processors that operate on this data will require helper methods for translating other histogram formats into whichever format we choose.

To translate from another histogram format, we often use interpolation. Rules for interpolating between histogram buckets should specify how to handle buckets that span zero and buckets with boundaries at infinity, since both are valid configurations. To interpolate from arbitrary-boundary buckets, we have to calculate bucket indexes for each boundary in the input.
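As a sketch of that interpolation step (a simplified helper of my own, assuming counts are uniformly distributed within each input bucket): an input bucket's count is split across the output buckets it overlaps, in proportion to the width of each overlap.

```go
package sketch

// splitCount distributes an input bucket's count across output buckets
// with the given boundaries, assuming a uniform distribution within the
// input bucket [lo, hi). Output bucket i spans (outBounds[i-1],
// outBounds[i]], with the first and last buckets open toward +/- infinity.
// A simplified sketch: a full implementation must also define behavior
// for input buckets that span zero or have infinite boundaries.
func splitCount(count uint64, lo, hi float64, outBounds []float64) []float64 {
	out := make([]float64, len(outBounds)+1)
	width := hi - lo
	for i := range out {
		l, r := lo, hi // clip [lo, hi) to output bucket i
		if i > 0 && outBounds[i-1] > l {
			l = outBounds[i-1]
		}
		if i < len(outBounds) && outBounds[i] < r {
			r = outBounds[i]
		}
		if r > l {
			out[i] = float64(count) * (r - l) / width
		}
	}
	return out
}
```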

Calculating bucket indexes: Exponential case

Calculating the bucket index for an arbitrary exponential base generally means taking a logarithm of the value.

For the special case of base-2 exponential histograms, the IEEE 754 floating-point representation already stores the data in the correct format, and the index can be extracted directly using bitwise operations.
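For example, at scale 0 the bucket index is exactly the IEEE 754 exponent field minus its bias. A sketch for normal, positive values (subnormals, and the convention for values landing exactly on a boundary, need extra handling):

```go
package main

import (
	"fmt"
	"math"
)

// base2Index extracts floor(log2(value)) from the IEEE 754 bit layout,
// where the 11-bit biased exponent sits above the 52-bit significand.
// At higher scales, the leading significand bits would contribute the
// remaining index bits.
func base2Index(value float64) int {
	bits := math.Float64bits(value)
	return int((bits>>52)&0x7FF) - 1023
}

func main() {
	fmt.Println(base2Index(6.0)) // 2, since 2^2 <= 6 < 2^3
}
```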

To calculate bucket indexes without floating-point hardware, a recurrence relation can be used.

Calculating bucket indexes: Log-linear case

OpenHistogram defines a recurrence relation for calculating the bucket index.
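As a rough paraphrase of that style of computation (not OpenHistogram's actual code): repeatedly scale the value into the decade [1, 10) while counting the exponent, then take the linear bucket within the decade.

```go
package main

import "fmt"

// normalize scales a positive value into [1, 10) by repeated
// multiplication or division by 10, counting decades, then places it
// into one of 90 linear buckets. A paraphrase of the log-linear
// recurrence idea, not OpenHistogram's actual implementation.
func normalize(v float64) (decade, bucket int) {
	for v >= 10 {
		v /= 10
		decade++
	}
	for v < 1 {
		v *= 10
		decade--
	}
	return decade, int((v - 1.0) / 0.1)
}

func main() {
	fmt.Println(normalize(0.042)) // -2 32: the range [4.2, 4.3) x 10^-2
}
```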

Summary

Thank you to the experts who have guided us to this point: @yzhuge, @postwait, @oertl, @githomin, @CharlesMasson, @HeinrichHartmann, and @jdmontana.

This is now a question for the community. There are two major options presented here, a base-2 exponential histogram and a base-10 log-linear histogram. Both have technical merits.

There are also non-technical merits to consider. OpenHistogram is readily available and has already been adopted in a number of OSS systems, such as Envoy. An out-of-the-box Prometheus client uses 12 buckets with default boundaries that map exactly onto OpenHistogram boundaries, which cannot be said of the binary-exponential approach.

My personal opinion (@jmacd): I am in favor of adopting a protocol that supports OpenHistogram's histogram, as long as the protocol also supports the lower resolution factors listed above (3, 9, and 18), which appears to be a trivial extension to the OpenHistogram model. I am assuming this approach will be legally acceptable as far as the OpenHistogram project is concerned.
