
Referendum on Histogram format #1776

@jmacd

Description


What are you trying to achieve?

This issue aims to build consensus among OpenTelemetry maintainers and approvers on a choice of histogram format.

There has been a lengthy process to standardize an exponential histogram protocol for OpenTelemetry, culminating in OTEP 149, which arrived at a consensus on exponential histograms.

Contemporaneously, the OpenHistogram code base (formerly the Circonus Log-linear Histogram) was re-licensed, making it an appealing option for both OpenTelemetry and OpenMetrics, but it is not the same as an exponential histogram. The goal of this issue is to summarize these options so that a wider audience can participate in the selection process.

It is important that we choose only one of these options, for two reasons: (1) fidelity is lost whenever we translate between incompatible histogram types, and (2) a large amount of code in many downstream vendor and OSS systems will be dedicated to handling each supported type.

OTEP 149: Exponential histograms

OTEP 149, discussed and debated in open-telemetry/oteps#149, lists considerations around specifying an exponential histogram for OTLP. In an exponential histogram, bucket boundaries are located at integer powers of a specific base value. If the base value is 1.1, then there are boundaries at 1.1, 1.1^2, 1.1^3, 1.1^4, and so on.
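To make this concrete, here is a minimal sketch (a hypothetical helper, not code from the OTEP) of mapping a value to its exponential bucket index by taking a logarithm:

```go
package main

import (
	"fmt"
	"math"
)

// bucketIndex returns i such that base^i <= value < base^(i+1).
// A hypothetical helper for illustration only; value must be > 0.
// Floating-point rounding can be off by one for values that land
// exactly on a boundary.
func bucketIndex(value, base float64) int {
	return int(math.Floor(math.Log(value) / math.Log(base)))
}

func main() {
	// With base 1.1, the value 1.25 lies between 1.1^2 (1.21)
	// and 1.1^3 (1.331), so its bucket index is 2.
	fmt.Println(bucketIndex(1.25, 1.1)) // 2
}
```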

Merging exponential histograms

While this form of histogram is almost trivial in concept, there was an early concern about loss of fidelity when merging arbitrary histograms, due to residual "artifacts". This consideration leads in two directions, both of which have been explored.

The first avenue is to fix the histogram parameters completely: when there are no free parameters, histograms can be merged without loss of fidelity. The second avenue is to choose a parameterization scheme with "perfect subsetting", in which the buckets of a high-resolution histogram map exactly into the buckets of a lower-resolution histogram.

Perfect subsetting for exponential histograms

Following UDDSketch and unpublished work at Google, OTEP 149 lands on the idea of a powers-of-2 exponential histogram, one that ensures perfect subsetting.

| scale | boundaries, e.g. | number of buckets spanning 1..100 |
|------:|:-----------------|----------------------------------:|
| 0 | 1, 2, 4, 8 | 7 |
| 1 | 1, 1.4, 2, 2.8, 4 | 14 |
| 2 | 1, 1.2, 1.4, 1.7, 2 | 27 |
| 3 | 1, 1.09, 1.19, ..., 1.7, 1.83, 2 | 54 |
| 4 | 1, 1.04, 1.09, ..., 1.83, 1.92, 2 | 107 |
| 5 | 1, 1.02, 1.04, ..., 1.92, 1.96, 2 | 213 |
| 6 | 1, 1.01, 1.02, ..., 1.92, 1.96, 2 | 426 |
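To make the perfect-subsetting property concrete, here is a minimal sketch (my reading of the powers-of-2 scheme, not code from the OTEP): with boundaries at 2^(i / 2^scale), each one-step reduction in scale merges adjacent bucket pairs, so a bucket index can be downscaled with a single shift.

```go
package sketch

// downscaleIndex maps a bucket index at a higher scale onto the
// enclosing bucket at a lower scale. With boundaries at 2^(i / 2^scale),
// each one-step scale reduction merges adjacent bucket pairs, so the
// mapping is a shift; Go's arithmetic shift handles negative indexes
// (values below 1) correctly.
func downscaleIndex(index int64, fromScale, toScale int) int64 {
	return index >> uint(fromScale-toScale)
}
```

For example, bucket 5 at scale 2 spans (2^(5/4), 2^(6/4)], which falls entirely inside bucket 5>>2 = 1 at scale 0, i.e. (2, 4].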

OpenHistogram: Log-linear histograms

OpenHistogram uses a base-10 exponential scheme with 90 buckets linearly dividing each "decade": there are 90 buckets between 0.1 and 1, 90 buckets between 1 and 10, 90 buckets between 10 and 100, and so on. This maps well to human intuition about logarithmic scale, and it is also tailored for environments without floating-point processors.
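For illustration, a sketch of the log-linear layout (my own paraphrase, not the OpenHistogram API): a positive value is assigned a decade by its base-10 exponent, then one of 90 linear buckets within that decade.

```go
package main

import (
	"fmt"
	"math"
)

// logLinearBucket returns the decade exponent and the linear bucket
// (0..89) within that decade for a positive value. A log-based sketch
// of the idea, not OpenHistogram's actual representation.
func logLinearBucket(value float64) (decade, bucket int) {
	decade = int(math.Floor(math.Log10(value)))
	mantissa := value / math.Pow(10, float64(decade)) // in [1, 10)
	bucket = int((mantissa - 1.0) / 0.1)              // 90 buckets of width 0.1
	if bucket > 89 {                                  // guard against rounding at the top edge
		bucket = 89
	}
	return decade, bucket
}

func main() {
	// 42 lies in the decade [10, 100); its mantissa 4.2 falls in
	// bucket floor((4.2-1)/0.1) = 32, i.e. the range [4.2, 4.3) x 10^1.
	fmt.Println(logLinearBucket(42)) // 1 32
}
```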

Merging OpenHistogram histograms

OpenHistogram has a fixed parameterization and thus avoids loss of fidelity when merging.

Perfect subsetting for OpenHistogram-like histograms

The goal of perfect subsetting is to select parameters that support merging high- and low-resolution histograms. OpenHistogram makes a strong case for 90 buckets per decade, which is a relatively high resolution; any resolution that is a factor of 9 yields integer boundaries >= 1 in a base-10 scheme.

Resolution factors that are compatible with OpenHistogram and have perfect subsetting:

| resolution | boundaries, e.g. | number of buckets spanning 1..100 |
|-----------:|:-----------------|----------------------------------:|
| 1 | 1, 10 | 2 |
| 3 | 1, 4, 7, 10 | 6 |
| 9 | 1, 2, 3, ..., 8, 9, 10 | 18 |
| 18 | 1, 1.5, 2, ..., 9, 9.5, 10 | 36 |
| 90 | 1, 1.1, 1.2, ..., 9.8, 9.9, 10 | 180 |
| 180 | 1, 1.05, 1.1, ..., 9.9, 9.95, 10 | 360 |

Note that while OpenHistogram fixes the resolution at 90, OpenTelemetry could specify a protocol that supports other resolutions and still adopt OpenHistogram libraries for its SDKs, allowing metrics processing pipelines to automatically lower the resolution of collected metrics, as sketched below.
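A sketch of that resolution-lowering step (a hypothetical helper, not an OpenHistogram function): because each listed resolution at or below 90 divides evenly into 90, every resolution-90 bucket lands inside exactly one coarser bucket.

```go
package sketch

// lowerResolution maps a bucket index within one decade (0..89 at
// resolution 90) onto the enclosing bucket at a coarser resolution
// that divides 90, such as 1, 3, 9, or 18. A hypothetical helper for
// illustration; OpenHistogram itself fixes the resolution at 90.
func lowerResolution(bucket90, resolution int) int {
	return bucket90 * resolution / 90 // integer division floors the result
}
```

For example, resolution-90 bucket 31 (the range [4.1, 4.2) within a decade) maps to bucket 31*18/90 = 6 at resolution 18, which spans [4.0, 4.5).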

Protocol considerations

Regardless of which of the options above is chosen, several choices remain.

Sparse vs. Dense encoding

A dense encoding is optimized for the case where non-empty buckets will be clustered together, making it efficient to encode a single offset and one (if exponential) or two (if log-linear) arrays of counts.

A sparse encoding is optimized for the case where non-empty buckets are not clustered together, making it efficient to encode every bucket index separately.

Sparse histograms can be converted into denser, lower-resolution histograms by perfect subsetting; bucket indexes are sometimes compressed using delta-encoding techniques.
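A rough sketch of the two encodings (illustrative types of my own, not the OTLP schema):

```go
package sketch

// DenseBuckets is efficient when occupied buckets cluster: one offset
// plus a contiguous array of counts, including zeros for any gaps.
type DenseBuckets struct {
	Offset int32    // index of the first bucket in Counts
	Counts []uint64 // Counts[i] is the count for bucket Offset+i
}

// SparseBuckets is efficient when occupied buckets are scattered: each
// occupied bucket is listed individually. Storing index deltas rather
// than absolute indexes keeps the entries small when varint-encoded.
type SparseBuckets struct {
	IndexDeltas []int32  // first entry absolute, the rest deltas
	Counts      []uint64 // parallel to IndexDeltas
}
```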

Zero bucket handling

The zero value must be handled specially in an exponential histogram. There is also a question of how to recognize values that are close to zero, for example whether they "fall into" the zero bucket.
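One possible treatment, as a sketch (the threshold parameter is my assumption, not something the OTEP fixes): count any measurement whose magnitude falls at or below a configured threshold in the zero bucket.

```go
package sketch

import "math"

// histogram sketches zero-bucket handling; zeroThreshold is a
// hypothetical parameter, not a field defined by OTEP 149.
type histogram struct {
	zeroThreshold float64
	zeroCount     uint64
}

// recordValue counts near-zero measurements in the zero bucket rather
// than assigning them an arbitrarily negative exponential bucket index.
func (h *histogram) recordValue(v float64) {
	if math.Abs(v) <= h.zeroThreshold {
		h.zeroCount++
		return
	}
	// ...otherwise compute a bucket index as described elsewhere.
}
```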

Converting from other histogram formats

Both the exponential and log-linear histogram families are expected to improve the resolution per byte of encoded data, relative to the explicit-boundary histogram currently included in OTLP. Metrics processors that operate on this data will require helper methods for translating other histogram formats into whichever format we choose.

To translate from another histogram format, we often use interpolation. Rules for interpolating between histogram buckets should specify how to handle buckets that span zero and buckets with boundaries at infinity, since both are valid configurations. To interpolate from arbitrary-boundary buckets, we have to calculate bucket indexes for each boundary in the input.
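As a sketch of that interpolation step (a simplified helper of my own, assuming counts are uniformly distributed within each input bucket): an input bucket's count is split across the output buckets it overlaps, in proportion to the width of each overlap.

```go
package sketch

// splitCount distributes an input bucket's count across output buckets
// with the given boundaries, assuming a uniform distribution within the
// input bucket [lo, hi). Output bucket i spans (outBounds[i-1],
// outBounds[i]], with the first and last buckets open toward +/- infinity.
// A simplified sketch: a full implementation must also define behavior
// for input buckets that span zero or have infinite boundaries.
func splitCount(count uint64, lo, hi float64, outBounds []float64) []float64 {
	out := make([]float64, len(outBounds)+1)
	width := hi - lo
	for i := range out {
		l, r := lo, hi // clip [lo, hi) to output bucket i
		if i > 0 && outBounds[i-1] > l {
			l = outBounds[i-1]
		}
		if i < len(outBounds) && outBounds[i] < r {
			r = outBounds[i]
		}
		if r > l {
			out[i] = float64(count) * (r - l) / width
		}
	}
	return out
}
```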

Calculating bucket indexes: Exponential case

Calculating the bucket index for an arbitrary exponential base generally means taking a logarithm of the value.

For the special case of base-2 exponential histograms, the IEEE 754 floating-point representation already stores the data in the correct format, and the index can be extracted directly using bitwise operations.
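For example, at scale 0 the bucket index is exactly the IEEE 754 exponent field minus its bias. A sketch for normal, positive values (subnormals, and the convention for values landing exactly on a boundary, need extra handling):

```go
package main

import (
	"fmt"
	"math"
)

// base2Index extracts floor(log2(value)) from the IEEE 754 bit layout,
// where the 11-bit biased exponent sits above the 52-bit significand.
// At higher scales, the leading significand bits would contribute the
// remaining index bits.
func base2Index(value float64) int {
	bits := math.Float64bits(value)
	return int((bits>>52)&0x7FF) - 1023
}

func main() {
	fmt.Println(base2Index(6.0)) // 2, since 2^2 <= 6 < 2^3
}
```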

To calculate bucket indexes without floating-point hardware, a recurrence relation can be used.

Calculating bucket indexes: Log-linear case

OpenHistogram defines a recurrence relation for calculating the bucket index.
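As a rough paraphrase of that style of computation (not OpenHistogram's actual code): repeatedly scale the value into the decade [1, 10) while counting the exponent, then take the linear bucket within the decade.

```go
package main

import "fmt"

// normalize scales a positive value into [1, 10) by repeated
// multiplication or division by 10, counting decades, then places it
// into one of 90 linear buckets. A paraphrase of the log-linear
// recurrence idea, not OpenHistogram's actual implementation.
func normalize(v float64) (decade, bucket int) {
	for v >= 10 {
		v /= 10
		decade++
	}
	for v < 1 {
		v *= 10
		decade--
	}
	return decade, int((v - 1.0) / 0.1)
}

func main() {
	fmt.Println(normalize(0.042)) // -2 32: the range [4.2, 4.3) x 10^-2
}
```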

Summary

Thank you to the experts who have guided us to this point: @yzhuge, @postwait, @oertl, @githomin, @CharlesMasson, @HeinrichHartmann, and @jdmontana.

This is now a question for the community. There are two major options presented here, a base-2 exponential histogram and a base-10 log-linear histogram. Both have technical merits.

There are also non-technical merits to consider. OpenHistogram is readily available and has already been adopted in a number of OSS systems, such as Envoy. An out-of-the-box Prometheus client uses 12 buckets with default boundaries that map exactly onto OpenHistogram boundaries, which cannot be said of the binary-exponential approach.

My personal opinion (@jmacd): I am in favor of adopting a protocol that supports OpenHistogram's histogram, as long as the protocol also supports the lower resolution factors listed above (3, 9, and 18), which appears to be a trivial extension to the OpenHistogram model. I am assuming this approach will be legally acceptable as far as the OpenHistogram project is concerned.
