T digest field mapper #137546
Conversation
```java
/**
 * Verify that, at a range of compression values, the size of the produced digest is not much larger than 10 times the compression
 */
public void testCompression() {
```
This is tangential to the main point of this PR, but part of the exploratory work and seems like a reasonable test in general.
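For reference, a minimal sketch of the shape such a test could take, assuming the usual ESTestCase/Hamcrest setup; `encodedDigest` and `randomSamples` are hypothetical helpers, not the PR's actual utilities, and the 10x bound comes from the javadoc above:

```java
public void testCompression() {
    for (int compression : new int[] { 50, 100, 200, 1000 }) {
        // Hypothetical helpers: build a digest over many random samples and
        // serialize it the same way the mapper stores it.
        byte[] encoded = encodedDigest(randomSamples(100_000), compression);
        // A t-digest retains on the order of `compression` centroids, so the
        // encoded size should stay within a small multiple of compression.
        assertThat((long) encoded.length, lessThanOrEqualTo(10L * compression));
    }
}
```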
```diff
  // Supported tdigest types.
- protected enum Type {
+ public enum Type {
```
Expanding the visibility here so we can use it as a parameter to the field.
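Presumably something along these lines, assuming a generic enum helper like `Parameter.enumParam`; the parameter name, field name, and default constant below are illustrative guesses, not the PR's code:

```java
// Illustrative only: wiring the now-public enum up as a mapping parameter.
private final Parameter<TDigestFieldMapper.Type> implementation = Parameter.enumParam(
    "implementation",                 // hypothetical parameter name
    false,                            // not updateable after index creation
    m -> toType(m).implementation,
    TDigestFieldMapper.Type.MERGING,  // assumed default variant
    TDigestFieldMapper.Type.class
);
```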
```java
public static class Builder extends FieldMapper.Builder {
    private static final int DEFAULT_COMPRESSION = 100;
    private static final int MAXIMUM_COMPRESSION = 10000;
```
10000 is probably way too high, but it gives us good headroom. 1000 is already quite a lot of accuracy.
Pinging @elastic/es-storage-engine (Team:StorageEngine) |
```java
);
this.compression = Parameter.intParam("compression", false, m -> toType(m).compression, DEFAULT_COMPRESSION).addValidator(c -> {
    if (c <= 0 || c > MAXIMUM_COMPRESSION) {
        throw new IllegalArgumentException("compression must be a positive integer between 1 and " + MAXIMUM_COMPRESSION);
```
Nit: maybe list the passed value as well.
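For instance (illustrative wording only):

```java
// One way to address the nit: include the rejected value in the message.
throw new IllegalArgumentException(
    "compression must be a positive integer between 1 and " + MAXIMUM_COMPRESSION + ", got [" + c + "]"
);
```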
```java
b.startObject();
// ...
value.reset(binaryValue);
b.startArray("centroids");
```
centroids is definitely more accurate. Still, I wonder if we should use values here, for consistency with the existing histogram.
I prefer the accuracy in naming, personally. It's not like we expect users to be looking at the raw data most of the time, so this should be pretty hidden either way.
```java
b.endArray();
// ...
value.reset(binaryValue);
b.startArray("counts");
```
Nit: move centroids and counts to static constants that are also shared with the parser class.
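Roughly like the following; the holder class name here is made up for illustration:

```java
// Hypothetical shared home for the field names, used by both the mapper's
// XContent output and TDigestParser.
public final class TDigestFields {
    public static final ParseField CENTROIDS_FIELD = new ParseField("centroids");
    public static final ParseField COUNTS_FIELD = new ParseField("counts");

    private TDigestFields() {}
}

// In the mapper, instead of the string literals:
b.startArray(TDigestFields.CENTROIDS_FIELD.getPreferredName());
```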
```java
private static final ParseField VALUES_FIELD = new ParseField("centroids");
```
```java
/**
 * A parsed histogram field, can represent either a T-Digest or a HDR histogram.
```
No HDR here..
```java
/**
 * Parses an XContent object into a histogram.
 * The parse is expected to point at the next token after {@link XContentParser.Token#START_OBJECT}.
```
Suggested change:

```diff
- * The parse is expected to point at the next token after {@link XContentParser.Token#START_OBJECT}.
+ * The parser is expected to point at the next token after {@link XContentParser.Token#START_OBJECT}.
```
Looks like this typo was copy pasted from me, could you also fix it in ExponentialHistogramParser and HistogramParser while at it?
```java
 * @param values the centroids, guaranteed to be distinct and in increasing order
 * @param counts the counts, guaranteed to be non-negative and of the same length as values
 */
public record ParsedHistogram(List<Double> values, List<Long> counts) {}
```
Let's be consistent with values vs centroids.
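For example (a sketch, assuming the record keeps its current name):

```java
/**
 * @param centroids the centroids, guaranteed to be distinct and in increasing order
 * @param counts the counts, guaranteed to be non-negative and of the same length as centroids
 */
public record ParsedHistogram(List<Double> centroids, List<Long> counts) {}
```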
...nalytics/src/test/java/org/elasticsearch/xpack/analytics/mapper/TDigestFieldMapperTests.java
```java
@Override
protected Object getSampleValueForDocument() {
    // TODO - In hybrid mode, this will not even build a t-digest. Let's test with bigger data
    return Map.of("centroids", new double[] { 2, 3 }, "counts", new int[] { 0, 4 });
```
Add randomization?
SyntheticSourceExample below has logic for generating random histograms; it can be moved to a util and reused.
I don't want to just use random histogram. Most of the tests should take actual, valid t-digests, and we can do specific tests for things that don't fit elsewhere. At any rate, I'd been planning to do this in a follow up PR (thus the todo), but I can do it now if you want.
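A rough sketch of a sample-value helper that satisfies the invariants documented on ParsedHistogram (strictly increasing centroids, non-negative counts); the helper name is hypothetical and the random utilities are the standard ESTestCase ones:

```java
// Hypothetical helper producing a sample document that satisfies the parser's
// invariants: distinct, strictly increasing centroids with non-negative counts.
protected Object randomValidTDigestValue() {
    int size = randomIntBetween(1, 100);
    List<Double> centroids = new ArrayList<>(size);
    List<Long> counts = new ArrayList<>(size);
    double centroid = randomDoubleBetween(-1_000_000, 1_000_000, true);
    for (int i = 0; i < size; i++) {
        centroids.add(centroid);
        counts.add((long) randomIntBetween(0, 10_000));
        centroid += randomDoubleBetween(0.01, 100, true); // keep strictly increasing
    }
    return Map.of("centroids", centroids, "counts", counts);
}
```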
kkrik-es left a comment:
Nice, just a few nits.
```java
public class TDigestParser {

    private static final ParseField COUNTS_FIELD = new ParseField("counts");
    private static final ParseField VALUES_FIELD = new ParseField("centroids");
```
If you are going for a different member-name on the former values field, you should probably also rename the constant:
Suggested change:

```diff
- private static final ParseField VALUES_FIELD = new ParseField("centroids");
+ private static final ParseField CENTROIDS_FIELD = new ParseField("centroids");
```
Part of #137649
This PR adds a T-Digest specific field type to Elasticsearch. This allows for storing pre-computed percentile sketches using the t-digest algorithm, for later aggregation.
For this initial PR, it is mostly just a copy of the existing Histogram field mapper. I've added field level parameters for the t-digest implementation algorithm and compression parameters, with sensible defaults. This first iteration doesn't do any new input validation, so if a user sends a huge list of values, we'll just store it right now. In the future, we'd like to put some safeguards around that.
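For illustration, a mapping using the new field type might look like the following; only the `compression` parameter is confirmed by the diffs above, while the type name and the index name are assumptions:

```
PUT my-index
{
  "mappings": {
    "properties": {
      "latency_percentiles": {
        "type": "tdigest",
        "compression": 200
      }
    }
  }
}
```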
I'm labeling this `non-issue`, as it's still behind a feature flag and thus we don't want it in the release notes. That said, it will eventually be an `enhancement`, so there's no plan to backport it.