feat(sampler-composite): add experimental implementation of composite sampling spec #5839
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5839      +/-   ##
==========================================
- Coverage   95.07%   95.02%   -0.05%     
==========================================
  Files         308      315       +7     
  Lines        8037     8218     +181     
  Branches     1626     1665      +39     
==========================================
+ Hits         7641     7809     +168     
- Misses        396      409      +13     
           Note to self: Not to be confused with the other, earlier, but also still   | 
    
Super early feedback. I've only read OTEPs/specs and then briefly started looking at this PR, so I've raised annoying "name" questions to start.
[This is a general question. I'm commenting here for lack of a good place to mention it.]
My head is spinning from the inconsistent (har har) use of Composite, Composable, and Consistent in class and interface names across the spec, OTEP, and implementations. The same goes, to a lesser degree, for some class name changes, e.g. ConsistentFixedThreshold in the OTEP vs. ComposableTraceIDRatioBased in the spec. I haven't read through this PR yet, but I'm guessing other API diffs between OTEP and spec will show up, e.g. threshold_reliable in the spec vs. IsAdjustedCountReliable in the OTEP.
While I don't know that the names used in the spec need to be the same names as in the implementation APIs, approaching the same names is likely desired.  Are we implementing the spec here, or the Java consistent56 implementation, which seems to follow the OTEP naming (which makes sense as it was probably developed before the spec renamings)?
I'm happy to go asking about this in the OTel channels.
Thanks - I will read the spec and align with it as much as possible, sorry for missing it
I'm curious what Java (and the Rust/Go implementations that are linked to from open-telemetry/opentelemetry-specification#4466) will do. I.e. will they change their implementations to use the class names in the spec? Or change the spec to use Consistent-prefix?
Looking at the rust and go implementations (esp. the go implementation README: https://github.com/jmacd/go-sampler) I'm pretty certain CompositeSampler and Composable*Sampler naming is the intent.
I asked for clarification in #otel-sampling: https://cloud-native.slack.com/archives/C027DS6GZD3/p1755299016130699
Issue/answer on the naming: open-telemetry/opentelemetry-specification#4640
Basically: "So the short answer is that the Specification names should be used.", but "do not strongly believe that all SDKs need the same exact names".
So, whatever names are used by a particular SDK are fine.
(Note that I would expect the OTel schema for its configuration data model would use the spec names for these samplers, when these new samplers are added to the schema. At that point there is some value to users and maintainers if the configuration names match the implementation names.)
  ConsistentAlwaysOnSampler,
  ConsistentFixedThresholdSampler,
  ConsistentParentBasedSampler,
} from '@opentelemetry/otlp-transformer';
copypasta on the package name
…y-js into consistent-sampling
| 
           Thanks for the help @trentm - I cross checked with the SDK spec and think I have aligned the names with it. One UX difference is now   | 
    
Still reviewing, but I'm finishing for the day, so I'll post the few comments I have so far. I'm still reading through the implementation.
| "src/generated/*.js", | ||
| "src/generated/*.ts" | 
| "src/generated/*.js", | |
| "src/generated/*.ts" | 
copypasta?
const probabilityThresholdScale = Math.pow(2, 56);

function calculateThreshold(samplingProbability: number): bigint {
  return (
    MAX_THRESHOLD -
    BigInt(Math.round(samplingProbability * probabilityThresholdScale))
  );
}
I admit that I find the edge cases beyond Number.MAX_SAFE_INTEGER to be scary. However, I think this calculates the correct values.
For example:
> MAX_THRESHOLD
72057594037927936n
> probabilityThresholdScale   #  "bigger" than MAX_THRESHOLD
72057594037927940
# So you'd think that calculateThreshold might possibly return a negative.
# However, nope because hooray for 64-bit floating point precision beyond Number.MAX_SAFE_INTEGER:
> BigInt(72057594037927940)
72057594037927936n
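To make the arithmetic above easy to poke at outside the package, here is a minimal standalone sketch of the same calculation. `MAX_THRESHOLD` and `probabilityThresholdScale` follow the names quoted in this thread; defining `MAX_THRESHOLD` as `1n << 56n` and adding the range guard inside the function are assumptions for illustration only.

```javascript
// 2^56 as a bigint (the REPL above shows 72057594037927936n).
const MAX_THRESHOLD = 1n << 56n;
// 2^56 as a double. It is exactly representable, even though the
// REPL prints the shortest round-trip form 72057594037927940.
const probabilityThresholdScale = Math.pow(2, 56);

function calculateThreshold(samplingProbability) {
  // Assumed guard for this sketch; it also rejects NaN.
  if (!(samplingProbability >= 0 && samplingProbability <= 1)) {
    throw new RangeError('samplingProbability must be in [0, 1]');
  }
  return (
    MAX_THRESHOLD -
    BigInt(Math.round(samplingProbability * probabilityThresholdScale))
  );
}

console.log(calculateThreshold(1)); // 0n
console.log(calculateThreshold(0.5)); // 36028797018963968n
console.log(BigInt(probabilityThresholdScale) === MAX_THRESHOLD); // true
```

Since every double at or above 2^52 is already an integer, `Math.round` always hands `BigInt` an integral value here, so the conversion should not throw for in-range inputs.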
Playing with the impl a little bit, it does the thing. I'm running: with:
http-server.js
// Usage:
//  node --import ./telemetry.mjs http-server.js
//  curl http://127.0.0.1:3000/ping
//  hey http://127.0.0.1:3000/ping
const http = require('http');
const api = require('@opentelemetry/api');
const { TraceFlags } = require('@opentelemetry/api');
let numReqs = 0
let numSampled = 0;
const server = http.createServer(function onRequest(req, res) {
    var ctx = api.trace.getSpanContext(api.context.active());
    const sampled = ctx.traceFlags & TraceFlags.SAMPLED;
    numReqs++;
    if (sampled) numSampled++;
    console.log('incoming request: %s %s (sampled=%s, rate=%s, tracestate="%s")',
      req.method, req.url, sampled, (numSampled/numReqs).toFixed(3),
      ctx.traceState?.serialize() || "");
    req.resume();
    req.on('end', function () {
        const body = 'pong';
        res.writeHead(200, {
            'content-type': 'text/plain',
            'content-length': Buffer.byteLength(body),
        });
        res.end(body);
    });
});
server.listen(3000, '127.0.0.1', function () {
    console.log('listening at', server.address());
});
telemetry.mjs
import os from 'os';
import path from 'path';
import {NodeSDK} from '@opentelemetry/sdk-node';
import {
    HttpInstrumentation,
} from '@opentelemetry/instrumentation-http';
import {
  CompositeSampler,
  ComposableAlwaysOffSampler,
  ComposableAlwaysOnSampler,
  ComposableTraceIDRatioBasedSampler
} from '@opentelemetry/sampler-composite';
var sampler;
// sampler = new CompositeSampler(new ComposableAlwaysOffSampler());
// sampler = new CompositeSampler(new ComposableAlwaysOnSampler());
sampler = new CompositeSampler(new ComposableTraceIDRatioBasedSampler(0.4));
const sdk = new NodeSDK({
    serviceName: path.parse(process.argv[1]).name,
    instrumentations: [
        new HttpInstrumentation()
    ],
    sampler,
});
process.on('SIGTERM', async () => {
    try {
        await sdk.shutdown();
    } catch (err) {
        console.warn('warning: error shutting down OTel SDK', err);
    }
    process.exit(128 + os.constants.signals.SIGTERM);
});
process.once('beforeExit', async () => {
    // Flush recent telemetry data if about to shutdown.
    try {
        await sdk.shutdown();
    } catch (err) {
        console.warn('warning: error shutting down OTel SDK', err);
    }
});
sdk.start();
You can hit it with 100s of requests with  I haven't played with
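For context on what the `rate=` value in the log line should converge to with a 0.4 ratio: as I understand the probability sampling spec, a span is sampled when its 56-bit randomness value R is greater than or equal to the rejection threshold T. A minimal sketch of that comparison follows. The helper names are hypothetical, and it assumes R comes from the last 14 hex digits of a randomly generated trace ID (no explicit `rv` in tracestate).

```javascript
const TWO_POW_56 = 1n << 56n;

// Same arithmetic as the calculateThreshold snippet quoted in this PR.
function thresholdFor(probability) {
  return TWO_POW_56 - BigInt(Math.round(probability * Math.pow(2, 56)));
}

// Hypothetical helper: R is the low 56 bits of the trace ID,
// i.e. its last 14 hex digits.
function randomValueFromTraceId(traceId) {
  return BigInt('0x' + traceId.slice(-14));
}

// Sampled when R >= T.
function isSampled(traceId, probability) {
  return randomValueFromTraceId(traceId) >= thresholdFor(probability);
}

const highR = '0123456789abcdef01' + 'ffffffffffffff';
const lowR = '0123456789abcdef01' + '00000000000000';
console.log(isSampled(highR, 0.4)); // true
console.log(isSampled(lowR, 0.4)); // false
console.log(isSampled(lowR, 1)); // true (threshold 0 keeps everything)
```

With uniformly random trace IDs, roughly 40% of R values land at or above the 0.4 threshold, which is the rate the play server should report over enough requests.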
    
@anuraaga I think you need these changes so the new package isn't reaching out of its package dir to a relative dir in the monorepo:
diff --git a/experimental/packages/sampler-composite/src/composite.ts b/experimental/packages/sampler-composite/src/composite.ts
index 66025c9d7..737329a77 100644
--- a/experimental/packages/sampler-composite/src/composite.ts
+++ b/experimental/packages/sampler-composite/src/composite.ts
@@ -19,6 +19,7 @@ import {
   Attributes,
   Link,
   TraceState,
+  trace,
 } from '@opentelemetry/api';
 import { TraceState as CoreTraceState } from '@opentelemetry/core';
 import {
@@ -27,7 +28,6 @@ import {
   SamplingResult,
 } from '@opentelemetry/sdk-trace-base';
 import { ComposableSampler } from './types';
-import { getSpanContext } from '../../../../api/src/trace/context-utils';
 import { parseOtelTraceState, serializeTraceState } from './tracestate';
 import {
   INVALID_THRESHOLD,
@@ -46,7 +46,7 @@ export class CompositeSampler implements Sampler {
     attributes: Attributes,
     links: Link[]
   ): SamplingResult {
-    const spanContext = getSpanContext(context);
+    const spanContext = trace.getSpanContext(context);
     const traceState = spanContext?.traceState;
     const otTraceState = parseOtelTraceState(traceState);
diff --git a/experimental/packages/sampler-composite/src/parentthreshold.ts b/experimental/packages/sampler-composite/src/parentthreshold.ts
index 55ea81d5f..1a63b6fd1 100644
--- a/experimental/packages/sampler-composite/src/parentthreshold.ts
+++ b/experimental/packages/sampler-composite/src/parentthreshold.ts
@@ -21,8 +21,8 @@ import {
   Link,
   SpanKind,
   TraceFlags,
+  trace,
 } from '@opentelemetry/api';
-import { getSpanContext } from '../../../../api/src/trace/context-utils';
 import { ComposableSampler, SamplingIntent } from './types';
 import { parseOtelTraceState } from './tracestate';
 import { INVALID_THRESHOLD, isValidThreshold, MIN_THRESHOLD } from './util';
@@ -42,7 +42,7 @@ export class ComposableParentThresholdSampler implements ComposableSampler {
     attributes: Attributes,
     links: Link[]
   ): SamplingIntent {
-    const parentSpanContext = getSpanContext(context);
+    const parentSpanContext = trace.getSpanContext(context);
     if (!parentSpanContext || !isSpanContextValid(parentSpanContext)) {
       return this.rootSampler.getSamplingIntent(
         context,
           Thanks for the help with the cleanups, I think I got them all, and switched to factory functions.  | 
    
Much better with factory functions.
// Before, the types included:
export declare class ComposableTraceIDRatioBasedSampler implements ComposableSampler {
    private readonly intent;
    private readonly description;
    constructor(ratio: number);
    getSamplingIntent(): SamplingIntent;
    toString(): string;
}
// After, it is:
export declare function composable_trace_id_ratio_based_sampler(ratio: number): ComposableSampler;

  /** Any attributes to add to the span for the sampling result. */
  attributes?: Attributes;

  /** How to update the TraceState for the span. */
  updateTraceState?: (ts: TraceState | undefined) => TraceState | undefined;
FWIW the spec uses attributes_provider (a function that takes no args) and trace_state_provider for these two properties.  The current rust PoC implementation only has the former: https://github.com/jmacd/rust-sampler/blob/025e56cb968536fbed7ed8411887e68df5450a3b/src/lib.rs#L112-L116
The current Go PoC impl uses Attributes and TraceState (both are functions): https://github.com/jmacd/go-sampler/blob/52326351ef34450ca3efa7f8755964cd0ddb0d48/sampler.go#L110-L123
So... I'm not sure. I didn't see the discussion for any naming changes from OTEP + PoCs to the final spec (there wasn't any on the spec PR), but I gather we should follow the spec naming here? I'm feeling like a pedant here.
Thanks for bringing it up - I agree with wanting to keep closer to the spec naming and thought about this for some time. But I went with leaving it as is:
- I can't see any reason for attributes to be a function. If it's not a function, provider seems like a worse name.
 - With trace state being a function, provider still seems OK, but it just felt like a net negative in readability.
 
So I'm leaning toward making an editorial decision here. But if you still think it's better to match up one or both, I'm happy to change it.
"editorial decision" sounds good to me. I'll leave this discussion open for now in case a second reviewer wants to weigh in.
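For anyone skimming this naming thread, here is a rough sketch of what an intent with a plain `attributes` object and an `updateTraceState` function can look like when built by a factory function. Everything here is illustrative: `exampleTaggingSampler`, the `threshold` and `thresholdReliable` field names, and the use of a `Map` as a stand-in for the API's `TraceState` are assumptions, not the package's actual code.

```javascript
// Hypothetical composable sampler built by a factory function.
function exampleTaggingSampler(tag) {
  const intent = {
    threshold: 0n, // assumed field: 0n means "sample everything"
    thresholdReliable: true, // assumed field name
    // A plain value: nothing here needs to be computed lazily.
    attributes: { 'example.sampler.tag': tag },
    // A function: the result depends on the incoming trace state.
    // A Map stands in for TraceState; it is copied, never mutated.
    updateTraceState: (ts) => {
      const next = new Map(ts ?? []);
      next.set('example', tag);
      return next;
    },
  };
  return {
    getSamplingIntent: () => intent,
    toString: () => `ExampleTaggingSampler(${tag})`,
  };
}

const sampler = exampleTaggingSampler('demo');
const intent = sampler.getSamplingIntent();
const updated = intent.updateTraceState(undefined);
console.log(String(sampler), intent.attributes, [...updated]);
```

The asymmetry discussed above shows up directly: `attributes` has no input to depend on, while `updateTraceState` has to receive the parent's trace state to produce a new one.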
FWIW, here is my current play code:
telemetry.mjs
// @ts-check
import * as os from 'os';
import * as path from 'path';
import {NodeSDK} from '@opentelemetry/sdk-node';
import {
  HttpInstrumentation,
} from '@opentelemetry/instrumentation-http';
import {
  composite_sampler,
  composable_always_on_sampler,
  composable_always_off_sampler,
  composable_parent_threshold_sampler,
  composable_trace_id_ratio_based_sampler,
} from '@opentelemetry/sampler-composite';
const samplerOff = composite_sampler(composable_always_off_sampler());
const samplerOn = composite_sampler(composable_always_on_sampler());
const samplerRatio40p = composite_sampler(composable_trace_id_ratio_based_sampler(0.4));
const samplerRatioInvalid = composite_sampler(composable_trace_id_ratio_based_sampler(0));
const samplerParent = composite_sampler(composable_parent_threshold_sampler(composable_trace_id_ratio_based_sampler(0.5)));
const sdk = new NodeSDK({
  serviceName: path.parse(process.argv[1]).name,
  instrumentations: [
    new HttpInstrumentation()
  ],
  sampler: samplerParent,
});
sdk.start();
// @ts-ignore Ignore this usage of private attributes.
console.log('Started SDK with this sampler: %s', sdk?._tracerProvider?._config?.sampler);
let tryShutdown = async () => {
  try {
    await sdk.shutdown();
  } catch (err) {
    console.warn('warning: error shutting down OTel SDK', err);
  }
}
process.on('SIGTERM', async () => {
  await tryShutdown();
  process.exit(128 + os.constants.signals.SIGTERM);
});
process.once('beforeExit', tryShutdown);
http-server.js
Run those using:
Then some example calls to see if one gets the expected sampling:
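As background for the parent-threshold case in the play code (this is not one of the elided example calls): the parent sampler's decision depends on the `th` and `rv` sub-keys carried in the `ot` tracestate entry. Below is a rough sketch of parsing such a value. It assumes, from my reading of the spec, that `th` is 1 to 14 lowercase hex digits with trailing zeros omitted and `rv` is exactly 14 hex digits. `parseOtValue` and the `-1n` sentinel are hypothetical names, not the package's `parseOtelTraceState`.

```javascript
const INVALID = -1n; // hypothetical sentinel for "absent or malformed"

// Parses the value of the "ot" tracestate entry, e.g. "th:8;rv:0123456789abcd".
function parseOtValue(otValue) {
  let threshold = INVALID;
  let randomValue = INVALID;
  for (const pair of (otValue ?? '').split(';')) {
    const sep = pair.indexOf(':');
    if (sep < 0) continue;
    const key = pair.slice(0, sep);
    const value = pair.slice(sep + 1);
    if (key === 'th' && /^[0-9a-f]{1,14}$/.test(value)) {
      // Trailing zeros are omitted on the wire, so pad back to 14 digits.
      threshold = BigInt('0x' + value.padEnd(14, '0'));
    } else if (key === 'rv' && /^[0-9a-f]{14}$/.test(value)) {
      randomValue = BigInt('0x' + value);
    }
  }
  return { threshold, randomValue };
}

console.log(parseOtValue('th:8;rv:0123456789abcd'));
console.log(parseOtValue('th:zz'));
```

Unknown sub-keys are skipped rather than rejected in this sketch, which is a design guess on my part.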
    
export const INVALID_TRACE_STATE: OtelTraceState = {
  randomValue: INVALID_RANDOM_VALUE,
  threshold: INVALID_THRESHOLD,
};
- export const INVALID_TRACE_STATE: OtelTraceState = {
-   randomValue: INVALID_RANDOM_VALUE,
-   threshold: INVALID_THRESHOLD,
- };
+ export const INVALID_TRACE_STATE: OtelTraceState = Object.freeze({
+   randomValue: INVALID_RANDOM_VALUE,
+   threshold: INVALID_THRESHOLD,
+ });
IIUC, using Object.freeze here should help prevent accidental mutation of this re-used object.
There is one current usage in opentelemetry-js.git:
Lines 115 to 117 in f2b0d2a
| const DEFAULT_AGGREGATION = Object.freeze({ | |
| type: AggregationType.DEFAULT, | |
| }); | 
I only went looking because I don't have a lot of personal experience using Object.freeze.
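For reference, a small standalone sketch of what `Object.freeze` buys here. The `-1n` values are placeholders assumed for this example, not necessarily the constants used in the package.

```javascript
const INVALID_RANDOM_VALUE = -1n; // placeholder value, assumed for this sketch
const INVALID_THRESHOLD = -1n; // placeholder value, assumed for this sketch

const INVALID_TRACE_STATE = Object.freeze({
  randomValue: INVALID_RANDOM_VALUE,
  threshold: INVALID_THRESHOLD,
});

function tryToMutate(state) {
  'use strict'; // in strict mode, writes to a frozen object throw
  try {
    state.threshold = 0n;
    return 'mutated';
  } catch (err) {
    return err instanceof TypeError ? 'TypeError' : 'other error';
  }
}

console.log(Object.isFrozen(INVALID_TRACE_STATE)); // true
console.log(tryToMutate(INVALID_TRACE_STATE)); // TypeError
console.log(INVALID_TRACE_STATE.threshold); // -1n
```

Two caveats: in sloppy-mode code the write is silently ignored instead of throwing, and the freeze is shallow. Neither matters much for an object whose fields are all primitives, as here.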
Thanks, makes sense!
LGTM, modulo:
- package-lock conflict to resolve
 - I have one optional suggestion for using Object.freeze
 - I'll bring this up in the OTel JS SIG tomorrow. It would be good to get another reviewer on this.
 
…y-js into consistent-sampling
…ntelemetry-js into consistent-sampling
  /** Returns the sampler name or short description with the configuration. */
  toString(): string;
}
CompositeSampler is a decorator that implements the standard Sampler interface but uses a composition of samplers to make its decisions.
Any reason not to explicitly implement Sampler here?
I think you mean having the composable samplers usable as samplers without wrapping. I considered it but found the types confusing; notably, it introduces a circular path from the composable sampler to the composite sampling logic and back to the composable sampler. Instead, it should still be easy, and clearer, for SDKs where possible (it is here) to accept either Sampler or ComposableSampler in methods like setSampler, using overloads or similar. Then the usage is basically the same while the sampler implementations stay well defined. Where that's not possible (maybe Go), working around it with types could be an option.
FWIW, my read of the design was that there was no intention that the ComposableSamplers implement interface Sampler. Rather they should always be wrapped in a CompositeSampler for use in the rest of OTel.
Unless I'm missing it, the go (https://github.com/jmacd/go-sampler/blob/main/sampler.go) and rust (https://github.com/jmacd/rust-sampler/blob/main/src/lib.rs) implementations that jmacd wrote when he was writing OTEP 250 do not have the ComposableSamplers implementing interface Sampler.
Yes, what @trentm says. The composable samplers do not directly implement Sampler, there is a CompositeSampler which takes a Composable.
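The split described in this thread can be sketched in a few lines. This is a deliberately simplified illustration, not the package's code: the real `shouldSample` takes a context, trace ID, span name and so on, whereas this one is handed the randomness value directly. The decision strings stand in for the SDK's `SamplingDecision` enum, and using 2^56 as the always-off threshold is a simplification chosen for the sketch.

```javascript
const MAX_THRESHOLD = 1n << 56n;

// Composable samplers only expose a sampling intent (minimal assumed shape).
const alwaysOn = {
  getSamplingIntent: () => ({ threshold: 0n }),
  toString: () => 'AlwaysOn',
};
const alwaysOff = {
  // No 56-bit randomness value can reach 2^56, so nothing is sampled.
  getSamplingIntent: () => ({ threshold: MAX_THRESHOLD }),
  toString: () => 'AlwaysOff',
};

// The composite wrapper is the only object that exposes shouldSample.
function compositeSampler(delegate) {
  return {
    shouldSample(randomValue) {
      const { threshold } = delegate.getSamplingIntent();
      return randomValue >= threshold ? 'RECORD_AND_SAMPLED' : 'NOT_RECORD';
    },
    toString: () => `Composite(${delegate})`,
  };
}

console.log(compositeSampler(alwaysOn).shouldSample(0n));
console.log(compositeSampler(alwaysOff).shouldSample(MAX_THRESHOLD - 1n));
console.log(String(compositeSampler(alwaysOn)));
```

Keeping `shouldSample` off the composable objects is what avoids the circular dependency mentioned above: composables never need to know about the composite logic.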
…y-js into consistent-sampling
function calculateThreshold(samplingProbability: number): bigint {
  return (
    MAX_THRESHOLD -
    BigInt(Math.round(samplingProbability * probabilityThresholdScale))
  );
}
I recommend checking that the probability is in-range, at least. (Maybe that's done before calling this, ok.)
This document https://opentelemetry.io/docs/specs/otel/trace/tracestate-probability-sampling/#converting-floating-point-probability-to-threshold-value shows a few ways to compute threshold from number with variable precision. This is because for most numbers you will unnecessarily encode 14 bytes of very-precise sampling probability when 4 bytes is fairly precise and saves 10 bytes per Context.
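To illustrate the size difference being described, here is a simplified sketch of encoding a threshold as a `th` value with limited precision. It is not the algorithm from the linked spec section, which has its own precision rules; it simply rounds to a fixed number of hex digits and strips trailing zeros. `encodeThreshold` is a hypothetical name, and the input is assumed to be below 2^56.

```javascript
// Encode a 56-bit threshold as lowercase hex, keeping at most `precision`
// (1..14) hex digits and dropping trailing zeros.
function encodeThreshold(threshold, precision = 14) {
  const dropBits = BigInt((14 - precision) * 4);
  let kept = threshold >> dropBits;
  if (dropBits > 0n) {
    // Round half up on the dropped bits.
    const remainder = threshold & ((1n << dropBits) - 1n);
    if (remainder >= 1n << (dropBits - 1n)) kept += 1n;
    // Clamp so rounding cannot overflow the kept digits.
    const maxKept = (1n << BigInt(precision * 4)) - 1n;
    if (kept > maxKept) kept = maxKept;
  }
  const hex = kept.toString(16).padStart(precision, '0');
  const trimmed = hex.replace(/0+$/, '');
  return trimmed === '' ? '0' : trimmed;
}

// 0x99999999999998n is the full-precision threshold for probability 0.4.
console.log(encodeThreshold(0x99999999999998n)); // 99999999999998
console.log(encodeThreshold(0x99999999999998n, 4)); // 999a
console.log(encodeThreshold(1n << 55n)); // 8
```

The first two outputs show the 14-character versus 4-character encodings for a 0.4 ratio, which is the 10 bytes per context mentioned above. Probabilities like 0.5 already encode compactly because their trailing zeros are dropped.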
Yeah it's checked.
I have left a TODO for precision since it's a SHOULD optimization and will follow up on it - currently the behavior matches the java-contrib one while cleaning up the names towards the spec text. It means the unit tests fully line up with it giving relative confidence in the implementation - it'll be good to make changes that affect the behavior and tests separately.
@anuraaga Just a few final updates below, then I'll merge this.
  "publishConfig": {
    "access": "public"
  },
  "version": "0.203.0",
- "version": "0.203.0",
+ "version": "0.205.0",
package-lock.json (Outdated)
    },
    "experimental/packages/sampler-composite": {
      "name": "@opentelemetry/sampler-composite",
      "version": "0.203.0",
- "version": "0.203.0",
+ "version": "0.205.0",
CHANGELOG.md (Outdated)
* feat(sampler-composite): Added experimental implementations of draft composite sampling spec [#5839](https://github.com/open-telemetry/opentelemetry-js/pull/5839) @anuraaga
Please move this entry to "experimental/CHANGELOG.md" -- the changelog file for the packages under experimental/...
…y-js into consistent-sampling
| 
           Thanks @trentm - looks green  | 
    
| 
           Sigh. The internet is dumb. Lint is failing now in: Those links work fine in a browser. But run them outside a browser (as the  WTF.   I passed in headers to match what I think is the exact request that Firefox sent. Yet Firefox (via the network panel) gets a 200 response and   | 
    
| 
           These are the only two links to npmjs.com in the docs being link-checked.  | 
    
| 
           I've opened #5948 to attempt to deal with the unrelated lint failure.  | 
    
… sampling spec (open-telemetry#5839) Co-authored-by: Trent Mick <[email protected]>
…omposite sampling spec (open-telemetry#5839)" This reverts commit a662970.
Which problem is this PR solving?
Implements the draft OTel spec for composite sampling, which lets downstream services know and preserve a root sampling decision.
https://opentelemetry.io/docs/specs/otel/trace/tracestate-probability-sampling/
https://opentelemetry.io/docs/specs/otel/trace/sdk/#built-in-composablesamplers
This is based on the proposed Python implementation
open-telemetry/opentelemetry-python#4714
Which in turn is based on the Java one
https://github.com/open-telemetry/opentelemetry-java-contrib/tree/main/consistent-sampling/src/main/java/io/opentelemetry/contrib/sampler/consistent56
This is closer to the Python one, though, since Python is also a scripting-type language.
/cc @trentm to help with review
Short description of the changes
Adds an experimental package containing consistent sampler implementations.