Skip to content

Add SDK span processors spec #205

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 12 commits into from
Sep 20, 2019
80 changes: 80 additions & 0 deletions specification/sdk-span-processors.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,80 @@
# Span processor
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How to register a Span processor?
Do we allow multiple processors? (is there any guarantee on the ordering).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could guarantee the order. I have added this:

Span processors can be registered via a method on SDK Tracer. The registered processors are invoked in the same order as they were registered.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @pavolloffay for answering the 2nd question.
Regarding the 1st question "How to register a Span processor", do you intend to cover it in a separate PR for the Tracer API change? Do we expect both register and unregister (or only allow to register during the Tracer initialization/ctor)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently I implemented this with only Register and can be done at any moment.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have already documented the registration step. Here is the copy but please comment on the code if something should be fixed.

Span processors can be registered directly on SDK Tracer. The processors are invoked in the same order as they were registered.


Span processor is an interface which allows hooks for span start and end method invocations.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am curious why we choose the word "Processor" as opposed to "Exporter"? In the diagram a processor is used to build an exporter, so it seems like a fine distinction to make.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe an answer:

(a) if I must make a copy of the data, it's a Processor
(b) if I can take a reference to the data, it's an Exporter

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I posted in the diagram #205 (comment) the SDK exposes just an processor for different Span events (start/end).

We build an "export" pipeline by implementing a SpanProcessor (Batch/Simple) that will redirect a List<SpanData(Proto)> to an actual "exporter" or other intermediate steps.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe an answer:

(a) if I must make a copy of the data, it's a Processor
(b) if I can take a reference to the data, it's an Exporter

In my understanding, it's rather the other way round: A Span-processor gets a mutable reference to a single Span synchronously. A exporter on the other hand may get one or more read-only Spans (or some data transfer object to which the Span was converted) delayed, asynchronously or never (queue full).

The span processors are invoked only when [`IsRecordingEvents`](api-tracing.md#isrecordingevents) is true. This interface can be used to implement [span exporter](sdk-exporter.md) to batch and convert spans.

Span processors can be registered directly on SDK Tracer and they are invoked in the same order as they were registered.

The following diagram shows `SpanProcessor`'s relationship to other components in the SDK:

```
+-----+---------------+ +---------------------+ +-------------------+
| | | | | | |
| | | | BatchProcessor | | SpanExporter |
| | +---> SimpleProcessor +---> (JaegerExporter) |
| SDK | SpanProcessor | | | | |
| | | +---------------------+ +-------------------+
| | |
| | | +---------------------+
| | | | |
| | +---> ZPagesProcessor |
| | | | |
+-----+---------------+ +---------------------+
```

## Interface definition

### OnStart(Span)

`OnStart` is called when a span is started.
Copy link
Contributor

@alban alban Aug 29, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am pondering about naming it OnCreate instead of OnStart.

The spec has a section named Span Creation and not "Span Start", though it does not mandate the method name.

In the Java implementation, start vs create does not seem to create ambiguity. But in Python, we have both tracer.create_span(name) and tracer.start_span(name).

I believe the intent of OnStart is to be called regardless of whether the span is created with create_span or with start_span so I think the naming OnCreate would be clearer.

What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having OnCreate would mean differentiating between a non-started and a started Span, which would create more (unnecessary) issues IHMO. Probably the Specification needs to be updated to make clear that a Span is created and started.

Btw, as far as I know, the OTel Python (and OC Python) implementation is the only one with such a create() method besides a start_span(). Not sure we really need that there ;)

This method is called synchronously on the thread that started the span, therefore it should not block or throw exceptions.

**Parameters:**

* `Span` - a readable span object.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rather like the Span processor to get a mutable reference to the same span object that is started/ended. I imagine a common use-case for span processors would be to attach context information like thread/coroutine ID, etc.
Although it is true that the Processor (at least in OnEnd) should also be able to read the data, which may or may not be possible with a sdk.Span object. Maybe we must specify an additional Span interface with getters?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we must specify an additional Span interface with getters?

The SpanData could be considered as a readable interface, however it is going to be removed #215

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we eliminated SpanData from the user-facing API. I could believe it's still meant as a part of the SDK processor API.


**Returns:** `Void`

### OnEnd(Span)

`OnEnd` is called when a span is ended.
This method is called synchronously on the execution thread, therefore it should not block or throw an exception.

**Parameters:**

* `Span` - a readable span object.

**Returns:** `Void`

### Shutdown()

Shuts down the processor. Called when SDK is shut down. This is an opportunity for processor to do any cleanup required.

Shutdown should be called only once for each `Processor` instance. After the call to shutdown subsequent calls to `onStart` or `onEnd` are not allowed.

Shutdown should not block indefinitely. Language library authors can decide if they want to make the shutdown timeout to be configurable.

## Built-in span processors

### Simple processor

The implementation of `SpanProcessor` that passes ended span directly to the configured `SpanExporter`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the relationship between processors and exporters? It seems like we're using processors to (1) expose span start and end hooks to vendors and (2) batch spans to send to the exporter. Why use the same component for both?

If processors can't modify the spans that they pass to the exporters, why not call the processors and exporters in parallel?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If processors can't modify the spans that they pass to the exporters, why not call the processors and exporters in parallel?

From exporter spec:

Allow implementing helpers as composable components that use the same chainable Exporter interface. SDK authors are encouraged to implement common functionality such as queuing, batching, tagging, etc. as helpers. This functionality will be applicable regardless of what protocol exporter is used.

The processor can be considered as a helper for the exporter. Perhaps we should document this in the first section. The exporter should only export and do not batch.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my mind the model was this:

SDK -> SpanProcessor (an implementation that batches all Spans) -> SpanExporter(List<SpanData/SpanProto>).

SpanExporter is one of the functionality that we can implement with the SpanProcessor, but for example some implementation may implement a SpanProcessor that adds extra attributes onStart.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bogdandrutu Maybe it would be worth adding a note here, to mention that (custom) SpanProcessors can actually modify the received Span?


**Configurable parameters:**

* `exporter` - the exporter where the spans are pushed.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How to register multiple exporters?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least in Java there is MultiSpanExporter exporter implementation which exports data to multiple exporters.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bogdandrutu shall we add MultiSpanExporter to the spec? I think it's an implementation detail.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, let's not be too restrictive about the implementation.


### Batching processor

The implementation of the `SpanProcessor` that batches ended spans and pushes them to the configured `SpanExporter`.

First the spans are added to a synchronized queue, then exported to the exporter pipeline in batches.
The implementation is responsible for managing the span queue and sending batches of spans to the exporters.
This processor can cause high contention in a very high traffic service.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would like to understand more about high contention. Is this because of the queue insertion operation? There are lock-free queue implementations which doesn't have high contention even under heavy traffic.

Another question - do we expect the queue to guarantee FIFO?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

even a lock-free queue can cause false sharing which is also very bad and hard to quantify.

batching does not require to guarantee fifo, and also can drop spans when high traffic (based on a config). For an example of this see https://github.com/open-telemetry/opentelemetry-java/blob/master/sdk/src/main/java/io/opentelemetry/sdk/trace/export/BatchSampledSpansProcessor.java

We also execute the export on a different thread.


**Configurable parameters:**

* `exporter` - the exporter where the spans are pushed.
* `maxQueueSize` - the maximum queue size. After the size is reached spans are dropped. The default value is `2048`.
* `scheduledDelayMillis` - the delay interval in milliseconds between two consecutive exports. The default value is `5000`.
* `maxExportBatchSize` - the maximum batch size of every export. It must be smaller or equal to `maxQueueSize`. The default value is `512`.