Test flexibility of TracingInstrument by creating an XRay tracer #72
Comments
Cool, let me know if the interface I created is convenient to use and if you have any comments/issues in general. While you are more than welcome to fork it and poke around, IMO rather than making I should probably be as simple as creating a other than different wording there is also a distinction between "annotation" (indexed) and "metadata" - not sure how it compares with OT. |
BTW the main missing/incomplete functionality as far as the XRay does support it fairly well but the tracing needs to be "hooked" in somehow. I made a quick hack and implemented XRayMiddleware for aws-sdk-swift but it's broken and I am lurking to see how you address it. |
Thanks for chiming in, @pokryfka 👍 I guess we aim for |
as far as I understand, it could work similar to swift-metrics which defines a generic interface for For "libraries" generic On the other hand an application developer may choose a more concrete tracer if it adds value and they are willing to sacrifice some flexibility. For example, if running a simple Lambda function it may be decided to use, say, DynamoDB rather than creating a "database abstraction"; similarly XRay can make a good choice because it's provided in the execution runtime. Note that it does work the same way with SwiftPrometheus which provides more functionality than defined by swift-metrics. As far as tracing downstream HTTP web services is concerned, the most natural place is (a) HTTP client. In case of |
@ktoso is there a chance to "instrument" HTTP client rather than having to create in minimal approach, I think it could be as simple as making it would make adoption of |
Huh a lot of questions to get to here :-) Thanks for chiming in @pokryfka! I hope we'll be able to collaborate on figuring out how these libs and APIs should inter-op, your expertise with XRay is invaluable here 👍 I understand what you mean with a compat "shim" -- that is also an option generally speaking. I'm only a bit worried about excessive copying of metadata between the instruments and your specific backend -- thus the existence of (gsoc-)swift-baggage-context which can be used as generic and typesafe container for values, that our instruments carry around, inject/extract. The
Such differences are of great interest to us in general... could you perhaps have a look at BaggageContext as well as Span and think how those are overlapping? Note that we can carry "anything" inside a baggage context, and it would get propagated, even if it is not explicitly in the Span API. Modifying it could be a bit trickier though, but I hope that'd also be possible with extensions on the span type if necessary.
Correct, that is the plan. End-users can of course directly use XRay tracer types and get benefits from it, but libraries like HTTPClients and similar cannot choose a specific tracer to marry themselves with.
Indeed, and not only that specific client but all other ones as well (and even database clients/drivers if they want to) and other binary clients as well. The result of this gsoc (and continued work after the gsoc -- this will take time) are APIs that libraries, such as the AsyncHTTPClient (and many other ones) can (and will, as we're in close collaboration) adopt and instrument their libraries in terms of those generic TracingInstruments. This means, that at least the minimal shared feature-set of tracers that is possible to express using these APIs (which are very much aligned with OpenTelemetry style APIs) should be automatically available when an end-user picks and configures a specific tracer (e.g. XRay), by configuring the TracingSystem with it. So to answer the following directly:
Yes, that is absolutely what we're going for with this API. AsyncHTTPClient will depend on it directly and we'll be able to use any tracer that adapts these APIs with it - without the need of having to instrument the HTTPClient "again" every single time there's some new tracer that'd like to instrument things. In other words, the goal of this work is to address the "libraries need to instrument their code" pain of distributed tracing. We want to ensure they can do so only once in abstract terms, and other tracers can be plugged into the existing instrumentation points. To achieve this, we need to check this API is "good enough" to serve the common needs of popular tracers, put these APIs through the SSWG process, and have them incubated and adopted by as many projects as makes sense 🚀 The other goal is of course to have all existing tracers be able to express as much functionality as possible using those generic APIs, so the libraries instrumented with them provide useful traces. Yes, not everything may be expressible, but it's better to converge on an (open) standard for all libraries than to attempt to instrument each library separately for each slightly different tracer :-). Hope that answers some of the in-line questions. There are definitely going to be some tricky and unsolved problems still, so I hope we can collaborate and figure out how to get the best tracing experience to all of Swift, including generic libraries and end-user apps/systems! |
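To make the "instrument once, plug in any tracer" idea concrete, here is a minimal sketch. Every name in it (SketchTracer, CollectingTracer, SketchHTTPClient) is hypothetical and made up for illustration; it is not the gsoc-swift-tracing API, only the shape of the dependency.

```swift
// Hypothetical names throughout; this is NOT the gsoc-swift-tracing API.

/// The generic interface a library would depend on.
protocol SketchTracer {
    /// Record that an operation with the given name finished.
    func record(operation: String, attributes: [String: String])
}

/// A backend that only collects what it is told; stands in for XRay or any other tracer.
final class CollectingTracer: SketchTracer {
    private(set) var recorded: [(operation: String, attributes: [String: String])] = []

    func record(operation: String, attributes: [String: String]) {
        recorded.append((operation: operation, attributes: attributes))
    }
}

/// A library type (think: an HTTP client) instrumented once, against the protocol only.
struct SketchHTTPClient {
    let tracer: SketchTracer

    func get(_ url: String) -> Int {
        let status = 200 // pretend the request succeeded
        tracer.record(
            operation: "HTTP GET",
            attributes: ["http.url": url, "http.status_code": String(status)]
        )
        return status
    }
}

// The end user picks the concrete tracer; the client code does not change.
let tracer = CollectingTracer()
let client = SketchHTTPClient(tracer: tracer)
_ = client.get("https://example.com")
```

Swapping CollectingTracer for another conforming type requires no change to SketchHTTPClient, which is the property the comment above is after.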
that's awesome! I think there are two choices:
The second one should be easier and less intrusive but as you pointed out would involve more copying data back and forth. I am happy with both as long as any functionality not defined by I will try to check how TracingInstrument compares to XRay segments but probably not sooner than next week.
there are no events as such in XRay (could be added as metadata) Question: As far as XRay is concerned, there is not a lot of context propagation as such.
Note that it does contain trace id, parent id and sampling decision but that's about it.
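As a rough illustration of how little there is to propagate, the sketch below parses an XRay tracing header value of the form `Root=...;Parent=...;Sampled=...` (the value carried in the `X-Amzn-Trace-Id` HTTP header). The type and property names are made up for this example and are not taken from aws-xray-sdk-swift.

```swift
import Foundation

/// The three fields mentioned above. Names are illustrative only.
struct SketchXRayTraceHeader: Equatable {
    var traceID: String
    var parentID: String?
    var sampled: Bool?

    /// Parses a value such as
    /// "Root=1-5759e988-bd862e3fe1be46a994272793;Parent=53995c3f42cd8ad8;Sampled=1".
    /// Returns nil when there is no Root field.
    init?(headerValue: String) {
        var fields: [String: String] = [:]
        for part in headerValue.split(separator: ";") {
            let pair = part.split(separator: "=", maxSplits: 1)
            guard pair.count == 2 else { continue }
            let key = pair[0].trimmingCharacters(in: .whitespaces)
            fields[key] = String(pair[1])
        }
        guard let root = fields["Root"] else { return nil }
        traceID = root
        parentID = fields["Parent"]
        switch fields["Sampled"] {
        case .some("1"): sampled = true
        case .some("0"): sampled = false
        default: sampled = nil // no decision was propagated
        }
    }
}

let parsedHeader = SketchXRayTraceHeader(
    headerValue: "Root=1-5759e988-bd862e3fe1be46a994272793;Parent=53995c3f42cd8ad8;Sampled=1"
)
```

A tracer that sees `sampled == false` after extraction would skip recording, while still passing the header on to downstream calls.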
Given scenario: client A makes a call to server B which makes a few calls to servers C, D, E
in an XRay scenario, the tracing header would be passed to B (which on its own could be also instrumented) then (optionally) include info about the HTTP request and response, example:
|
/// An `Instrument` with added functionality for distributed tracing. It uses the span-based tracing model and is
/// based on the OpenTracing/OpenTelemetry spec.
public protocol TracingInstrument: Instrument {
/// The currently traced `Span`.
var currentSpan: Span? { get }
// ...
}
|
👍 Right yeah, so a library can either directly use the types or offer an adapter/shim. The copying of data has the potential to impact performance, so that's one of the worries and why directly using baggage can be preferable.
Yes that's a goal as well -- users should be able to bootstrap with xray and then the httpclient etc will use the generic interface to talk to it, but they totally could use the xray tracer directly -- I suspect we'd want to do this as an extension on
Ok, cool; you could ignore them OR log them if users decided to configure the xray tracer with "log events" or something like that... Events are basically structured logs which are associated with a span -- so basically they carry the trace id in metadata as well. Some systems collect them up and emit those together with the span when it is being recorded, but we can imagine other ways of handling them... Ignoring by default since xray does not do that (it seems?) and allowing some configuration to maybe do something with them seems reasonable.
In reality, in other languages this basically means a thread local -- those are pretty unwieldy to control in async code indeed, unless every single library is aware of the context/span and can carry it around at asynchronous boundaries. That's what we've done with https://developer.lightbend.com/docs/telemetry/current/extensions/opentracing/enabling.html by instrumenting every single async API and making the span be carried through them. In Swift's reality... I don't think this is going to fly at all though, we can't randomly instrument (with bytecode weaving as one does on the JVM) libraries we don't control, and we'd want these tracing APIs also to work with Dispatch and other APIs we can't change. So... why is that currentSpan even there? To be honest because for now we adopted the Open Telemetry suggested API as-is, but the currentSpan may not make much sense for us and maybe we should drop it and rely on explicit context passing always. Long story short:
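The "ignore by default, optionally log or attach" idea for events could be sketched roughly as follows. SketchEvent, SketchEventPolicy and SketchSegment are hypothetical types invented for this example; nothing here is an existing XRay or tracing API.

```swift
// All types here are hypothetical; they only illustrate the configuration idea.

/// A span event: essentially a structured log line tied to a trace.
struct SketchEvent {
    var name: String
    var traceID: String
}

/// What a tracer without native events might be configured to do with them.
enum SketchEventPolicy {
    case ignore           // the default suggested above
    case log              // emit as a log line carrying the trace id
    case attachAsMetadata // keep them on the segment as (non-indexed) metadata
}

/// Minimal stand-in for a segment that can hold metadata.
struct SketchSegment {
    var metadata: [String: [String]] = [:]
}

func handleEvent(
    _ event: SketchEvent,
    policy: SketchEventPolicy,
    segment: inout SketchSegment,
    log: (String) -> Void
) {
    switch policy {
    case .ignore:
        break
    case .log:
        log("[trace \(event.traceID)] \(event.name)")
    case .attachAsMetadata:
        segment.metadata["events", default: []].append(event.name)
    }
}
```

Which of the three behaviours is the right default is a design decision for the tracer author; the sketch only shows that the choice can live in configuration rather than in the generic API.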
TLs of course break down completely with anything very async like event loop groups, dispatch groups (where one has to use queue local) and actor model things which also hop threads unpredictably. // Sidenote: There are some ideas for how "structured task local" values could look, being experimented with in the JVM's Project Loom (Scope Variables), which would be a MUCH better way to solve this and would work well with coroutines... but we don't have those in Swift and no idea if it'd ever happen. But in a magical future that'd be a nice way to implement this. |
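A small sketch of what "explicit context passing" means in practice with Dispatch: the context is an ordinary value captured by the closure, so it survives the hop to another queue where a thread-local set by the caller would not be visible. SketchContext and runTraced are invented names, and the snippet assumes the Swift 5 language mode (strict concurrency checking may flag the captured variable).

```swift
import Dispatch

/// Hypothetical context value; a real one would be a baggage context.
struct SketchContext {
    var traceID: String
}

/// The context is a plain parameter, so it is carried across the async boundary
/// by closure capture rather than by any thread-local storage.
func runTraced(
    context: SketchContext,
    on queue: DispatchQueue,
    completion: @escaping (String) -> Void
) {
    queue.async {
        // Possibly a different thread than the caller's; the captured value is still here.
        completion("handled trace \(context.traceID)")
    }
}

let workerQueue = DispatchQueue(label: "sketch.worker")
let finished = DispatchSemaphore(value: 0)
var observed = ""
runTraced(context: SketchContext(traceID: "abc"), on: workerQueue) { message in
    observed = message
    finished.signal()
}
finished.wait()
```

The cost is that every API on the path has to accept and forward the context, which is exactly the trade-off discussed above.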
So the goal of baggage context is that "any tracer can put their values in there" and we just carry it around, without even knowing what's inside there. In that sense, XRay would put the
So XRay tracer gets called as:
func extract<Carrier, Extractor>(_ carrier: Carrier, into baggage: inout BaggageContext, using extractor: Extractor) {
    // carrier is e.g. HTTPHeaders so you extract the header and put it into baggage
}
And when we're about to make a request, the HTTPClient would invoke
Note that there can be multiple instruments installed at the same time, so if someone wanted to carry some other information at those points they can as well, and we're invoking all instruments at those inject/extract points. |
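The extract/inject flow with several instruments installed at once can be sketched as below. The types (SketchBaggage, SketchBaggageKey, SketchInstrument, HeaderInstrument) are simplified stand-ins written for this example, with a plain `[String: String]` playing the role of HTTP headers; they are assumptions and not the real BaggageContext or Instrument APIs.

```swift
// Simplified stand-ins; NOT the real BaggageContext / Instrument types.

protocol SketchBaggageKey {
    associatedtype Value
}

/// A type-safe container: each key type owns one slot holding its Value.
struct SketchBaggage {
    private var storage: [ObjectIdentifier: Any] = [:]

    subscript<Key: SketchBaggageKey>(_ key: Key.Type) -> Key.Value? {
        get { storage[ObjectIdentifier(key)] as? Key.Value }
        set {
            if let value = newValue {
                storage[ObjectIdentifier(key)] = value
            } else {
                storage.removeValue(forKey: ObjectIdentifier(key))
            }
        }
    }
}

protocol SketchInstrument {
    func extract(_ headers: [String: String], into baggage: inout SketchBaggage)
    func inject(_ baggage: SketchBaggage, into headers: inout [String: String])
}

/// Carries one header value through the baggage, untouched.
struct HeaderInstrument<Key: SketchBaggageKey>: SketchInstrument where Key.Value == String {
    let headerName: String

    func extract(_ headers: [String: String], into baggage: inout SketchBaggage) {
        baggage[Key.self] = headers[headerName]
    }

    func inject(_ baggage: SketchBaggage, into headers: inout [String: String]) {
        if let value = baggage[Key.self] { headers[headerName] = value }
    }
}

enum XRayHeaderKey: SketchBaggageKey { typealias Value = String }
enum RequestIDKey: SketchBaggageKey { typealias Value = String }

// Two instruments installed at the same time; both run at every extract/inject point.
let instruments: [SketchInstrument] = [
    HeaderInstrument<XRayHeaderKey>(headerName: "X-Amzn-Trace-Id"),
    HeaderInstrument<RequestIDKey>(headerName: "X-Request-Id"),
]

let incoming = [
    "X-Amzn-Trace-Id": "Root=1-5759e988-bd862e3fe1be46a994272793;Sampled=1",
    "X-Request-Id": "42",
]
var baggage = SketchBaggage()
for instrument in instruments { instrument.extract(incoming, into: &baggage) }

var outgoing: [String: String] = [:]
for instrument in instruments { instrument.inject(baggage, into: &outgoing) }
```

Neither the container nor the code that loops over the instruments knows what the values mean, which matches the "we just carry it around" goal.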
I'm not sure I understand the questions here:
That's up to your tracer impl, we only make sure you'll get called with:
I'm not sure if we're using the word instrumented in the same meaning here? To clarify, to me "instrumented" is a static property of a system, and if it's completely not instrumented we lost a trace -- there's nothing that'd extract the headers after all (!). I suspect you mean "how to know if they should record"? Because if they're instrumented instruments will be called and can do the extract/inject stuff, but that will be done always, and based on the extracted information your tracer has to notice "aha! Sampled=0, so this system should NOT record" right? Thanks for the discussion, let's keep it going and iron out all questions and confusions :-) |
quick comparison of XRay segment against OT Span
References:
Annotations and metadata vs Span Attribute
on top of that XRay segment defines:
AWS
AWS object/dictionary containing information about AWS runtime environment like Account id, ECS container, EC2 instance Id. These are AWS/implementation specific but perhaps some of OT SpanAttributes, see #65, could be mapped to
HTTP request metadata
We should try to map HTTP Span attributes
This is vaguely related with SpanKind
XRay subsegment HTTP request object may have:
I will follow up on this in separate comment.
Errors, faults, and exceptions vs Span status
Record Exception
Per OT:
This is not currently defined by
Note that XRay
Span Events
XRay segment does not have direct
Links
XRay segment does not have links (to other segments/spans). It may have
whereas Span link
These should probably be ignored by XRay instrument. |
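For the annotations-versus-metadata distinction raised in this comparison, one possible mapping from span attributes is sketched here. The rule used (scalar values become indexed annotations, everything else becomes metadata) is an assumption made for illustration, and SketchAttributeValue is a hypothetical type, not the tracing library's attribute type.

```swift
/// Hypothetical attribute value type, loosely modelled on span attributes.
enum SketchAttributeValue: Equatable {
    case string(String)
    case int(Int)
    case double(Double)
    case bool(Bool)
    case array([SketchAttributeValue])
}

/// Splits attributes into XRay-style annotations (indexed) and metadata (not indexed).
/// Assumed rule: scalars can be indexed, so they become annotations; arrays go to metadata.
func splitForXRay(
    _ attributes: [String: SketchAttributeValue]
) -> (annotations: [String: SketchAttributeValue], metadata: [String: SketchAttributeValue]) {
    var annotations: [String: SketchAttributeValue] = [:]
    var metadata: [String: SketchAttributeValue] = [:]
    for (key, value) in attributes {
        switch value {
        case .string, .int, .double, .bool:
            annotations[key] = value
        case .array:
            metadata[key] = value
        }
    }
    return (annotations, metadata)
}

let split = splitForXRay([
    "user_tier": .string("gold"),
    "retries": .int(2),
    "tags": .array([.string("a"), .string("b")]),
])
```

A real tracer would likely also let users choose explicitly which attributes are worth indexing, since indexing everything scalar may not be desirable.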
Thank you for defining "instrumenting" and "recording" - that's useful. XRay segment defines how to record:
example of JSON document with subsegment for a downstream HTTP call:
I am not sure if it's covered in any of the use cases, but it would be great if an instrumented HTTP client injected context and also passed the HTTP response (or failure) back to the tracer. For reference, this is how tracing of HTTP requests is implemented in (one of a few) official AWS XRay SDK for Java: and equivalent in Swift XRay tracer (this is very much a proof of concept at the moment but also much less code so maybe easier to get the idea): https://github.com/pokryfka/aws-xray-sdk-swift/blob/master/Sources/AWSXRayRecorderSDK/Middleware.swift Now, as far as XRay is concerned, the response contains HTTP status and optionally content size:
however given the flexibility and robustness of not sure if/how that could be useful (?) |
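To illustrate what passing the response back to the tracer would enable, the sketch below models the response part of a subsegment (status plus optional content length, encoded as `status` and `content_length`) and derives the error/throttle/fault flags from the status code the way XRay segment documents describe them (4xx, 429, 5xx). The Swift type and function names are invented for this example.

```swift
import Foundation

/// Response data recorded for a downstream HTTP call. Swift names are illustrative;
/// the JSON keys follow the XRay segment document format.
struct SketchHTTPResponse: Codable, Equatable {
    var status: Int
    var contentLength: Int?

    enum CodingKeys: String, CodingKey {
        case status
        case contentLength = "content_length"
    }
}

/// Flags a subsegment can carry to describe the outcome of the call.
struct SketchOutcome: Equatable {
    var error = false    // client error (4xx)
    var throttle = false // throttled (429)
    var fault = false    // server error (5xx)
}

func outcome(forStatus status: Int) -> SketchOutcome {
    var result = SketchOutcome()
    switch status {
    case 429:
        result.error = true
        result.throttle = true
    case 400..<500:
        result.error = true
    case 500..<600:
        result.fault = true
    default:
        break
    }
    return result
}

let throttled = SketchHTTPResponse(status: 429, contentLength: nil)
let sketchEncoder = JSONEncoder()
sketchEncoder.outputFormatting = [.sortedKeys]
let throttledJSON = String(data: try! sketchEncoder.encode(throttled), encoding: .utf8)!
```

None of this can be filled in unless the instrumented client hands the response (or the failure) to the tracer after the call completes, which is the request made above.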
FYI pokryfka/aws-xray-sdk-swift#16 This is how I think |
@pokryfka Awesome, thanks for taking the time to create this 😎 I think we can use your PR from now on to discuss XRay specific things / current limitations to our APIs. Very excited about getting the first real-world tracer up and running 😊 |
It seems Amazon folks are implementing Events as metadata, for reference: awslabs/aws-xray-sdk-with-opentelemetry@89f941a |
Thank you for the reference, I will check it out later on! Anything
|
Because XRay uses different wordings for things like Spans it's a good candidate for testing how well our TracingInstrument performs in the real world. We plan on forking pokryfka/aws-xray-sdk-swift to poke around a bit by having it depend on gsoc-swift-tracing.