Skip to content

Missing tracing information after upgrading to Spring Boot 3 with Micrometer tracing #31993

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
jhengy opened this issue Jan 10, 2024 · 3 comments
Assignees
Labels
for: external-project Needs a fix in external project status: invalid An issue that we don't feel is valid

Comments

@jhengy
Copy link

jhengy commented Jan 10, 2024

Context

After upgrading to Spring Boot 3, we had to switch our tracing library from spring cloud sleuth to micrometer tracing (referring to the migration guide https://github.com/micrometer-metrics/tracing/wiki/Spring-Cloud-Sleuth-3.1-Migration-Guide).

Expected Behavior

We expect that the migration from sleuth to micrometer should be smooth, providing good backward compatibility. In particular, as the project uses reactive programming extensively, tracing information such as traceId should be preserved in SL4J logs along the reactive chain.

Actual Behavior

Despite already calling Hooks.enableAutomaticContextPropagation(), reactive chain involving reactive web client fails to preserve tracing information such as traceId in SLF4J logs when the chain is invoked using .subscribe. The migration guide clearly suggests that calling Hooks.enableAutomaticContextPropagation() is enough.

Reproductor

I have prepared a reproducer (spring boot test) to demonstrate the potential bug. Please refer to https://github.com/jhengy/my-simple-springboot-app/tree/demo-context-propagation

Follow up questions

  • To solve the issue, it seems that we have to add .contextCapture() before .subscribe explicitly in all the reactive chains. This seems too repetitive and it was expected that migration from sleuth to micrometer should involve minimall changes to the business code. Also, there seems to be a limitation to using .contextCapture(): it only captures context for logs in the upstream, but not downstream. Is there a better and more convenient way (perhaps some kind of global setting) go about doing this?
  • It was mentioned in the blog that there is a plan for Spring Boot to configure context propagation out of the box. May we know how soon it can be?
  • What would be the performance impact(if any) in using Hooks.enableAutomaticContextPropagation() with .contextCapture()?

Thank you.

@spring-projects-issues spring-projects-issues added the status: waiting-for-triage An issue we've not yet triaged or decided on label Jan 10, 2024
@bclozel bclozel self-assigned this Jan 10, 2024
@bclozel
Copy link
Member

bclozel commented Jan 10, 2024

Thanks for reaching out with a detailed sample.

We expect that the migration from sleuth to micrometer should be smooth, providing good backward compatibility. In particular, as the project uses reactive programming extensively, tracing information such as traceId should be preserved in SL4J logs along the reactive chain.

We've certainly tried to make the upgrade as easy as possible. The intent was also to solve all the underlying issues with Sleuth's approach: overhead, over-instrumentation, etc.

Despite already calling Hooks.enableAutomaticContextPropagation(), reactive chain involving reactive web client fails to preserve tracing information such as traceId in SLF4J logs when the chain is invoked using .subscribe. The migration guide clearly suggests that calling Hooks.enableAutomaticContextPropagation() is enough.

I don't think this is as simple as that. Maybe the guide can be improved as a result of our conversation here.

Calling subscribe from within a reactive chain means that it decouples completely both reactive streams. In your sample's case, the WebClient will be performing its task completely independently of the controller method where it's been called. If an error occurs with the client, it won't be handled with the server response. In a non-reactive world, this is equivalent to starting a new Thread and giving it a Runnable. By default, new Thread instances don't inherit thread locals so I guess this behavior aligns here. In general, we don't advise calling subscribe manually for this reason.

When automatic context propagation is enabled, Project Reactor automatically captures the context for blocking operators:

In the automatic mode, blocking operators, such as Flux#blockFirst(), Flux#blockLast(), Flux#toIterable(), Mono#block(), Mono#blockOptional(), and relevant overloads, all perform contextCapture() transparently, so in most cases it is not necessary to add it.

I think this makes sense because unlike subscribe, blocking means that the result and errors are directly returned where it's called. To answer your question, I don't know if there's a more convenient way to do this and whether this would be a good thing. Maybe create a reactor issue for this?

It was mentioned in the blog that there is a plan for Spring Boot to configure context propagation out of the box. May we know how soon it can be?

This is already done (see spring-projects/spring-boot/issues/34201). You can opt in with the following configuration property in your sample application:

spring:
  reactor:
    context-propagation: auto

Note that the setup in the sample application is not ideal, using a @PostConstruct annotation to enable this feature. As mentioned in the Reactor docs, this must be done as soon as possible because it only applies for new subscribers. Instead, if you choose not to use the configuration property, you should call the Hook in the main method right before SpringApplication.run.

What would be the performance impact(if any) in using Hooks.enableAutomaticContextPropagation() with .contextCapture()?

I'm not sure I understand. If you enable automatic propagation it will be taken care of for you in most cases. If you wonder about capturing the context for those subscribe calls, I don't think there would be duplicated efforts there. As for performance impact in general, Dariusz shared his findings here.

I don't think there is anything actionable in Spring Framework right now, so I'd suggest getting in touch with the Micrometer team if you think the migration guide could use some updates. You can also reach out to the Reactor team about the block vs. subscribe situation we've discussed here.

Thanks!

@bclozel bclozel closed this as not planned Won't fix, can't repro, duplicate, stale Jan 10, 2024
@bclozel bclozel added status: invalid An issue that we don't feel is valid for: external-project Needs a fix in external project and removed status: waiting-for-triage An issue we've not yet triaged or decided on labels Jan 10, 2024
@jhengy
Copy link
Author

jhengy commented Jan 11, 2024

@bclozel thanks for the detailed reply, will follow up with the respective teams

@chemicL
Copy link
Member

chemicL commented Jan 11, 2024

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
for: external-project Needs a fix in external project status: invalid An issue that we don't feel is valid
Projects
None yet
Development

No branches or pull requests

4 participants