
fix: call patch_all before importing handler code. #598

Open · wants to merge 1 commit into main
Conversation

purple4reina (Contributor)

What does this PR do?

Move the call to ddtrace.patch_all so that it is always made before the handler is imported.

Motivation

Customer reported issue (see https://datadoghq.atlassian.net/browse/SLES-2262) of not seeing spans or distributed tracing from their confluent_kafka calls. Here is a highly simplified version of their lambda handler:

from confluent_kafka import Producer

producer = Producer({'bootstrap.servers': 'mybroker1,mybroker2'})

def handle(event, context):
    producer.produce('mytopic', 'hello world!')
    producer.flush()

datadog-lambda calls ddtrace.patch_all() after the customer handler is imported. To see this, look at handler.py and wrapper.py. Here, patch_all() is currently called when initializing the DatadogWrapper in wrapper.py, which only happens after all customer code has been imported in handler.py. To demonstrate, here's a commented, abridged version of our handler.py file:

from importlib import import_module

import os
from time import time_ns

from datadog_lambda.tracing import emit_telemetry_on_exception_outside_of_handler
from datadog_lambda.wrapper import datadog_lambda_wrapper			# <--- wrapper imported
from datadog_lambda.module_name import modify_module_name

... other stuff ...

try:
    handler_load_start_time_ns = time_ns()
    handler_module = import_module(modified_mod_name)				# <--- handler cold imported
    handler_func = getattr(handler_module, handler_name)
except Exception as e:
    emit_telemetry_on_exception_outside_of_handler(
        e,
        modified_mod_name,
        handler_load_start_time_ns,
    )
    raise

handler = datadog_lambda_wrapper(handler_func)						# <--- patch_all called

Calling patch_all() after the handler code is imported means the producer never has any instrumentation applied. We can see this by inspecting the producer's type.

print(producer.__class__.__name__)  # prints "Producer" but should be "TracedProducer"

💭 So wait a minute 💭, why is this only a problem now? This call to patch_all() was added over 5 years ago, so why has no one reported this until now!?

This has to do with how ddtrace does its patching: the effect is specific to each patched contrib module and to how the customer uses it.

Interestingly, if you inspect the producer type in a different way, you see a different result:

import confluent_kafka
from confluent_kafka import Producer

print(confluent_kafka.Producer.__name__)  # prints "TracedProducer"
print(Producer.__name__)                  # prints "Producer"

Why is this? Because confluent_kafka.Producer looks the class up on the module at access time, and patching replaced that module attribute with the traced class; the name Producer, however, was bound to the original, un-traced class when it was imported, before patching ran.
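
To make the mechanism concrete, here is a minimal, runnable sketch (using a stand-in module rather than ddtrace or confluent_kafka, so all names here are illustrative) of why a name bound at import time is unaffected by later patching of the module attribute:

import types

# Stand-in for confluent_kafka before any patching happens.
fake_kafka = types.ModuleType("fake_kafka")

class Producer:
    pass

fake_kafka.Producer = Producer

# Equivalent of "from fake_kafka import Producer": binds a local name to the
# original class object.
Producer_local = fake_kafka.Producer

# Later, "patching" swaps the attribute on the module, much like patch_all() does.
class TracedProducer(Producer):
    pass

fake_kafka.Producer = TracedProducer

print(fake_kafka.Producer.__name__)  # "TracedProducer" -- attribute lookup sees the patched class
print(Producer_local.__name__)       # "Producer"       -- the earlier binding still points at the original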

Testing Guidelines

Additional Notes

⚠️ Important note ⚠️ This PR has the added consequence that customers will now see spans created by patched module calls made at the global level (i.e., on cold start). Previously these spans were not created at all, because patch_all() hadn't been called until after the handler code was fully imported. For example, this code will now produce a span for the requests HTTP call made at the global level during cold start.

import requests

resp = requests.get('https://example.com')
print(resp.status_code)

def handler(event, context):
    pass

The only problem is (and here's the ⚠️ warning) that these newly created spans will always be orphaned. This is because of the way we manage cold start tracing: during cold start we are unable to determine the trace ID, because we have not yet started the root trace span, nor have we received any inbound distributed tracing headers.

It should be possible to correctly parent these new orphaned spans. However, that is outside the scope of this PR because it will be a difficult and significant undertaking. We can cross that bridge when we get there.

Types of Changes

  • Bug fix
  • New feature
  • Breaking change
  • Misc (docs, refactoring, dependency upgrade, etc.)

Check all that apply

  • This PR's description is comprehensive
  • This PR contains breaking changes that are documented in the description
  • This PR introduces new APIs or parameters that are documented and unlikely to change in the foreseeable future
  • This PR impacts documentation, and it has been updated (or a ticket has been logged)
  • This PR's changes are covered by the automated tests
  • This PR collects user input/sensitive content into Datadog
  • This PR passes the integration tests (ask a Datadog member to run the tests)

purple4reina requested a review from a team as a code owner · May 14, 2025 19:17
@@ -45,6 +45,10 @@
extract_http_status_code_tag,
)

# Patch third-party libraries for tracing, must be done before importing any
# handler code.
patch_all()
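
In effect, here's a hedged sketch (abridged, mirroring the handler.py excerpt above rather than the actual diff) of the resulting import order, assuming patch_all() now runs when datadog_lambda.wrapper is first imported:

# datadog_lambda/wrapper.py (abridged sketch): patch_all() now runs at module import time
from ddtrace import patch_all

patch_all()  # third-party libraries are patched before any customer code is imported

# datadog_lambda/handler.py (abridged sketch, same names as the excerpt above)
from importlib import import_module
from datadog_lambda.wrapper import datadog_lambda_wrapper   # <--- importing the wrapper triggers patch_all
from datadog_lambda.module_name import modify_module_name

handler_module = import_module(modified_mod_name)            # <--- handler imported after patching
handler_func = getattr(handler_module, handler_name)
handler = datadog_lambda_wrapper(handler_func)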
Collaborator

We are planning to deprecate ddtrace.patch_all(...) and instead encourage folks to use import ddtrace.auto (which executes a similar set of operations as ddtrace-run).

I don't think we can make this change in this PR due to the overhead of loading all ddtrace products and entrypoints, but it's something we should investigate.
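
For reference, the suggested alternative is a single import whose side effect enables automatic instrumentation (a sketch of the suggestion only; it is not adopted in this PR because of the overhead noted above):

# Hedged sketch: replaces the explicit patch_all() call with ddtrace's auto-instrumentation import.
import ddtrace.auto  # noqa: F401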

Contributor Author

Thanks Munir. It's definitely something on our radar. Last I benchmarked the difference between patch_all and ddtrace.auto, it was significant.
