
Metric Tags with Multiple Values when Tag Keys match Function Tags #550


Closed

jodylent opened this issue Jan 8, 2025 · 6 comments

@jodylent commented Jan 8, 2025

Context

We have a small number of Lambda Functions whose purpose is to instrument third-party applications, sending metrics to Datadog on their behalf. Some of these applications do not have DD integrations, while others do not support custom metrics.

These Functions are tagged in AWS with their own metadata, e.g. team, business_unit, etc. They are not intended to emit that metadata as part of the metrics they produce -- e.g. a Function instrumenting ServiceX would be expected to produce metrics whose tags correspond only to ServiceX.

In practice, we see that any AWS Tags on a Function show up as tags on timestamped custom metrics emitted via the Datadog Lambda Layer's extension_thread_stats, a ThreadStatsWriter instantiated here.

  • There IS NO DOCUMENTATION for how this Function Tag collection/injection occurs in the serverless agent
  • The DD_EXCLUDE_EC2_TAGS variable does NOT affect the behavior.
  • The specifics of the serverless agent (extension) are unimportant to our use case, as we're emitting timestamped metrics, which follow the codepath for extension_thread_stats linked above. (Details of those emit/flush code paths are in Lossiness submitting timestamped custom metrics #514.)

Expected Behavior

  • MyFunction has AWS Tags foo=bar and baz=quux
  • MyFunction emits custom metrics via the DataDog Lambda Layer's lambda_metric
  • Function Tags are added to my metric tags (in a documented and configurable manner)
  • Function Tags can be overridden with custom tags
import time

from datadog_lambda.metric import lambda_metric

# expected tags: ["foo:bar", "baz:custom_val"]
lambda_metric(
    "metric.key",
    1.0,
    tags=["baz:custom_val"],
    timestamp=int(time.time()),
)

Actual Behavior

  • MyFunction has AWS Tags foo=bar and baz=quux
  • MyFunction emits custom metrics, specifying a tag baz=custom_val as above
  • The emitted metrics will have multiple tag values: baz=custom_val,quux
  • The emitted metric points can be queried via either value of the tag
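
For example (hypothetical queries built from the tags above, not output from our account), both of the following return the same points:

  • avg:metric.key{baz:custom_val} -- matches via the value specified in code
  • avg:metric.key{baz:quux} -- matches the same points via the injected Function Tag value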

A detailed example follows in the reproduction below

Steps to Reproduce the Problem

For the sake of clarity, I've sanitized out the business logic, and used a simple test:

  • The Function has an AWS Tag test_tag_key=from_tags
    • Note: AWS Tags are universal to a Function, and cannot be scoped to specific FunctionVersions
  • The Function code emits a metric, attempting to override test_tag_key with the value from_code
  • The Function code also tags its metric points with a short form of the execution context ID.
    • This is a terrible idea in production (due to cardinality)
    • It's used to clearly illustrate persistence of the problem across multiple Lambda microVMs
  • The Function was first executed without specifying test_tag_key, to establish that something injects the Function Tag's value

[Image: baseline run -- metric points tagged test_tag_key:from_tags]

  • Two subsequent runs specified tags=["test_tag_key:from_code", ...]

[Image: two overriding runs -- metric points specifying test_tag_key:from_code]

  • A whitespace change to the Function code produced a new execution context (microvm), demonstrating that the problem persists across cold starts

[Image: new execution context after cold start -- same behavior]

Function Handler
import os
import time

from datadog_lambda.metric import lambda_metric
from datadog_lambda.wrapper import datadog_lambda_wrapper


@datadog_lambda_wrapper
def main(event, context, *args, **kwargs):
    """
    # Layer provides DD Python Libs
    arn:aws:lambda:us-east-1:464622532012:layer:Datadog-Python310:104

    # Extension provides serverless agent (unused, due to timestamps)
    arn:aws:lambda:us-east-1:464622532012:layer:Datadog-Extension:67
    """
    # 2023/01/01/[$LATEST]45efb027ec0049cda3de89c8837f509c -> take first 6 of the execution context UUID
    short_ctx = os.environ.get("AWS_LAMBDA_LOG_STREAM_NAME", "UNSET").rsplit("]", 1)[-1][0:6]

    tag_key = "test_tag_key"

    # First run: do not specify "test_tag_key" -- this establishes "baseline" behavior
    # We expect the DD metrics to have the tags ["test_tag_key:from_tags", "ctx:abc123"]
    # metric_tags = [f"ctx:{short_ctx}"]

    # Subsequent runs: override "test_tag_key" -- expected tags ["test_tag_key:from_code", "ctx:abc123"]
    metric_tags = [f"ctx:{short_ctx}", f"{tag_key}:from_code"]

    lambda_metric(
        "metric.key",
        1.0,
        tags=metric_tags,
        timestamp=int(time.time()),
    )
    return short_ctx

Specifications

  • Datadog Lambda Layer version: v104 (arn:aws:lambda:us-east-1:464622532012:layer:Datadog-Python310:104)
  • Datadog Extension version (unused, due to timestamps): v67 (arn:aws:lambda:us-east-1:464622532012:layer:Datadog-Extension:67)
  • Python version: Discovered on Python3.10

Stacktrace

If only.

@purple4reina (Contributor)

Hey @jodylent, thanks for the detailed report. You've got a super interesting use case and we'll be happy to explore solutions with you.

From the information you have shared, this is my understanding of how your metrics are being submitted and processed. In your lambda function you are calling lambda_metric and specifying a timestamp. This forces the function to use the Datadog API (via https://github.com/DataDog/datadogpy) to submit metrics. At the same time, you also have the AWS integration installed for your Lambda functions. When crawled by the integration, the AWS Tags are gathered from your Lambda functions. These tags are then added to your custom metrics in addition to the tags you specified in code.
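
A minimal sketch of that submission path for timestamped metrics (my reading of the description above, not the layer's exact internals; the metric name, tags, and credential placeholder come from the example earlier in this issue):

import time

from datadog import initialize
from datadog.threadstats import ThreadStats

initialize(api_key="<DD_API_KEY>")  # placeholder credentials

# Timestamped points are aggregated by datadogpy's ThreadStats and then
# submitted to the Datadog API, bypassing the extension entirely.
stats = ThreadStats()
stats.start(flush_in_thread=False)
stats.distribution("metric.key", 1.0, timestamp=int(time.time()), tags=["baz:custom_val"])
stats.flush()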

Ideally, I think the best solution here is to use configuration options that already exist, rather than creating new ones. The tagging behavior as you've described it is not a bug, but is by design. Therefore, we need to find a solution that will ignore/remove specific tag values from your custom metrics.

Exclude tags from UI: From the Metrics Summary page, when you click on any of the metrics you need to amend, you'll see a list at the bottom of all the tags available. If you click "Manage Tags", you'll see the following dialog.

Image

From here, try excluding the tags you do not want. I do not think this will backfill your metrics, but will drop the unwanted tags going forward.

The drawback with this configuration method is that it requires an all-or-nothing approach: it does not let you drop individual tag values, but instead drops the entire tag key (including all of its values).

@jodylent jodylent changed the title [BUG] Metric Tags with Multiple Values when Tag Keys match Function Tags Metric Tags with Multiple Values when Tag Keys match Function Tags Jan 9, 2025
@jodylent (Author) commented Jan 9, 2025

Removed the [BUG] from the title -- it's a good point that this is the intended behavior of the combination of the wrapper and the AWS Lambda DD integration.

More Context

  • The Function(s) in question are known internally as DataDogMetricEmitters, or "DDMEs"
  • They have no prior knowledge of the metrics or tags which they'll emit, which makes it hard to exclude data in advance. As an example, a DDME might read specifically formatted json files and use them to create metrics. An upstream consumer could very easily inject new metrics/tags in such a file.
  • There may be a legitimate conflict between the tags on the DDME itself and those on the metrics -- for a hypothetical example, a DDME tagged business_unit:infrastructure may emit metrics on behalf of something which explicitly tags those points business_unit:finance

Possible Solutions

  • Skip the HttpReporter middleman entirely, and call api.Distribution.send() directly (a sketch follows this list)
  • Handle conflicts ad hoc as they come up.
  • Do nothing and wait for the inevitable explosion when someone uses $some_key.value in a query because "the numbers were always right" and therefore the tags must have been too (kidding).
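
A minimal sketch of the first option (assuming datadogpy's api module; the metric name, tags, and credential placeholder are illustrative):

import time

from datadog import api, initialize

initialize(api_key="<DD_API_KEY>")  # placeholder credentials

# Only the tags listed here are attached at submission time;
# no aggregator constant_tags are appended along the way.
api.Distribution.send(
    metric="metric.key",
    points=[(int(time.time()), [1.0])],
    tags=["baz:custom_val"],
)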

@duncanista (Contributor)

@jodylent could you try this with v68 of the Extension? Just trying to figure out if the issue lies there, lmk!

@jodylent (Author)

> with v68 of the Extension

Same experiment, same result.

[Image: same experiment with Extension v68 -- same result]

I think this is a feature rather than a bug, as @purple4reina suggested:

  • If Func X emits metric.key, it's tagged functionname:x
  • On some internal period, the DD integration scrapes Functions in a given account
  • In some internal manner, it applies Function tags from X to all metric points tagged functionname:x
  • I'd love to speculate on exactly how that "scrape & append" backfill works, but I don't think the resulting behavior is necessarily incorrect

My conclusion is that if one needs control over the exact Tags on a given set of Metric points from a Function, one needs to send them directly, rather than use the wrapper or the extension.

Assuming we agree, we can close this issue (though it might be worthwhile to note the edge case in docs... somewhere?)

@purple4reina (Contributor)

Hey @jodylent, two new ideas for you.

First and maybe easiest, instead of using the lambda_metric method, which uses the datadogpy package to aggregate and flush metrics, use the datadog_api_client package to submit metrics without aggregation. I believe this would solve your problem because it would allow you to be explicit in your tag selection, rather than relying on an aggregator. See https://datadoghq.dev/datadog-api-client-python/datadog_api_client.v2.api.html#datadog_api_client.v2.api.metrics_api.MetricsApi.submit_metrics
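
A minimal sketch of that approach (adapted from the datadog_api_client docs linked above; the metric name and tags come from the earlier example, and Configuration() is assumed to read DD_API_KEY from the environment):

import time

from datadog_api_client import ApiClient, Configuration
from datadog_api_client.v2.api.metrics_api import MetricsApi
from datadog_api_client.v2.model.metric_intake_type import MetricIntakeType
from datadog_api_client.v2.model.metric_payload import MetricPayload
from datadog_api_client.v2.model.metric_point import MetricPoint
from datadog_api_client.v2.model.metric_series import MetricSeries

# One series with exactly the tags we want; nothing is merged in by an
# aggregator before submission. GAUGE is used here purely for illustration.
body = MetricPayload(
    series=[
        MetricSeries(
            metric="metric.key",
            type=MetricIntakeType.GAUGE,
            points=[MetricPoint(timestamp=int(time.time()), value=1.0)],
            tags=["baz:custom_val"],
        )
    ]
)

configuration = Configuration()  # reads DD_API_KEY from the environment
with ApiClient(configuration) as api_client:
    MetricsApi(api_client).submit_metrics(body=body)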

If you'd prefer to keep the metric aggregation and not have to add a new dependency to your function, I believe you should be able to edit the tags list on the aggregator directly. Manually create your own ThreadStatsWriter, then clear the constant_tags list on the thread stats instance. For example:

import time

from datadog_lambda.thread_stats_writer import ThreadStatsWriter

# flush_in_thread is a required argument to ThreadStatsWriter
stats = ThreadStatsWriter(flush_in_thread=False)
stats.thread_stats.constant_tags.clear()

def handler(event, context):
    stats.distribution('rey.kittens', 1, tags=['color:purple'], timestamp=time.time())
    stats.flush()  # with flush_in_thread=False, flush manually before returning

As you're testing these things, let me know how they work out for you and what you ultimately decide to go for. We'd love to be able to help other customers facing this same situation.

@purple4reina (Contributor)

Hey @jodylent, hope these solutions worked out for you. Would love to hear which you chose and how it's going for you. I'm closing this issue for now, but you're more than welcome to reopen it.
