[observability] most basic OpenTelemetry integration into MCK #93

Open
wants to merge 3 commits into
base: master
from

Conversation

nammn
Collaborator

@nammn nammn commented May 7, 2025

Summary

This pull request introduces OpenTelemetry tracing support to the MongoDB Kubernetes Operator and its related components. Key changes include the integration of OpenTelemetry libraries, the addition of tracing configuration, and updates to ensure trace propagation across the application. These changes enhance observability and debugging capabilities.

In our CI suite, this produces traces of the following shape:

trace_id: abc123

                         ┌────────────────────┐
                         │     Evergreen      │
                         │  span_id: ROOT     │
                         │  parent_id: none   │
                         └─────────┬──────────┘
                                   │
        ┌──────────────────────────┼─────────────────────────┐
        ▼                          ▼                         ▼
┌──────────────┐         ┌────────────────┐         ┌────────────────────┐
│   E2E Test   │         │   Operator     │         │     (Other…)       │
│ span_id: A1  │         │ span_id: B1    │         │                    │
│ parent: ROOT │         │ parent: ROOT   │         │                    │
└──────┬───────┘         └──────┬─────────┘         └────────────────────┘
       │                        │
       ▼                        ▼
┌──────────────┐         ┌────────────────────┐
│ E2E Function │         │   Reconcile Loop   │
│ span_id: A2  │         │   span_id: B2      │
│ parent: A1   │         │   parent: B1       │
└──────────────┘         └────────────────────┘
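
The hierarchy in the diagram can be sketched in a few lines of Go. This is only an illustrative model, not the operator's code: the `Span` struct, `exampleSpans`, and `renderTree` are hypothetical names, and real OpenTelemetry spans carry far more data. The point it shows is that all spans share one trace ID and are linked only through their parent IDs.

```go
package main

import (
	"fmt"
	"strings"
)

// Span is a simplified, hypothetical stand-in for a tracing span.
type Span struct {
	TraceID  string
	SpanID   string
	ParentID string // empty for the root span
	Name     string
}

// exampleSpans mirrors the diagram: one trace, one root, two branches.
func exampleSpans() []Span {
	return []Span{
		{"abc123", "ROOT", "", "Evergreen"},
		{"abc123", "A1", "ROOT", "E2E Test"},
		{"abc123", "B1", "ROOT", "Operator"},
		{"abc123", "A2", "A1", "E2E Function"},
		{"abc123", "B2", "B1", "Reconcile Loop"},
	}
}

// renderTree returns an indented outline of the spans, attaching each span
// to its parent via ParentID. Spans without a SpanID are skipped.
func renderTree(spans []Span) string {
	children := make(map[string][]Span)
	for _, s := range spans {
		if s.SpanID == "" {
			continue
		}
		children[s.ParentID] = append(children[s.ParentID], s)
	}
	var b strings.Builder
	var walk func(parentID string, depth int)
	walk = func(parentID string, depth int) {
		for _, s := range children[parentID] {
			fmt.Fprintf(&b, "%s%s (span_id: %s)\n", strings.Repeat("  ", depth), s.Name, s.SpanID)
			walk(s.SpanID, depth+1)
		}
	}
	walk("", 0) // root spans have an empty ParentID
	return b.String()
}

func main() {
	fmt.Print(renderTree(exampleSpans()))
}
```

Because the Evergreen span is the root, anything started with its trace ID and span ID as parent (the E2E test pod, the operator) lands in the same trace.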

OpenTelemetry Integration:

  • Tracing in main.go:
    • Added OpenTelemetry setup to the main function: it extracts the trace and span IDs from environment variables and creates a root span for the operator. The tracing context is propagated to the controllers, and tracing is shut down gracefully on exit.
  • Telemetry in pkg/telemetry/client.go (useful for noticing if a change ever makes us send events to production Atlas):
    • Added a span to the SendEventWithRetry function to capture telemetry events and include the Atlas base URL as a span attribute.
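
As a rough illustration of the ID extraction described above, the sketch below reads `OTEL_TRACE_ID` and `OTEL_PARENT_ID` and validates them. It is not the code from this PR, which uses the OpenTelemetry SDK. It uses only the standard library and assumes the variables hold hex-encoded IDs (16 bytes for the trace ID, 8 bytes for the span ID, as in W3C Trace Context). The helper names `parseID` and `traceparent` are hypothetical.

```go
package main

import (
	"encoding/hex"
	"errors"
	"fmt"
	"os"
)

// parseID checks that s is a hex string encoding exactly n bytes and is not
// all zeros (W3C Trace Context treats all-zero IDs as invalid).
func parseID(s string, n int) ([]byte, error) {
	b, err := hex.DecodeString(s)
	if err != nil {
		return nil, fmt.Errorf("not valid hex: %w", err)
	}
	if len(b) != n {
		return nil, fmt.Errorf("want %d bytes, got %d", n, len(b))
	}
	for _, v := range b {
		if v != 0 {
			return b, nil
		}
	}
	return nil, errors.New("all-zero ID is invalid")
}

// traceparent builds a W3C traceparent header value (version 00, sampled
// flag set) from a hex trace ID and a hex parent span ID.
func traceparent(traceID, parentID string) (string, error) {
	t, err := parseID(traceID, 16)
	if err != nil {
		return "", fmt.Errorf("trace ID: %w", err)
	}
	p, err := parseID(parentID, 8)
	if err != nil {
		return "", fmt.Errorf("parent ID: %w", err)
	}
	return fmt.Sprintf("00-%x-%x-01", t, p), nil
}

func main() {
	tp, err := traceparent(os.Getenv("OTEL_TRACE_ID"), os.Getenv("OTEL_PARENT_ID"))
	if err != nil {
		// Without valid inherited IDs, a process would start its own trace.
		fmt.Println("no inherited trace context:", err)
		return
	}
	fmt.Println("traceparent:", tp)
}
```

Validating the IDs up front matters because a malformed or missing ID should result in a fresh, unlinked trace rather than a crash at startup.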

Helm Chart Updates:

  • Operator configuration:
    • Added OpenTelemetry-specific environment variables (OTEL_TRACE_ID, OTEL_PARENT_ID, OTEL_EXPORTER_OTLP_ENDPOINT) to the operator's deployment template. ([helm_chart/templates/operator.yamlR83-R90](https://github.com/mongodb/mongodb-kubernetes/pull/93/files#diff-5d2e377a6806023ca9eff60be4d7e5cd879803de2bd3800b630f479f8728f322R83-R90))
    • Introduced OpenTelemetry configuration options (enabled, traceID, parentID, collectorEndpoint) in the Helm chart's values.yaml.
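
The relationship between the Helm values and the environment variables can be modeled in Go as follows. This is a sketch under assumptions: the `OTelValues` and `EnvVar` types and the `otelEnv` function are hypothetical, and the one-to-one mapping and the gating on `enabled` are inferred from the option names rather than taken from the template itself.

```go
package main

import "fmt"

// OTelValues mirrors the options named in values.yaml (hypothetical type).
type OTelValues struct {
	Enabled           bool
	TraceID           string
	ParentID          string
	CollectorEndpoint string
}

// EnvVar is a minimal name/value pair, as in a container spec.
type EnvVar struct {
	Name  string
	Value string
}

// otelEnv returns the tracing environment variables for the operator
// container, or nil when tracing is disabled. The mapping is an assumption.
func otelEnv(v OTelValues) []EnvVar {
	if !v.Enabled {
		return nil
	}
	return []EnvVar{
		{"OTEL_TRACE_ID", v.TraceID},
		{"OTEL_PARENT_ID", v.ParentID},
		{"OTEL_EXPORTER_OTLP_ENDPOINT", v.CollectorEndpoint},
	}
}

func main() {
	vals := OTelValues{
		Enabled:           true,
		TraceID:           "4bf92f3577b34da6a3ce929d0e0e4736",
		ParentID:          "00f067aa0ba902b7",
		CollectorEndpoint: "http://otel-collector:4317", // hypothetical endpoint
	}
	for _, e := range otelEnv(vals) {
		fmt.Printf("%s=%s\n", e.Name, e.Value)
	}
}
```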

Dependency Updates:

  • Go module dependencies:
    • Added OpenTelemetry-related libraries (otel, otel/sdk, otel/trace, etc.) to go.mod.

Proof of Work

  • e.g. patch

  • Generated traces in our CI: Link
    [Screenshot 2025-05-21 at 15:19:20]

Checklist

  • Have you linked a Jira ticket and/or is the ticket in the title?
  • Have you checked whether your Jira ticket requires DOCSP changes?
  • Have you checked for release_note changes?

Reminder (Please remove this when merging)

  • Please approve the PR or request changes promptly; keep PRs in review for as short a time as possible
  • Our Short Guide for PRs: Link
  • Remember the following Communication Standards - use comment prefixes for clarity:
    • blocking: Must be addressed before approval.
    • follow-up: Can be addressed in a later PR or ticket.
    • q: Clarifying question.
    • nit: Non-blocking suggestions.
    • note: Side-note, non-actionable. Example: Praise
    • --> no prefix is considered a question

@nammn nammn changed the title from "add initial operator tracing support" to "OpenTelemetry integration into MCK" May 7, 2025
@nammn nammn force-pushed the traces-operator branch from 19334f0 to 388f4ad May 21, 2025 08:00
@nammn nammn changed the title from "OpenTelemetry integration into MCK" to "most basic OpenTelemetry integration into MCK" May 21, 2025
@nammn nammn changed the title from "most basic OpenTelemetry integration into MCK" to "[observability] most basic OpenTelemetry integration into MCK" May 21, 2025
@nammn nammn force-pushed the traces-operator branch 3 times, most recently from 1b6922c to 3420830 May 21, 2025 08:45
  - name: OTEL_SERVICE_NAME
-   value: evergreen-agent
+   value: mongodb-e2e-tests
Collaborator Author

Updated the service name for better querying.

reset_namespace "$(kubectl config current-context)" "${NAMESPACE}" || true
fi
# If the test passed, then the namespace is removed
delete_operator "${NAMESPACE}"
Collaborator Author

We should remove the operator at the end of the test run: this exercises teardown and helps ensure that traces are exported in time.

@nammn nammn force-pushed the traces-operator branch from 2391b87 to b75faef May 21, 2025 13:22
@nammn nammn marked this pull request as ready for review May 21, 2025 13:23
@nammn nammn requested a review from a team as a code owner May 21, 2025 13:23
@Copilot Copilot AI left a comment

Pull Request Overview

This PR introduces basic OpenTelemetry tracing support into the MongoDB Kubernetes Operator to improve observability and debugging. Key changes include the integration of OpenTelemetry libraries and configuration in main.go and pkg/telemetry, updates to Helm chart values and templates for trace propagation, and accompanying changes in deployment and test scripts to ensure correct trace information is passed.

Reviewed Changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| scripts/funcs/operator_deployment | Adds OpenTelemetry environment variable handling to operator deployment configuration. |
| scripts/funcs/kubernetes | Updates operator naming and adjusts deployment deletion commands. |
| scripts/evergreen/e2e/e2e.sh | Introduces cluster diagnostics and cleanup functions for enhanced troubleshooting. |
| scripts/evergreen/deployments/test-app/templates/mongodb-enterprise-tests.yaml | Modifies environment variables to set the correct service name in traces. |
| pkg/telemetry/* | Implements tracing setup, span creation, and propagation via new OpenTelemetry integrations. |
| pipeline.py | Updates comment on trace flags to reflect sampling behavior. |
| main.go | Integrates tracing setup with graceful shutdown and propagates trace context to controllers and webhooks. |
| helm_chart/* | Adds OpenTelemetry configuration options to Helm chart values and templates. |
| go.mod | Introduces OpenTelemetry dependencies. |
| docker/mongodb-kubernetes-tests/tests/conftest.py | Replaces print with logger and corrects a spelling mistake in a comment. |
| LICENSE-THIRD-PARTY | Updates third-party dependency list to include new OpenTelemetry libraries. |
