[observability] most basic OpenTelemetry integration into MCK #93
base: master
1b6922c to 3420830
- name: OTEL_SERVICE_NAME
value: evergreen-agent
value: mongodb-e2e-tests
updated the service-name for better querying
scripts/evergreen/e2e/e2e.sh
Outdated
reset_namespace "$(kubectl config current-context)" "${NAMESPACE}" || true
fi
# If the test passed, then the namespace is removed
delete_operator "${NAMESPACE}"
we should remove the operator to test teardown at the end of the test-run + it enables us to ensure traces are exported in time
This is a very, very nice change and I would like to have it included. But as of right now I think we should discuss this PR a bit more before deciding to merge. There are too many end-user implications to just LGTM it now.
helm uninstall --kube-context="${context}" mongodb-enterprise-operator || true &
helm uninstall --kube-context="${context}" mongodb-community-operator || true &
helm uninstall --kube-context="${context}" mongodb-enterprise-operator-multi-cluster || true &
helm uninstall --kube-context="${context}" mongodb-kubernetes-operator || true &
we have one operator now - this is a cleanup
@@ -162,7 +160,7 @@ reset_namespace() {
# a while to delete it.
should_wait="false"
# shellcheck disable=SC2153
if [[ ${CURRENT_VARIANT_CONTEXT} == e2e_mdb_openshift_ubi_cloudqa || ${CURRENT_VARIANT_CONTEXT} == e2e_openshift_static_mdb_ubi_cloudqa ]]; then
if [[ ${KUBE_ENVIRONMENT_NAME} == "openshift" ]]; then
cleanup
@@ -35,11 +35,10 @@ EOF

delete_operator() {
local ns="$1"
local name=${OPERATOR_NAME:=mongodb-enterprise-operator}
local name=${OPERATOR_NAME:=mongodb-kubernetes-operator}
this was not running before and therefore we had no flushing

title "Removing the Operator deployment ${name}"
! kubectl --namespace "${ns}" get deployments | grep -q "${name}" \
|| kubectl delete deployment "${name}" -n "${ns}" || true
kubectl delete deployment "${name}" -n "${ns}" --wait=true --timeout=10s|| true
let's wait until all things are flushed before we stop. PoW is in the PR description - you can see the trace of the operator
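The idea in this comment, giving the operator time to flush its traces before the test run ends, could be taken one step further by also waiting for the operator pods to terminate. The following is a minimal sketch under stated assumptions, not the PR's implementation: the helper name `delete_operator_and_wait`, the 30-second pod timeout, and the `app.kubernetes.io/name` label selector are all hypothetical.

```shell
# Hypothetical helper (not part of the PR): delete the operator deployment,
# then wait for its pods to disappear so the process has a chance to shut
# down and export any buffered spans.
delete_operator_and_wait() {
  local ns="$1"
  local name="${2:-mongodb-kubernetes-operator}"

  # --ignore-not-found makes the call a no-op when the deployment is absent.
  kubectl delete deployment "${name}" -n "${ns}" \
    --ignore-not-found=true --wait=true --timeout=10s || true

  # ASSUMPTION: the operator pods carry this label; adjust to the real chart.
  kubectl wait --for=delete pod \
    -l "app.kubernetes.io/name=${name}" -n "${ns}" --timeout=30s || true
}
```

Both calls end in `|| true`, mirroring the script in the diff, so a missing deployment or a timeout would not fail the teardown step.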
@@ -83,6 +83,16 @@ spec:
valueFrom:
fieldRef:
fieldPath: metadata.namespace
{{- $opentelemetry := default dict .Values.operator.opentelemetry }}
pow:
(venv) ~/projects/ops-manager-kubernetes git:[traces-operator]
helm template helm_chart | rg OTEL
(venv) ~/projects/ops-manager-kubernetes git:[traces-operator]
helm template --set operator.opentelemetry.tracing.enabled=true --set operator.opentelemetry.tracing.traceID=your-trace-id --set operator.opentelemetry.tracing.parentID=your-parent-id --set operator.opentelemetry.tracing.collectorEndpoint=http://jaeger:14268/api/traces helm_chart | rg OTEL
- name: OTEL_TRACE_ID
- name: OTEL_PARENT_ID
- name: OTEL_EXPORTER_OTLP_ENDPOINT
(venv) ~/projects/ops-manager-kubernetes git:[traces-operator]
helm template --set operator.opentelemetry.tracing.enabled=true --set operator.opentelemetry.tracing.traceID=your-trace-id --set operator.opentelemetry.tracing.parentID=your-parent-id --set operator.opentelemetry.tracing.collectorEndpoint=http://jaeger:14268/api/traces helm_chart | rg OTEL -C 2
fieldRef:
fieldPath: metadata.namespace
- name: OTEL_TRACE_ID
value: "your-trace-id"
- name: OTEL_PARENT_ID
value: "your-parent-id"
- name: OTEL_EXPORTER_OTLP_ENDPOINT
value: "http://jaeger:14268/api/traces"
- name: WATCH_NAMESPACE
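The `--set` flags in the commands above could also be derived from environment variables instead of being typed by hand, which is roughly what the PR summary describes for `scripts/funcs/operator_deployment`. The sketch below is only an illustration of that idea: the function name `otel_helm_values` and the choice of `OTEL_TRACE_ID`, `OTEL_PARENT_ID`, and `OTEL_EXPORTER_OTLP_ENDPOINT` as inputs are assumptions, while the Helm value keys are the ones used in the commands above.

```shell
# Hypothetical helper (not the PR's code): print one Helm "key=value" pair per
# line for the tracing options. Reading these three environment variables as
# inputs is an assumption; the value keys match the helm command above.
otel_helm_values() {
  local prefix="operator.opentelemetry.tracing"

  # Without a trace ID there is no parent trace to join, so print nothing.
  if [ -z "${OTEL_TRACE_ID:-}" ]; then
    return 0
  fi

  printf '%s.enabled=true\n' "${prefix}"
  printf '%s.traceID=%s\n' "${prefix}" "${OTEL_TRACE_ID}"
  if [ -n "${OTEL_PARENT_ID:-}" ]; then
    printf '%s.parentID=%s\n' "${prefix}" "${OTEL_PARENT_ID}"
  fi
  if [ -n "${OTEL_EXPORTER_OTLP_ENDPOINT:-}" ]; then
    printf '%s.collectorEndpoint=%s\n' "${prefix}" "${OTEL_EXPORTER_OTLP_ENDPOINT}"
  fi
}
```

Each printed line is meant to be passed to Helm as the argument of a separate `--set` flag, in the same way as the hand-written commands above.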
Pull Request Overview
Introduces basic OpenTelemetry tracing into the MongoDB Kubernetes Operator (MCK) and related scripts, enabling trace propagation from CI tests through operator spans.
- Initializes tracing in `main.go` using OTEL env vars and creates a root operator span.
- Instruments core telemetry functions (`trace.go`, `configmap.go`, `collector.go`, `client.go`) with spans and attributes.
- Propagates OTEL settings through Helm charts, shell scripts, and test configurations.
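As a rough illustration of initializing tracing from OTEL environment variables, the sketch below validates a trace ID and a parent span ID and combines them into a single `traceparent` value. It assumes the IDs follow the W3C Trace Context format (32 and 16 lowercase hex characters, not all zeros); this excerpt does not state which format `OTEL_TRACE_ID` and `OTEL_PARENT_ID` actually use, and the function names `is_hex_id` and `build_traceparent` are hypothetical.

```shell
# Returns 0 if $1 is exactly $2 lowercase hex characters and not all zeros.
is_hex_id() {
  local id="$1"
  local len="$2"
  printf '%s' "${id}" | grep -Eq "^[0-9a-f]{${len}}\$" || return 1
  # An all-zero ID is invalid in W3C Trace Context.
  printf '%s' "${id}" | grep -Eq '^0+$' && return 1
  return 0
}

# Prints "00-<trace-id>-<parent-id>-01" (version 00, sampled flag set),
# or returns 1 without output when either ID is malformed.
build_traceparent() {
  local trace_id="$1"
  local parent_id="$2"
  is_hex_id "${trace_id}" 32 || return 1
  is_hex_id "${parent_id}" 16 || return 1
  printf '00-%s-%s-01\n' "${trace_id}" "${parent_id}"
}
```

Under this assumption, placeholder values such as `your-trace-id` would be rejected, so a check like this could catch a misconfigured CI variable before the operator starts.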
Reviewed Changes
Copilot reviewed 14 out of 15 changed files in this pull request and generated 4 comments.
File | Description |
---|---|
scripts/funcs/operator_deployment | Adds parsing of OTEL env vars to Helm values |
scripts/funcs/kubernetes | Updates default operator name and cleanup/uninstall logic |
scripts/evergreen/e2e/e2e.sh | Refactors cluster diagnostics and OpenShift cleanup sequences |
scripts/evergreen/deployments/test-app/templates/mongodb-enterprise-tests.yaml | Sets OTEL_SERVICE_NAME and pytest --trace-parent flags |
pkg/telemetry/trace.go | Implements SetupTracingFromParent with OTLP exporter |
pkg/telemetry/configmap.go | Wraps ConfigMap creation in a tracing span |
pkg/telemetry/collector.go | Adds a span around RunTelemetry |
pkg/telemetry/client.go | Adds a span in SendEventWithRetry with Atlas base URL |
pipeline.py | Clarifies trace_flags comment |
main.go | Hooks up tracing setup, root span, and tracer shutdown |
helm_chart/templates/operator.yaml | Injects OTEL env vars into the operator deployment |
go.mod | Adds OpenTelemetry dependencies |
docker/mongodb-kubernetes-tests/tests/conftest.py | Uses logger.debug instead of print and fixes a typo |
LICENSE-THIRD-PARTY | Updates third-party license entries for new dependencies |
Comments suppressed due to low confidence (1)
pkg/telemetry/client.go:118
- No unit tests cover the new OpenTelemetry instrumentation in `SendEventWithRetry`; consider adding tests to validate span creation and attribute setting.
_, span := TRACER.Start(ctx, "SendEventWithRetry")
pkg/telemetry/configmap.go
Outdated
@@ -82,6 +83,13 @@ func updateConfigMapWithNewUUID(ctx context.Context, k8sClient kubeclient.Client

// Creates a new ConfigMap with a generated UUID
func createNewConfigMap(ctx context.Context, k8sClient kubeclient.Client, namespace string) string {
_, span := TRACER.Start(ctx, "createNewConfigMap")
[nitpick] Consider calling `span.RecordError(err)` inside the error branch of the ConfigMap creation to capture failures in the trace.
@@ -149,7 +149,7 @@ fi
dump_cluster_information

# We only have static clusters in OpenShift; otherwise, there's no need to mark and clean them up here.
if [[ "${CLUSTER_TYPE}" == "openshift" ]]; then
if [[ "${KUBE_ENVIRONMENT_NAME}" == *openshift* ]]; then
KUBE_ENVIRONMENT_NAME="dev-openshift-cluster"
if [[ "${KUBE_ENVIRONMENT_NAME}" == *openshift* ]]; then
echo "Contains openshift"
else
echo "Does NOT contain openshift"
fi
-> Contains openshift
@@ -69,9 +69,6 @@ get_operator_helm_values() {
comma_separated_list="$(echo "${MEMBER_CLUSTERS}" | tr ' ' ',')"
# shellcheck disable=SC2154
config+=("multiCluster.clusters={${comma_separated_list}}")
fi

if [[ "${KUBE_ENVIRONMENT_NAME:-}" == "multi" ]]; then
for some reason it was duplicated
Thanks for the changes! LGTM!
Summary
This pull request introduces OpenTelemetry tracing support to the MongoDB Kubernetes Operator and its related components. Key changes include the integration of OpenTelemetry libraries, the addition of tracing configuration, and updates to ensure trace propagation across the application. These changes enhance observability and debugging capabilities.
In our CI suite this means we will have the following kind of traces:
OpenTelemetry Integration:
- `main.go`: sets up tracing in the `main` function, including trace and span ID extraction from environment variables and the creation of a root span for the operator. Tracing context is propagated across controllers, and shutdown is handled gracefully.
- `pkg/telemetry/client.go`: adds a span to the `SendEventWithRetry` function to capture telemetry events and include the Atlas base URL as a span attribute. (This is good to know if we happen to make a change and happen to send to prod Atlas.)
Helm Chart Updates:
- Adds OTEL environment variables (`OTEL_TRACE_ID`, `OTEL_PARENT_ID`, `OTEL_EXPORTER_OTLP_ENDPOINT`) to the operator's deployment template ([helm_chart/templates/operator.yaml R83-R90](https://github.com/mongodb/mongodb-kubernetes/pull/93/files#diff-5d2e377a6806023ca9eff60be4d7e5cd879803de2bd3800b630f479f8728f322R83-R90)).
- Adds tracing options (`enabled`, `traceID`, `parentID`, `collectorEndpoint`) in the Helm chart's `values.yaml`.
Dependency Updates:
- Adds OpenTelemetry libraries (`otel`, `otel/sdk`, `otel/trace`, etc.) to `go.mod`.
Proof of Work
e.g. patch
generated traces in our ci: Link

Checklist
Reminder (Please remove this when merging)