Ingest internal telemetry from the OTel Collector when it is running #9928
Merged
25 commits
8f61046 Add otel telemetry to agent monitoring data (faec)
3cb7ad0 add remaining standard processors (faec)
a05982e Handle custom telemetry endpoint configurations (faec)
64dab48 Merge branch 'main' of github.com:elastic/elastic-agent into otel-tel… (faec)
415aa0a Merge branch 'main' of github.com:elastic/elastic-agent into otel-tel… (faec)
815e444 Testing remap script (faec)
9c95ee8 Merge branch 'otel-telemetry' of github.com:faec/elastic-agent into o… (faec)
9bb0b09 adjusting event fields (faec)
c591ccc Update field names / semantics (faec)
a77e355 fix telemetry label scope (faec)
6c37528 add comment (faec)
3e8da6a Fix remaining component / metricset fields (faec)
922979c Mangle agent id to prevent rejection of metrics from different label … (faec)
80841a2 fix exporter label (faec)
940c046 Remove debug code (faec)
2772b3f Merge branch 'main' of github.com:elastic/elastic-agent into otel-tel… (faec)
b633459 Rework otel config parsing to tolerate absent config (faec)
c8c8641 Check for nil otel config (faec)
a3ab177 make check (faec)
e681979 replace custom parsing with stock otel config struct (faec)
fdfd694 add changelog fragment (faec)
4257b29 mage check (faec)
c5a7db7 mage notice (faec)
085b371 update TestMonitoringFull golden file (faec)
ae03c29 integration test: there are now 4 monitoring components instead of 3 (faec)
32 changes: 32 additions & 0 deletions
changelog/fragments/1759257958-collector-telemetry-monitoring.yaml
@@ -0,0 +1,32 @@
# Kind can be one of:
# - breaking-change: a change to previously-documented behavior
# - deprecation: functionality that is being removed in a later release
# - bug-fix: fixes a problem in a previous version
# - enhancement: extends functionality but does not break or fix existing behavior
# - feature: new functionality
# - known-issue: problems that we are aware of in a given version
# - security: impacts on the security of a product or a user’s deployment.
# - upgrade: important information for someone upgrading from a prior version
# - other: does not fit into any of the other categories
kind: enhancement

# Change summary; an 80ish-character description of the change.
summary: Include OTel Collector internal telemetry in Agent monitoring

# Long description; in case the summary is not enough to describe the change
# this field accommodates a description without length limits.
# NOTE: This field will be rendered only for breaking-change and known-issue kinds at the moment.
#description:

# Affected component; usually one of "elastic-agent", "fleet-server", "filebeat", "metricbeat", "auditbeat", "all", etc.
component: elastic-agent

# PR URL; optional; the PR number that added the changeset.
# If not present, it is automatically filled by the tooling finding the PR where this changelog fragment has been added.
# NOTE: the tooling supports backports, so it's able to fill the original PR number instead of the backport PR number.
# Please provide it if you are adding a fragment for a different PR.
#pr: https://github.com/owner/repo/1234

# Issue URL; optional; the GitHub issue related to this changeset (either closes or is part of).
# If not present, it is automatically filled by the tooling with the issue linked to the PR number.
#issue: https://github.com/owner/repo/1234
139 changes: 139 additions & 0 deletions
internal/pkg/agent/application/monitoring/component/otel_remap.js
@@ -0,0 +1,139 @@
// A script for use in the Beats script processor, to remap raw OTel telemetry
// from its prometheus endpoint to backwards-compatible Beats metrics fields
// that can be viewed in Agent dashboards.

function process(event) {
  // This hard-coded exporter name will not work for the general
  // (non-monitoring) use case.
  var elastic_exporter = event.Get("prometheus.labels.exporter") == "elasticsearch/_agent-component/monitoring";
  var elastic_scope = event.Get("prometheus.labels.otel_scope_name") == "github.com/open-telemetry/opentelemetry-collector-contrib/exporter/elasticsearchexporter";

  // We accept general collector fields that are scoped to the elasticsearch
  // exporter (queue metrics, sent / error stats), or fields specifically
  // scoped to the elasticsearch exporter (custom elastic metrics).
  if (!elastic_exporter && !elastic_scope) {
    event.Cancel();
    return;
  }

  // Hack: if the scope is elastic-custom fields, deterministically mangle the
  // agent.id. Since the label set is different, these are passed through in
  // different events, and if we don't do this one of the events will be
  // rejected as a duplicate since they have the same component id, agent id,
  // and metricset.
  var id = event.Get("agent.id");
  if (elastic_scope && id != null && id.length > 0) {
    // Increment / wrap the last hex character of the uuid
    var prefix = id.substring(0, id.length - 1);
    var last = id.substring(id.length - 1);
    var rotated = "0";
    if (last < "f") {
      rotated = String.fromCharCode(last.charCodeAt(0) + 1);
    }
    id = prefix + rotated;
    event.Put("agent.id", id);
  }

  // The event will be discarded unless we find some valid metric to convert.
  var keep_event = false;

  var queue_size = event.Get("prometheus.metrics.otelcol_exporter_queue_size");
  var queue_capacity = event.Get("prometheus.metrics.otelcol_exporter_queue_capacity");
  if (queue_size != null) {
    keep_event = true;
    event.Put("beat.stats.libbeat.pipeline.queue.filled.events", queue_size);
  }
  if (queue_capacity != null) {
    keep_event = true;
    event.Put("beat.stats.libbeat.pipeline.queue.max_events", queue_capacity);
  }
  if (queue_size != null && queue_capacity != null) {
    var queue_pct = queue_size / queue_capacity;
    if (!isNaN(queue_pct)) {
      event.Put("beat.stats.libbeat.pipeline.queue.filled.pct", queue_pct);
    }
  }

  var total_sent = 0;
  var total_sent_valid = false;
  // Add send statistics from all source types
  var sent_logs = event.Get("prometheus.metrics.otelcol_exporter_sent_log_records_total");
  if (sent_logs != null) {
    total_sent += sent_logs;
    total_sent_valid = true;
  }
  var sent_spans = event.Get("prometheus.metrics.otelcol_exporter_sent_spans_total");
  if (sent_spans != null) {
    total_sent += sent_spans;
    total_sent_valid = true;
  }
  var sent_metrics = event.Get("prometheus.metrics.otelcol_exporter_sent_metric_points_total");
  if (sent_metrics != null) {
    total_sent += sent_metrics;
    total_sent_valid = true;
  }
  if (total_sent_valid) {
    event.Put("beat.stats.libbeat.output.events.acked", total_sent);
    keep_event = true;
  }

  var total_failed = 0;
  var total_failed_valid = false;
  // Add failed statistics from all source types
  var failed_logs = event.Get("prometheus.metrics.otelcol_exporter_send_failed_log_records_total");
  if (failed_logs != null) {
    total_failed += failed_logs;
    total_failed_valid = true;
  }
  var failed_spans = event.Get("prometheus.metrics.otelcol_exporter_send_failed_spans_total");
  if (failed_spans != null) {
    total_failed += failed_spans;
    total_failed_valid = true;
  }
  var failed_metrics = event.Get("prometheus.metrics.otelcol_exporter_send_failed_metric_points_total");
  if (failed_metrics != null) {
    total_failed += failed_metrics;
    total_failed_valid = true;
  }
  if (total_failed_valid) {
    event.Put("beat.stats.libbeat.output.events.dropped", total_failed);
    keep_event = true;
  }

  var flushed_bytes = event.Get("prometheus.metrics.otelcol_elasticsearch_flushed_bytes_total");
  if (flushed_bytes != null) {
    event.Put("beat.stats.libbeat.output.write.bytes", flushed_bytes);
    keep_event = true;
  }

  var retried_docs = event.Get("prometheus.metrics.otelcol_elasticsearch_docs_retried_ratio_total");
  if (retried_docs != null) {
    // "failed" in the beats metric means an event failed to ingest but was
    // not dropped, and will be retried.
    event.Put("beat.stats.libbeat.output.events.failed", retried_docs);
    keep_event = true;
  }

  var request_count = event.Get("prometheus.metrics.otelcol_elasticsearch_bulk_requests_count_ratio_total");
  if (request_count != null) {
    // This is not an exact semantic match for how Beats measures batch count,
    // but it's close.
    event.Put("beat.stats.libbeat.output.events.batches", request_count);
    keep_event = true;
  }

  var processed_docs_count = event.Get("prometheus.metrics.otelcol_elasticsearch_docs_processed_ratio_total");
  if (processed_docs_count != null) {
    // Approximate semantic match: the otel metric counts all document
    // ingestion attempts, including success, failure, and retries,
    // which is a better match for the Beats definition of total events
    // than otelcol_elasticsearch_docs_received_ratio_total which
    // includes only unique events seen (regardless of retries etc).
    event.Put("beat.stats.libbeat.output.events.total", processed_docs_count);
    keep_event = true;
  }

  if (!keep_event) {
    event.Cancel();
  }
}
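
For reviewers who want to try the remapping logic outside of Beats, the following is a minimal sketch and not part of the PR. It assumes a hypothetical `makeEvent` stand-in that exposes only the `Get`, `Put`, and `Cancel` calls the script uses and stores fields under flat dotted keys. `remapQueue` is a reduced, illustrative copy of the queue handling in `process()`. `rotateLastHex` is a hex-safe variant of the agent.id rotation: the script above increments the character code, so a trailing "9" becomes ":" rather than "a", and this helper (a name invented here) shows one way to stay within 0-9a-f if that ever matters.

```javascript
// Hypothetical stand-in for the Beats script-processor event. Only the
// Get / Put / Cancel calls used by otel_remap.js are provided, and fields
// are stored under flat dotted keys (an assumption of this sketch).
function makeEvent(fields) {
  var data = {};
  for (var key in fields) {
    if (Object.prototype.hasOwnProperty.call(fields, key)) {
      data[key] = fields[key];
    }
  }
  return {
    cancelled: false,
    Get: function (key) {
      return Object.prototype.hasOwnProperty.call(data, key) ? data[key] : null;
    },
    Put: function (key, value) {
      data[key] = value;
    },
    Cancel: function () {
      this.cancelled = true;
    },
  };
}

// Illustrative hex-safe rotation (not the merged code): steps the last
// character through 0-9a-f and wraps "f" back to "0".
function rotateLastHex(id) {
  if (id == null || id.length === 0) {
    return id;
  }
  var digits = "0123456789abcdef";
  var index = digits.indexOf(id.charAt(id.length - 1).toLowerCase());
  if (index < 0) {
    return id; // last character is not a hex digit: leave the id unchanged
  }
  return id.substring(0, id.length - 1) + digits.charAt((index + 1) % 16);
}

// Reduced remap covering only the queue metrics, with the same
// keep-or-cancel pattern as process().
function remapQueue(event) {
  var size = event.Get("prometheus.metrics.otelcol_exporter_queue_size");
  var capacity = event.Get("prometheus.metrics.otelcol_exporter_queue_capacity");
  var keep = false;
  if (size != null) {
    keep = true;
    event.Put("beat.stats.libbeat.pipeline.queue.filled.events", size);
  }
  if (capacity != null) {
    keep = true;
    event.Put("beat.stats.libbeat.pipeline.queue.max_events", capacity);
  }
  if (size != null && capacity != null && !isNaN(size / capacity)) {
    event.Put("beat.stats.libbeat.pipeline.queue.filled.pct", size / capacity);
  }
  if (!keep) {
    event.Cancel();
  }
}

var sample = makeEvent({
  "prometheus.metrics.otelcol_exporter_queue_size": 25,
  "prometheus.metrics.otelcol_exporter_queue_capacity": 100,
});
remapQueue(sample);
console.log(sample.Get("beat.stats.libbeat.pipeline.queue.filled.pct")); // 0.25

var empty = makeEvent({});
remapQueue(empty);
console.log(empty.cancelled); // true

console.log(rotateLastHex("0a1b9"), rotateLastHex("0a1bf")); // 0a1ba 0a1b0
```

An event with neither queue metric is cancelled, which mirrors how the full script discards events that carry nothing it can convert.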