
out-of-order labelset in compactor #5419

Closed
@nschad

Description


We have noticed that, for the past couple of days, blocks sometimes appear with "out-of-order" errors in the compactor log; an example is below. We have seen different and seemingly random errors, affecting different labelsets and different metrics. Sometimes labels were duplicated, literally appearing twice in the same set with different values. In the example you can see that what is supposed to be beta_kubernetes_io_instance_type is completely corrupted. Because of this corruption the required sorting of the labels in the set is violated, hence the error. This is also the only occurrence in the block: running tsdb analyze shows that other metrics do not have this buggy label.

msg="out-of-order label set: known bug in Prometheus 2.8.0 and below" labelset="{__name__=\"container_memory_usage_bytes\", beta_kubernetes_io_arch=\"amd64\", beta_kuber\u0000(\ufffd@\u0010\ufffdౡ\ufffd1stance_type=\"c1.2\", beta_kubernetes_io_os=\"linux\", ....}

Additionally, the Grafana Metric Browser sometimes shows labels like these (see screenshot), which is interesting, since the labels there are fetched through the Prometheus /api/v1/labels API, right? To me this means the problem already exists in the ingesters, and we can rule out data corruption at the S3 layer.
[Screenshot: Grafana Metric Browser showing corrupted label names]
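
To verify that the corruption is already visible at query time (i.e., before anything is written to S3), one can fetch label names from the same Prometheus-compatible endpoint Grafana uses and scan them for non-printable bytes. A minimal sketch, assuming a query-frontend reachable at the hypothetical URL below and multi-tenancy via the X-Scope-OrgID header; adjust both for your deployment:

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"unicode"
)

func main() {
	// Hypothetical endpoint; the HTTP prefix depends on your Cortex config.
	req, err := http.NewRequest("GET",
		"http://cortex-query-frontend:8080/prometheus/api/v1/labels", nil)
	if err != nil {
		panic(err)
	}
	req.Header.Set("X-Scope-OrgID", "tenant-1") // assumed tenant ID

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var body struct {
		Status string   `json:"status"`
		Data   []string `json:"data"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&body); err != nil {
		panic(err)
	}

	// Any label name containing non-printable runes suggests the
	// corruption already exists on the read path, not just in blocks.
	for _, name := range body.Data {
		for _, r := range name {
			if !unicode.IsPrint(r) {
				fmt.Printf("suspicious label name: %q\n", name)
				break
			}
		}
	}
}
```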

To Reproduce
Steps to reproduce the behavior:

  1. Unknown

Expected behavior
Non-corrupted data.

Environment:

  • Infrastructure: Kubernetes
  • Deployment tool: Helm
  • Cortex Version: 1.15.1

Additional Context
Also, we don't run any Prometheus version below 2.35, so the "known bug in Prometheus 2.8.0 and below" mentioned in the error message should not apply.
