Description
Out-of-order labelset in compactor
We have noticed that, for the past couple of days, blocks sometimes appear with "out-of-order" errors in the compactor log. Example below. We had different and seemingly random errors, affecting different labelsets and different metrics. Sometimes labels were duplicated, literally appearing twice in the same set with different values. In this example you can see that what is supposed to be beta_kubernetes_io_instance_type is completely corrupted. Because of this corruption the required sorting of the labels in the set is violated, hence the error. This is also the only occurrence in the block: running tsdb analyze shows that other metrics do not have this buggy label.
msg="out-of-order label set: known bug in Prometheus 2.8.0 and below" labelset="{__name__=\"container_memory_usage_bytes\", beta_kubernetes_io_arch=\"amd64\", beta_kuber\u0000(\ufffd@\u0010\ufffdౡ\ufffd1stance_type=\"c1.2\", beta_kubernetes_io_os=\"linux\", ....}
Additionally, the Grafana metric browser sometimes shows labels like these. This is interesting to me, since the labels there are fetched through the Prometheus /api/v1/labels
API, right? To me this means the problem already exists in the ingesters, and we can rule out potential data corruption at the S3 layer.
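To double-check this, a small script against the labels endpoint can confirm whether corrupted names are already visible on the query path. This is just a sketch; the base URL/prefix is an assumption and depends on how the Cortex HTTP API is exposed in your setup:

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"unicode/utf8"
)

func main() {
	// Adjust to your querier / query-frontend; the address and prefix here
	// are assumptions.
	resp, err := http.Get("http://localhost:9009/prometheus/api/v1/labels")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var body struct {
		Status string   `json:"status"`
		Data   []string `json:"data"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&body); err != nil {
		panic(err)
	}

	// Flag label names containing control characters or invalid UTF-8,
	// i.e. the kind of garbage seen in the compactor log.
	for _, name := range body.Data {
		for _, r := range name {
			if r < 0x20 || r == utf8.RuneError {
				fmt.Printf("suspicious label name: %q\n", name)
				break
			}
		}
	}
}
```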
To Reproduce
Steps to reproduce the behavior:
- Unknown
Expected behavior
Non-corrupted data.
Environment:
- Infrastructure: Kubernetes
- Deployment tool: Helm
- Cortex Version: 1.15.1
Additional Context
Also, we don't run any Prometheus below 2.35, so the "known bug in Prometheus 2.8.0 and below" mentioned in the log message should not apply.