Potential Prometheus memory leak in ingesters #3282

@roystchiang

Description

Describe the bug
We encountered an issue where an Ingester/Prometheus is unable to recover from the WAL due to an `invalid block sequence: block time ranges overlap` error. Repeatedly sending ingestion requests to this ingester results in an OOM.

When an ingester service starts up, it tries to recover the TSDB for every user ID. If a TSDB fails to be recovered, the ingester skips it and continues. When an ingestion request for the corrupted TSDB arrives, the ingester tries to create a TSDB for that user ID. Since TSDB data already exists on disk for this user, this causes Prometheus to perform recovery once again.
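I believe the sequence is roughly the following. This is only a hypothetical, simplified sketch with made-up names to illustrate the retry loop, not the actual Cortex code:

```go
package main

import (
	"errors"
	"fmt"
)

type ingester struct {
	tsdbs map[string]*fakeTSDB // open per-user TSDBs
}

type fakeTSDB struct{}

// openTSDB stands in for opening a user's TSDB, which replays the WAL and
// mmaps chunks_head files. On a corrupted block layout it returns an error,
// and (per this issue) the mappings created during replay are not released.
func openTSDB(userID string) (*fakeTSDB, error) {
	return nil, errors.New("invalid block sequence: block time ranges overlap")
}

// At startup, a failed open is only logged and the user is skipped.
func (i *ingester) openExistingTSDBs(users []string) {
	for _, u := range users {
		db, err := openTSDB(u)
		if err != nil {
			fmt.Printf("failed to open TSDB for %s: %v\n", u, err)
			continue // user never ends up in i.tsdbs
		}
		i.tsdbs[u] = db
	}
}

// On every push for a user that is not in the map, the ingester retries the
// open, i.e. another WAL replay and another round of memory-mapping.
func (i *ingester) push(userID string) error {
	if _, ok := i.tsdbs[userID]; ok {
		return nil
	}
	db, err := openTSDB(userID)
	if err != nil {
		return err
	}
	i.tsdbs[userID] = db
	return nil
}

func main() {
	i := &ingester{tsdbs: map[string]*fakeTSDB{}}
	i.openExistingTSDBs([]string{"some-user-id"})
	for n := 0; n < 3; n++ {
		fmt.Println(i.push("some-user-id")) // each push retries recovery
	}
}
```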

I noticed that for the same request, the ingester logs different errors:

level=warn ts=2020-10-01T19:36:19.800983863Z caller=grpc_logging.go:38 method=/cortex.Ingester/Push duration=28.647562844s err="user=some-user-id: failed to open TSDB: /data/tsdb/some-user-id: invalid block sequence: block time ranges overlap: [mint: 1601488821086, maxt: 1601490567894, range: 29m6s, blocks: 2]: <ulid: 01EKG4DJNVRS9KSPQZHVHKD07G, mint: 1601481600000, maxt: 1601490567894, range: 2h29m27s>, <ulid: 01EKGQHG7FP40NRB4SA0GFAV4E, mint: 1601488821086, maxt: 1601496000000, range: 1h59m38s>" msg="gRPC\n"
level=warn ts=2020-10-01T19:15:41.737350414Z caller=grpc_logging.go:38 method=/cortex.Ingester/Push duration=13m46.278392928s err="user=some-user-id: failed to open TSDB: /data/tsdb/some-user-id: mmap files, file: /data/tsdb/some-user-id/chunks_head/000036: mmap: cannot allocate memory" msg="gRPC\n"

I see the same chunks_head file memory-mapped about 200 times. I also saw that the ingester handled about 200 requests for this corrupted TSDB before going OOM. I suspect Prometheus memory-maps files during WAL replay, and when recovery fails these mappings are not cleaned up.

/ # sysctl vm.max_map_count
vm.max_map_count = 65530

/ # pmap 1 | wc -l
65523
/ # pmap 1 | grep /data/tsdb/some-user-id/chunks_head/000036 | wc -l
236
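If that suspicion is right, the mappings created during a failed head replay need to be released on the error path; otherwise repeated retries exhaust vm.max_map_count and memory. Below is a minimal illustration of the cleanup pattern I would expect, assuming the mappings come from syscall.Mmap. This is not the actual Prometheus code; the replay function and path are placeholders:

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// replayWithMmap mmaps a chunk file and runs a (fake) replay over it. The
// deferred Munmap is the important part: without it, every failed replay
// attempt leaks one mapping, which matches the pmap output above.
func replayWithMmap(path string) error {
	f, err := os.Open(path)
	if err != nil {
		return err
	}
	defer f.Close()

	fi, err := f.Stat()
	if err != nil {
		return err
	}

	data, err := syscall.Mmap(int(f.Fd()), 0, int(fi.Size()), syscall.PROT_READ, syscall.MAP_SHARED)
	if err != nil {
		return fmt.Errorf("mmap: %w", err)
	}
	defer syscall.Munmap(data) // released even when replay fails

	return replay(data)
}

// replay is a stand-in that always fails, like the corrupted TSDB here.
func replay(data []byte) error {
	_ = data
	return fmt.Errorf("invalid block sequence: block time ranges overlap")
}

func main() {
	fmt.Println(replayWithMmap("/data/tsdb/some-user-id/chunks_head/000036"))
}
```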

To Reproduce
Steps to reproduce the behavior:

  1. Start Cortex (v1.4.0-rc.1)
  2. Have a corrupted TSDB in ingester
  3. Perform write operations for the corrupted TSDB
  4. Ingester goes OOM

Expected behavior
Ingester should not go OOM

Environment:

  • Infrastructure: AWS EKS
  • Deployment tool: helm

Storage Engine

  • Blocks
  • Chunks

Additional Context
