Potential Prometheus memory leak in ingesters #3282

@roystchiang

Description

Describe the bug
We encountered an issue where an Ingester/Prometheus is unable to recover from the WAL due to an `invalid block sequence: block time ranges overlap` error. Repeatedly sending ingestion requests to this ingester results in an OOM.

When an ingester service starts up, it tries to recover the TSDB for every user ID. If a TSDB fails to be recovered, the ingester skips it and continues. When an ingestion request for the corrupted TSDB arrives, the ingester tries to create a TSDB for that user ID. Since TSDB data already exists on disk for this user, this causes Prometheus to perform recovery once again.
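I believe the sequence is roughly the following. This is only a hypothetical, simplified sketch with made-up names to illustrate the retry loop, not the actual Cortex code:

```go
package main

import (
	"errors"
	"fmt"
)

type ingester struct {
	tsdbs map[string]*fakeTSDB // open per-user TSDBs
}

type fakeTSDB struct{}

// openTSDB stands in for opening a user's TSDB, which replays the WAL and
// mmaps chunks_head files. On a corrupted block layout it returns an error,
// and (per this issue) the mappings created during replay are not released.
func openTSDB(userID string) (*fakeTSDB, error) {
	return nil, errors.New("invalid block sequence: block time ranges overlap")
}

// At startup, a failed open is only logged and the user is skipped.
func (i *ingester) openExistingTSDBs(users []string) {
	for _, u := range users {
		db, err := openTSDB(u)
		if err != nil {
			fmt.Printf("failed to open TSDB for %s: %v\n", u, err)
			continue // user never ends up in i.tsdbs
		}
		i.tsdbs[u] = db
	}
}

// On every push for a user that is not in the map, the ingester retries the
// open, i.e. another WAL replay and another round of memory-mapping.
func (i *ingester) push(userID string) error {
	if _, ok := i.tsdbs[userID]; ok {
		return nil
	}
	db, err := openTSDB(userID)
	if err != nil {
		return err
	}
	i.tsdbs[userID] = db
	return nil
}

func main() {
	i := &ingester{tsdbs: map[string]*fakeTSDB{}}
	i.openExistingTSDBs([]string{"some-user-id"})
	for n := 0; n < 3; n++ {
		fmt.Println(i.push("some-user-id")) // each push retries recovery
	}
}
```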

I noticed that for the same request, the ingester logs different errors:

level=warn ts=2020-10-01T19:36:19.800983863Z caller=grpc_logging.go:38 method=/cortex.Ingester/Push duration=28.647562844s err="user=some-user-id: failed to open TSDB: /data/tsdb/some-user-id: invalid block sequence: block time ranges overlap: [mint: 1601488821086, maxt: 1601490567894, range: 29m6s, blocks: 2]: <ulid: 01EKG4DJNVRS9KSPQZHVHKD07G, mint: 1601481600000, maxt: 1601490567894, range: 2h29m27s>, <ulid: 01EKGQHG7FP40NRB4SA0GFAV4E, mint: 1601488821086, maxt: 1601496000000, range: 1h59m38s>" msg="gRPC\n"
level=warn ts=2020-10-01T19:15:41.737350414Z caller=grpc_logging.go:38 method=/cortex.Ingester/Push duration=13m46.278392928s err="user=some-user-id: failed to open TSDB: /data/tsdb/some-user-id: mmap files, file: /data/tsdb/some-user-id/chunks_head/000036: mmap: cannot allocate memory" msg="gRPC\n"

I see the same chunks_head file memory-mapped about 200 times. I also saw that the ingester handled about 200 requests for this corrupted TSDB before going OOM. I suspect Prometheus memory-maps files during WAL replay, and when recovery fails these mappings are not cleaned up.

/ # sysctl vm.max_map_count
vm.max_map_count = 65530

/ # pmap 1 | wc -l
65523
/ # pmap 1 | grep /data/tsdb/some-user-id/chunks_head/000036 | wc -l
236
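If that suspicion is right, the mappings created during a failed head replay need to be released on the error path; otherwise repeated retries exhaust vm.max_map_count and memory. Below is a minimal illustration of the cleanup pattern I would expect, assuming the mappings come from syscall.Mmap. This is not the actual Prometheus code; the replay function and path are placeholders:

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// replayWithMmap mmaps a chunk file and runs a (fake) replay over it. The
// deferred Munmap is the important part: without it, every failed replay
// attempt leaks one mapping, which matches the pmap output above.
func replayWithMmap(path string) error {
	f, err := os.Open(path)
	if err != nil {
		return err
	}
	defer f.Close()

	fi, err := f.Stat()
	if err != nil {
		return err
	}

	data, err := syscall.Mmap(int(f.Fd()), 0, int(fi.Size()), syscall.PROT_READ, syscall.MAP_SHARED)
	if err != nil {
		return fmt.Errorf("mmap: %w", err)
	}
	defer syscall.Munmap(data) // released even when replay fails

	return replay(data)
}

// replay is a stand-in that always fails, like the corrupted TSDB here.
func replay(data []byte) error {
	_ = data
	return fmt.Errorf("invalid block sequence: block time ranges overlap")
}

func main() {
	fmt.Println(replayWithMmap("/data/tsdb/some-user-id/chunks_head/000036"))
}
```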

To Reproduce
Steps to reproduce the behavior:

  1. Start Cortex (v1.4.0-rc.1)
  2. Have a corrupted TSDB in ingester
  3. Perform write operations for the corrupted TSDB
  4. Ingester goes OOM

Expected behavior
Ingester should not go OOM

Environment:

  • Infrastructure: AWS EKS
  • Deployment tool: helm

Storage Engine

  • Blocks
  • Chunks

Additional Context
