Describe the bug
Following the upgrade of Cortex from v1.17.1 to v1.18.0, the Store Gateway pods are frequently OOMKilled. The kills appear random, occur roughly every 5 minutes, and persisted for as long as we stayed on v1.18.0. Before the upgrade, memory usage consistently hovered around 4GB with CPU usage under 1 core; after the upgrade, both CPU and memory usage spiked to more than 10 times their typical levels. Even after increasing the Store Gateway memory limit to 30GB, the issue persists (see graphs below).
We initially suspected the issue might be related to the sharding ring configurations, so we attempted to disable the following flags:
- store-gateway.sharding-ring.zone-awareness-enabled=False
- store-gateway.sharding-ring.zone-stable-shuffle-sharding=False
However, this did not resolve the problem.
CPU Graph: The far left shows usage before the upgrade, the middle shows usage during the upgrade, and the far right shows the rollback, where CPU usage returns to normal levels.
Memory Graph: The far left shows memory usage before the upgrade, the middle shows usage during the upgrade, and the far right shows the rollback, where memory usage returns to normal levels.
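To confirm the restarts are OOMKills (and not some other failure mode), a minimal check from pod status; the namespace and label selector below are assumptions and may differ per install:

# Print each store-gateway pod with the reason of its last container termination
# ("cortex" namespace and the chart's component label are assumptions).
kubectl get pods -n cortex -l app.kubernetes.io/component=store-gateway \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[0].lastState.terminated.reason}{"\n"}{end}'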
To Reproduce
Steps to reproduce the behavior:
- Upgrade to Cortex v1.18.0 from v1.17.1 using the Cortex Helm Chart with the values in the Additional Context section.
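For reference, a minimal sketch of the upgrade command; the release name, namespace, and repo alias are assumptions (ours are redacted):

# Assumed repo alias and release/namespace names; chart version per the Environment section.
helm repo add cortex-helm https://cortexproject.github.io/cortex-helm-chart
helm upgrade cortex cortex-helm/cortex --version 2.4.0 -n cortex -f values.yaml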
Expected behavior
Store Gateway pods should not be OOMKilled after the upgrade.
Environment:
- Infrastructure: AKS (Kubernetes)
- Deployment tool: Cortex Helm Chart v2.3.0 or v2.4.0
Additional Context
Helm Chart Values Passed
useExternalConfig: true
image:
  repository: redact
  tag: v1.18.0
externalConfigVersion: x
ingress:
  enabled: true
  ingressClass:
    enabled: true
    name: nginx
  hosts:
    - host: cortex.redact
      paths:
        - /
  tls:
    - hosts:
        - cortex.redact
serviceAccount:
  create: true
  automountServiceAccountToken: true
store_gateway:
  replicas: 6
  persistentVolume:
    storageClass: premium
    size: 64Gi
  resources:
    limits:
      memory: 24Gi
    requests:
      memory: 18Gi
  extraArgs:
    blocks-storage.bucket-store.index-cache.memcached.max-async-buffer-size: "10000000"
    blocks-storage.bucket-store.index-cache.memcached.max-get-multi-concurrency: "100"
    blocks-storage.bucket-store.index-cache.memcached.max-get-multi-batch-size: "100"
    blocks-storage.bucket-store.bucket-index.enabled: true
    blocks-storage.bucket-store.index-header-lazy-loading-enabled: true
    store-gateway.sharding-ring.zone-stable-shuffle-sharding: False
    store-gateway.sharding-ring.zone-awareness-enabled: False
  serviceMonitor:
    enabled: true
    additionalLabels:
      release: kube-prometheus-stack
    relabelings:
      - sourceLabels: [__meta_kubernetes_pod_name]
        targetLabel: instance
compactor:
  persistentVolume:
    size: 256Gi
    storageClass: premium
  resources:
    limits:
      cpu: 4
      memory: 10Gi
    requests:
      cpu: 1.5
      memory: 5Gi
  serviceMonitor:
    enabled: true
    additionalLabels:
      release: kube-prometheus-stack
    relabelings:
      - sourceLabels: [__meta_kubernetes_pod_name]
        targetLabel: instance
  extraArgs:
    blocks-storage.bucket-store.bucket-index.enabled: true
nginx:
  replicas: 3
  image:
    repository: redact
    tag: 1.27.2-alpine-slim
  serviceMonitor:
    enabled: true
    additionalLabels:
      release: kube-prometheus-stack
    relabelings:
      - sourceLabels: [__meta_kubernetes_pod_name]
        targetLabel: instance
  resources:
    limits:
      cpu: 500m
      memory: 500Mi
    requests:
      cpu: 500m
      memory: 500Mi
  config:
    verboseLogging: false
query_frontend:
  replicas: 3
  resources:
    limits:
      cpu: 1
      memory: 5Gi
    requests:
      cpu: 200m
      memory: 4Gi
  serviceMonitor:
    enabled: true
    additionalLabels:
      release: kube-prometheus-stack
    relabelings:
      - sourceLabels: [__meta_kubernetes_pod_name]
        targetLabel: instance
  extraArgs:
    querier.query-ingesters-within: 8h
querier:
  replicas: 3
  resources:
    limits:
      cpu: 8
      memory: 26Gi
    requests:
      cpu: 1
      memory: 20Gi
  extraArgs:
    querier.query-ingesters-within: 8h
    querier.max-fetched-data-bytes-per-query: "2147483648"
    querier.max-fetched-chunks-per-query: "1000000"
    querier.max-fetched-series-per-query: "200000"
    querier.max-samples: "50000000"
    blocks-storage.bucket-store.bucket-index.enabled: true
  serviceMonitor:
    enabled: true
    additionalLabels:
      release: kube-prometheus-stack
    relabelings:
      - sourceLabels: [__meta_kubernetes_pod_name]
        targetLabel: instance
ingester:
  statefulSet:
    enabled: true
  replicas: 18
  persistentVolume:
    enabled: true
    size: 64Gi
    storageClass: premium
  resources:
    limits:
      cpu: 8
      memory: 45Gi
    requests:
      cpu: 8
      memory: 40Gi
  extraArgs:
    ingester.max-metadata-per-user: "50000"
    ingester.max-series-per-metric: "200000"
    ingester.instance-limits.max-series: "0"
    ingester.ignore-series-limit-for-metric-names: "redact"
  serviceMonitor:
    enabled: true
    additionalLabels:
      release: kube-prometheus-stack
    relabelings:
      - sourceLabels: [__meta_kubernetes_pod_name]
        targetLabel: instance
ruler:
  validation:
    enabled: false
  replicas: 3
  resources:
    limits:
      cpu: 2
      memory: 6Gi
    requests:
      cpu: 500m
      memory: 3Gi
  sidecar:
    image:
      repository: redact
      tag: 1.28.0
    resources:
      limits:
        cpu: 1
        memory: 200Mi
      requests:
        cpu: 50m
        memory: 100Mi
    enabled: true
    searchNamespace: cortex-rules
    folder: /tmp/rules
  serviceMonitor:
    enabled: true
    additionalLabels:
      release: kube-prometheus-stack
    relabelings:
      - sourceLabels: [__meta_kubernetes_pod_name]
        targetLabel: instance
  extraArgs:
    blocks-storage.bucket-store.bucket-index.enabled: true
    querier.max-fetched-chunks-per-query: "2000000"
alertmanager:
  enabled: true
  replicas: 3
  podAnnotations:
    configmap.reloader.stakater.com/reload: "redact"
  statefulSet:
    enabled: true
  persistentVolume:
    size: 8Gi
    storageClass: premium
  sidecar:
    image:
      repository: redact
      tag: 1.28.0
    containerSecurityContext:
      enabled: true
      runAsUser: 0
    resources:
      limits:
        cpu: 100m
        memory: 200Mi
      requests:
        cpu: 50m
        memory: 100Mi
    enabled: true
    searchNamespace: cortex-alertmanager
  serviceMonitor:
    enabled: true
    additionalLabels:
      release: kube-prometheus-stack
    relabelings:
      - sourceLabels: [__meta_kubernetes_pod_name]
        targetLabel: instance
distributor:
  resources:
    limits:
      cpu: 4
      memory: 10Gi
    requests:
      cpu: 2
      memory: 10Gi
  extraArgs:
    distributor.ingestion-rate-limit: "120000"
    validation.max-label-names-per-series: 40
    distributor.ha-tracker.enable-for-all-users: true
    distributor.ha-tracker.enable: true
    distributor.ha-tracker.failover-timeout: 30s
    distributor.ha-tracker.cluster: "prometheus"
    distributor.ha-tracker.replica: "prometheus_replica"
    distributor.ha-tracker.consul.hostname: consul.cortex:8500
    distributor.instance-limits.max-ingestion-rate: "120000"
  serviceMonitor:
    enabled: true
    additionalLabels:
      release: kube-prometheus-stack
    relabelings:
      - sourceLabels: [__meta_kubernetes_pod_name]
        targetLabel: instance
  autoscaling:
    minReplicas: 15
    maxReplicas: 30
memcached-frontend:
  enabled: true
  image:
    registry: redact
    repository: redact/memcached-bitnami
    tag: redact
  commonLabels:
    release: kube-prometheus-stack
  podManagementPolicy: OrderedReady
  metrics:
    enabled: true
    image:
      registry: redact
      repository: redact/memcached-exporter-bitnami
      tag: redact
    serviceMonitor:
      enabled: true
      relabelings:
        - sourceLabels: [__meta_kubernetes_pod_name]
          targetLabel: instance
  resources:
    requests:
      memory: 1Gi
      cpu: 1
    limits:
      memory: 1.5Gi
      cpu: 1
  args:
    - /run.sh
    - -I 32m
  serviceAccount:
    create: true
memcached-blocks-index:
  enabled: true
  image:
    registry: redact
    repository: redact/memcached-bitnami
    tag: redact
  commonLabels:
    release: kube-prometheus-stack
  podManagementPolicy: OrderedReady
  metrics:
    enabled: true
    image:
      registry: redact
      repository: redact/memcached-exporter-bitnami
      tag: redact
    serviceMonitor:
      enabled: true
      relabelings:
        - sourceLabels: [__meta_kubernetes_pod_name]
          targetLabel: instance
  resources:
    requests:
      memory: 1Gi
      cpu: 1
    limits:
      memory: 1.5Gi
      cpu: 1.5
  args:
    - /run.sh
    - -I 32m
  serviceAccount:
    create: true
memcached-blocks:
  enabled: true
  image:
    registry: redact
    repository: redact/memcached-bitnami
    tag: redact
  commonLabels:
    release: kube-prometheus-stack
  podManagementPolicy: OrderedReady
  metrics:
    enabled: true
    image:
      registry: redact
      repository: redact/memcached-exporter-bitnami
      tag: redact
    serviceMonitor:
      enabled: true
      relabelings:
        - sourceLabels: [__meta_kubernetes_pod_name]
          targetLabel: instance
  resources:
    requests:
      memory: 2Gi
      cpu: 1
    limits:
      memory: 3Gi
      cpu: 1
  args:
    - /run.sh
    - -I 32m
  serviceAccount:
    create: true
memcached-blocks-metadata:
  enabled: true
  image:
    registry: redact
    repository: redact/memcached-bitnami
    tag: redact
  commonLabels:
    release: kube-prometheus-stack
  podManagementPolicy: OrderedReady
  metrics:
    enabled: true
    image:
      registry: redact
      repository: redact/memcached-exporter-bitnami
      tag: redact
    serviceMonitor:
      enabled: true
      relabelings:
        - sourceLabels: [__meta_kubernetes_pod_name]
          targetLabel: instance
  resources:
    requests:
      memory: 1Gi
      cpu: 1
    limits:
      memory: 1.5Gi
      cpu: 1
  args:
    - /run.sh
    - -I 32m
  serviceAccount:
    create: true
runtimeconfigmap:
  create: true
  annotations: {}
  runtime_config: {}
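For clarity, the chart renders each extraArgs entry as a CLI flag on the container command line; an illustrative (not exact) rendering for the store-gateway under the values above:

# Illustrative only: flag order and the full flag set will differ in the actual pod spec.
cortex -target=store-gateway \
  -blocks-storage.bucket-store.bucket-index.enabled=true \
  -blocks-storage.bucket-store.index-header-lazy-loading-enabled=true \
  -store-gateway.sharding-ring.zone-stable-shuffle-sharding=False \
  -store-gateway.sharding-ring.zone-awareness-enabled=False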
Quick pprof of the Store Gateway heap
curl -s http://localhost:8080/debug/pprof/heap > heap.out
go tool pprof heap.out
(pprof) top
Showing nodes accounting for 622.47MB, 95.80% of 649.78MB total
Dropped 183 nodes (cum <= 3.25MB)
Showing top 10 nodes out of 49
flat flat% sum% cum cum%
365.95MB 56.32% 56.32% 365.95MB 56.32% github.com/thanos-io/thanos/pkg/block/indexheader.(*BinaryReader).init.func3
127.94MB 19.69% 76.01% 528.48MB 81.33% github.com/thanos-io/thanos/pkg/block/indexheader.(*BinaryReader).init
76.30MB 11.74% 87.75% 76.30MB 11.74% github.com/thanos-io/thanos/pkg/cacheutil.NewAsyncOperationProcessor
34.59MB 5.32% 93.07% 34.59MB 5.32% github.com/prometheus/prometheus/tsdb/index.NewSymbols
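Happy to collect more profiles if useful; a minimal sketch of what we'd run next (pod name and port are assumptions based on our install):

# Port-forward to one store-gateway pod (name/port assumed; ours serves HTTP on 8080).
kubectl -n cortex port-forward pod/cortex-store-gateway-0 8080:8080 &
# Allocation and goroutine profiles from Go's standard net/http/pprof endpoints.
curl -s http://localhost:8080/debug/pprof/allocs > allocs.out
curl -s http://localhost:8080/debug/pprof/goroutine > goroutine.out
go tool pprof -top allocs.out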