[BUG] Cortex v1.18.0 Upgrade Causing OOMKills and CPU Spikes in Store-Gateway #6259

Open
@dpericaxon

Describe the bug
After upgrading Cortex from v1.17.1 to v1.18.0, the Store Gateway pods are frequently OOMKilled. The OOMKills follow no obvious pattern, occur roughly every 5 minutes, and have continued well after the upgrade completed. Before the upgrade, memory usage consistently hovered around 4GB, with CPU usage under 1 core. After the upgrade, both CPU and memory usage spiked to over 10 times their typical levels. The issue persists even after increasing the Store Gateway memory limit to 30GB. (See the graphs below.)
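As a rough back-of-the-envelope check of the figures above, the following Python sketch uses only the numbers quoted in this report. It assumes that "over 10 times" applies to the ~4GB memory baseline and treats it as a lower bound; the variable names are illustrative only.

```python
# Rough arithmetic using only the figures quoted in the report.
baseline_memory_gb = 4   # typical Store Gateway usage before the upgrade
growth_factor = 10       # "over 10 times", treated here as a lower bound
raised_limit_gb = 30     # memory limit tried after the upgrade

# Lower-bound estimate of post-upgrade memory usage.
estimated_peak_gb = baseline_memory_gb * growth_factor
exceeds_limit = estimated_peak_gb > raised_limit_gb

print(f"estimated peak: at least {estimated_peak_gb} GB")
print(f"exceeds the {raised_limit_gb} GB limit: {exceeds_limit}")
```

Under that assumption, a 10x increase alone would already exceed the raised limit, which is consistent with the OOMKills continuing after the limit was increased.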

We initially suspected the issue might be related to the sharding ring configuration, so we tried disabling the following flags:

  • store-gateway.sharding-ring.zone-awareness-enabled=False
  • store-gateway.sharding-ring.zone-stable-shuffle-sharding=False

However, this did not resolve the problem.

CPU Graph: The far left shows usage before the upgrade, the middle shows usage during the upgrade, and the far right shows the rollback, where CPU usage returns to normal levels:
image

Memory Graph: The far left shows memory usage before the upgrade, the middle shows usage during the upgrade, and the far right shows the rollback, where memory usage returns to normal levels:
image

To Reproduce
Steps to reproduce the behavior:

  1. Upgrade to Cortex v1.18.0 from v1.17.1 using the Cortex Helm Chart with the values in the Additional Context section.

Expected behavior
Store Gateway pods should not be OOMKilled after the upgrade.

Environment:

  • Infrastructure: AKS (Kubernetes)
  • Deployment tool: Cortex Helm Chart v2.3.0 or v2.4.0

Additional Context

Helm Chart Values Passed
        useExternalConfig: true
        image:
          repository: redact
          tag: v1.18.0
        externalConfigVersion: x
        ingress:
          enabled: true
          ingressClass:
            enabled: true
            name: nginx
          hosts:
            - host: cortex.redact
              paths:
                - /
          tls:
            - hosts:
              - cortex.redact
        serviceAccount:
          create: true
          automountServiceAccountToken: true
        store_gateway:
          replicas: 6
          persistentVolume:
            storageClass: premium
            size: 64Gi
          resources:
            resources:
              limits:
                memory: 24Gi
              requests:
                memory: 18Gi
          extraArgs:
            blocks-storage.bucket-store.index-cache.memcached.max-async-buffer-size: "10000000"
            blocks-storage.bucket-store.index-cache.memcached.max-get-multi-concurrency: "100"
            blocks-storage.bucket-store.index-cache.memcached.max-get-multi-batch-size: "100"
            blocks-storage.bucket-store.bucket-index.enabled: true
            blocks-storage.bucket-store.index-header-lazy-loading-enabled: true
            store-gateway.sharding-ring.zone-stable-shuffle-sharding: False
            store-gateway.sharding-ring.zone-awareness-enabled: False
          serviceMonitor:
            enabled: true
            additionalLabels:
              release: kube-prometheus-stack
            relabelings:
              - sourceLabels: [__meta_kubernetes_pod_name]
                targetLabel: instance
        compactor:
          persistentVolume:
            size: 256Gi
            storageClass: premium
          resources:
            limits:
              cpu: 4
              memory: 10Gi
            requests:
              cpu: 1.5
              memory: 5Gi
          serviceMonitor:
            enabled: true
            additionalLabels:
              release: kube-prometheus-stack
            relabelings:
              - sourceLabels: [__meta_kubernetes_pod_name]
                targetLabel: instance
          extraArgs:
            blocks-storage.bucket-store.bucket-index.enabled: true
        nginx:
          replicas: 3
          image:
            repository: redact
            tag: 1.27.2-alpine-slim
          serviceMonitor:
            enabled: true
            additionalLabels:
              release: kube-prometheus-stack
            relabelings:
              - sourceLabels: [__meta_kubernetes_pod_name]
                targetLabel: instance
          resources:
            limits:
              cpu: 500m
              memory: 500Mi
            requests:
              cpu: 500m
              memory: 500Mi
          config:
            verboseLogging: false
        query_frontend:
          replicas: 3
          resources:
            limits:
              cpu: 1
              memory: 5Gi
            requests:
              cpu: 200m
              memory: 4Gi
          serviceMonitor:
            enabled: true
            additionalLabels:
              release: kube-prometheus-stack
            relabelings:
              - sourceLabels: [__meta_kubernetes_pod_name]
                targetLabel: instance
          extraArgs:
            querier.query-ingesters-within: 8h
        querier:
          replicas: 3
          resources:
            limits:
              cpu: 8
              memory: 26Gi
            requests:
              cpu: 1
              memory: 20Gi
          extraArgs:
            querier.query-ingesters-within: 8h
            querier.max-fetched-data-bytes-per-query: "2147483648"
            querier.max-fetched-chunks-per-query: "1000000"
            querier.max-fetched-series-per-query: "200000"
            querier.max-samples: "50000000"
            blocks-storage.bucket-store.bucket-index.enabled: true
          serviceMonitor:
            enabled: true
            additionalLabels:
              release: kube-prometheus-stack
            relabelings:
              - sourceLabels: [__meta_kubernetes_pod_name]
                targetLabel: instance
        ingester:
          statefulSet:
            enabled: true
          replicas: 18
          persistentVolume:
            enabled: true
            size: 64Gi
            storageClass: premium
          resources:
            limits:
              cpu: 8
              memory: 45Gi
            requests:
              cpu: 8
              memory: 40Gi
          extraArgs:
            ingester.max-metadata-per-user: "50000"
            ingester.max-series-per-metric: "200000"
            ingester.instance-limits.max-series: "0"
            ingester.ignore-series-limit-for-metric-names: "redact"
          serviceMonitor:
            enabled: true
            additionalLabels:
              release: kube-prometheus-stack
            relabelings:
              - sourceLabels: [__meta_kubernetes_pod_name]
                targetLabel: instance
        ruler:
          validation:
            enabled: false
          replicas: 3
          resources:
            limits:
              cpu: 2
              memory: 6Gi
            requests:
              cpu: 500m
              memory: 3Gi
          sidecar:
            image:
              repository: redact
              tag: 1.28.0
            resources:
              limits:
                cpu: 1
                memory: 200Mi
              requests:
                cpu: 50m
                memory: 100Mi
            enabled: true
            searchNamespace: cortex-rules
            folder: /tmp/rules
          serviceMonitor:
            enabled: true
            additionalLabels:
              release: kube-prometheus-stack
            relabelings:
              - sourceLabels: [__meta_kubernetes_pod_name]
                targetLabel: instance
          extraArgs:
            blocks-storage.bucket-store.bucket-index.enabled: true
            querier.max-fetched-chunks-per-query: "2000000"
        alertmanager:
          enabled: true
          replicas: 3
          podAnnotations:
            configmap.reloader.stakater.com/reload: "redact"
          statefulSet:
            enabled: true
          persistentVolume:
            size: 8Gi
            storageClass: premium
          sidecar:
            image:
              repository: redact
              tag: 1.28.0
            containerSecurityContext:
              enabled: true
              runAsUser: 0
            resources:
              limits:
                cpu: 100m
                memory: 200Mi
              requests:
                cpu: 50m
                memory: 100Mi
            enabled: true
            searchNamespace: cortex-alertmanager
          serviceMonitor:
            enabled: true
            additionalLabels:
              release: kube-prometheus-stack
            relabelings:
              - sourceLabels: [__meta_kubernetes_pod_name]
                targetLabel: instance
        distributor:
          resources:
            limits:
              cpu: 4
              memory: 10Gi
            requests:
              cpu: 2
              memory: 10Gi
          extraArgs:
            distributor.ingestion-rate-limit: "120000"
            validation.max-label-names-per-series: 40
            distributor.ha-tracker.enable-for-all-users: true
            distributor.ha-tracker.enable: true
            distributor.ha-tracker.failover-timeout: 30s
            distributor.ha-tracker.cluster: "prometheus"
            distributor.ha-tracker.replica: "prometheus_replica"
            distributor.ha-tracker.consul.hostname: consul.cortex:8500
            distributor.instance-limits.max-ingestion-rate: "120000"
          serviceMonitor:
            enabled: true
            additionalLabels:
              release: kube-prometheus-stack
            relabelings:
              - sourceLabels: [__meta_kubernetes_pod_name]
                targetLabel: instance
          autoscaling:
            minReplicas: 15
            maxReplicas: 30
        memcached-frontend:
          enabled: true
          image:
            registry: redact
            repository: redact/memcached-bitnami
            tag: redact
          commonLabels:
            release: kube-prometheus-stack
          podManagementPolicy: OrderedReady
          metrics:
            enabled: true
            image:
              registry: redact
              repository: redact/memcached-exporter-bitnami
              tag: redact
            serviceMonitor:
              enabled: true
              relabelings:
                - sourceLabels: [__meta_kubernetes_pod_name]
                  targetLabel: instance
          resources:
            requests:
              memory: 1Gi
              cpu: 1
            limits:
              memory: 1.5Gi
              cpu: 1
          args:
            - /run.sh
            - -I 32m
          serviceAccount:
            create: true
        memcached-blocks-index:
          enabled: true
          image:
            registry: redact
            repository: redact/memcached-bitnami
            tag: redact
          commonLabels:
            release: kube-prometheus-stack
          podManagementPolicy: OrderedReady
          metrics:
            enabled: true
            image:
              registry: redact
              repository: redact/memcached-exporter-bitnami
              tag: redact
            serviceMonitor:
              enabled: true
              relabelings:
                - sourceLabels: [__meta_kubernetes_pod_name]
                  targetLabel: instance
          resources:
            requests:
              memory: 1Gi
              cpu: 1
            limits:
              memory: 1.5Gi
              cpu: 1.5
          args:
            - /run.sh
            - -I 32m
          serviceAccount:
            create: true
        memcached-blocks:
          enabled: true
          image:
            registry: redact
            repository: redact/memcached-bitnami
            tag: redact
          commonLabels:
            release: kube-prometheus-stack
          podManagementPolicy: OrderedReady
          metrics:
            enabled: true
            image:
              registry: redact
              repository: redact/memcached-exporter-bitnami
              tag: redact
            serviceMonitor:
              enabled: true
              relabelings:
                - sourceLabels: [__meta_kubernetes_pod_name]
                  targetLabel: instance
          resources:
            requests:
              memory: 2Gi
              cpu: 1
            limits:
              memory: 3Gi
              cpu: 1
          args:
            - /run.sh
            - -I 32m
          serviceAccount:
            create: true
        memcached-blocks-metadata:
          enabled: true
          image:
            registry: redact
            repository: redact/memcached-bitnami
            tag: redact
          commonLabels:
            release: kube-prometheus-stack
          podManagementPolicy: OrderedReady
          metrics:
            enabled: true
            image:
              registry: redact
              repository: redact/memcached-exporter-bitnami
              tag: redact
            serviceMonitor:
              enabled: true
              relabelings:
                - sourceLabels: [__meta_kubernetes_pod_name]
                  targetLabel: instance
          resources:
            requests:
              memory: 1Gi
              cpu: 1
            limits:
              memory: 1.5Gi
              cpu: 1
          args:
            - /run.sh
            - -I 32m
          serviceAccount:
            create: true
        runtimeconfigmap:
          create: true
          annotations: {}
          runtime_config: {}
Quick PPROF of Store GW
curl -s http://localhost:8080/debug/pprof/heap > heap.out

go tool pprof heap.out

top

Showing nodes accounting for 622.47MB, 95.80% of 649.78MB total
Dropped 183 nodes (cum <= 3.25MB)
Showing top 10 nodes out of 49
      flat  flat%   sum%        cum   cum%
  365.95MB 56.32% 56.32%   365.95MB 56.32%  github.com/thanos-io/thanos/pkg/block/indexheader.(*BinaryReader).init.func3
  127.94MB 19.69% 76.01%   528.48MB 81.33%  github.com/thanos-io/thanos/pkg/block/indexheader.(*BinaryReader).init
   76.30MB 11.74% 87.75%    76.30MB 11.74%  github.com/thanos-io/thanos/pkg/cacheutil.NewAsyncOperationProcessor
   34.59MB  5.32% 93.07%    34.59MB  5.32%  github.com/prometheus/prometheus/tsdb/index.NewSymbols
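
To put the profile above in perspective, here is a small Python sketch that sums the flat allocations attributed to the Thanos `indexheader` package. It works only on the four rows shown in the `top` output (the remaining rows are not included in this report), and the helper name `flat_mb_by_package` is hypothetical.

```python
import re

# The four rows shown in the `top` output above, copied verbatim.
PPROF_TOP = """\
  365.95MB 56.32% 56.32%   365.95MB 56.32%  github.com/thanos-io/thanos/pkg/block/indexheader.(*BinaryReader).init.func3
  127.94MB 19.69% 76.01%   528.48MB 81.33%  github.com/thanos-io/thanos/pkg/block/indexheader.(*BinaryReader).init
   76.30MB 11.74% 87.75%    76.30MB 11.74%  github.com/thanos-io/thanos/pkg/cacheutil.NewAsyncOperationProcessor
   34.59MB  5.32% 93.07%    34.59MB  5.32%  github.com/prometheus/prometheus/tsdb/index.NewSymbols
"""
TOTAL_MB = 649.78  # from the "Showing nodes accounting for ..." line

# Columns: flat, flat%, sum%, cum, cum%, function name.
ROW = re.compile(
    r"^\s*([\d.]+)MB\s+[\d.]+%\s+[\d.]+%\s+([\d.]+)MB\s+[\d.]+%\s+(\S+)$"
)


def flat_mb_by_package(text, needle):
    """Sum the flat MB of every row whose function name contains `needle`."""
    total = 0.0
    for line in text.splitlines():
        match = ROW.match(line)
        if match and needle in match.group(3):
            total += float(match.group(1))
    return total


indexheader_mb = flat_mb_by_package(PPROF_TOP, "/indexheader.")
share = 100 * indexheader_mb / TOTAL_MB
print(f"indexheader flat: {indexheader_mb:.2f} MB ({share:.2f}% of heap)")
```

The result agrees with the sum% column of the second row: roughly three quarters of the sampled heap is attributed to index-header reader initialization. The snapshot totals about 650MB, well below the configured memory limits, so it may not have been captured at peak usage.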

pprof002
