Scale-Down Issue with Blobserve #3389

Closed
@meysholdt

Description

Describe the bug

The GKE cluster did not scale down a cordoned node because the cluster autoscaler would not evict the blobserve pod running on it.

The GKE Autoscaler says:

{
  "insertId": "1683b516-4171-4b24-a605-87ccaedcdb5e@a1",
  "jsonPayload": {
    "noDecisionStatus": {
      "noScaleDown": {
        "nodesTotalCount": 1,
        "nodes": [
          {
            "reason": {
              "parameters": [
                "blobserve-5475b7fb89-6fpb8"
              ],
              "messageId": "no.scale.down.node.pod.has.local.storage"
            },
            "node": {
              "memRatio": 1,
              "mig": {
                "nodepool": "workspace-pool-0",
                "zone": "us-west1-a",
                "name": "gke-prod--gitpod-io--workspace-pool-0-d1b16495-grp"
              },
              "cpuRatio": 3,
              "name": "gke-prod--gitpod-io--workspace-pool-0-d1b16495-n6l1"
            }
          }
        ]
      },
      "measureTime": "1615283929"
    }
  },
  "resource": {
    "type": "k8s_cluster",
    "labels": {
      "cluster_name": "***",
      "location": "us-west1",
      "project_id": "***"
    }
  },
  "timestamp": "2021-03-09T09:58:49.321412544Z",
  "logName": "projects/gitpod-191109/logs/container.googleapis.com%2Fcluster-autoscaler-visibility",
  "receiveTimestamp": "2021-03-09T09:58:50.213337599Z"
}

The GCP docs say:

NoScaleDown example: You found a noScaleDown event that contains a per-node reason for your node. The message ID is "no.scale.down.node.pod.has.local.storage" and there is a single parameter: "test-single-pod". After consulting the list of error messages, you discover this means that the "Pod is blocking scale down because it requests local storage". You consult the Kubernetes Cluster Autoscaler FAQ and find out that the solution is to add a "cluster-autoscaler.kubernetes.io/safe-to-evict": "true" annotation to the Pod. After applying the annotation, cluster autoscaler scales down the cluster correctly.

Steps to reproduce

Cordon a node that has blobserve on it and observe that the autoscaler does not remove it.
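
For example, a minimal reproduction sketch (the node name is taken from the autoscaler log above and will differ per cluster):

# Cordon the node that hosts the blobserve pod.
kubectl cordon gke-prod--gitpod-io--workspace-pool-0-d1b16495-n6l1

# Then watch the cluster-autoscaler-visibility logs in Cloud Logging for a
# noScaleDown event with messageId "no.scale.down.node.pod.has.local.storage".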

Expected behavior

The node should be scaled down.

Possible solution:

Add the cluster-autoscaler.kubernetes.io/safe-to-evict annotation to the blobserve pod, as suggested by the docs.
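
A sketch of applying the annotation, assuming blobserve is managed by a Deployment named blobserve in the default namespace (the pod name blobserve-5475b7fb89-6fpb8 in the log suggests a Deployment-managed ReplicaSet; adjust name and namespace for the actual install):

# Add the safe-to-evict annotation to the pod template, so that the cluster
# autoscaler may evict the pod despite its local storage. Existing pods only
# pick this up once they are recreated.
kubectl patch deployment blobserve --type merge \
  -p '{"spec":{"template":{"metadata":{"annotations":{"cluster-autoscaler.kubernetes.io/safe-to-evict":"true"}}}}}'

For a durable fix, the annotation should go into the manifest or Helm chart that deploys blobserve rather than a live patch, so it survives redeploys.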
