Scale-Down Issue with Blobserve #3389

Closed
meysholdt opened this issue Mar 9, 2021 · 1 comment · Fixed by #3390

Comments

@meysholdt
Member

Describe the bug

The GKE cluster did not scale down a cordoned node because the autoscaler did not want to evict blobserve.

The GKE Autoscaler says:

{
  "insertId": "1683b516-4171-4b24-a605-87ccaedcdb5e@a1",
  "jsonPayload": {
    "noDecisionStatus": {
      "noScaleDown": {
        "nodesTotalCount": 1,
        "nodes": [
          {
            "reason": {
              "parameters": [
                "blobserve-5475b7fb89-6fpb8"
              ],
              "messageId": "no.scale.down.node.pod.has.local.storage"
            },
            "node": {
              "memRatio": 1,
              "mig": {
                "nodepool": "workspace-pool-0",
                "zone": "us-west1-a",
                "name": "gke-prod--gitpod-io--workspace-pool-0-d1b16495-grp"
              },
              "cpuRatio": 3,
              "name": "gke-prod--gitpod-io--workspace-pool-0-d1b16495-n6l1"
            }
          }
        ]
      },
      "measureTime": "1615283929"
    }
  },
  "resource": {
    "type": "k8s_cluster",
    "labels": {
      "cluster_name": "***",
      "location": "us-west1",
      "project_id": "***"
    }
  },
  "timestamp": "2021-03-09T09:58:49.321412544Z",
  "logName": "projects/gitpod-191109/logs/container.googleapis.com%2Fcluster-autoscaler-visibility",
  "receiveTimestamp": "2021-03-09T09:58:50.213337599Z"
}

GCP Docs say:

NoScaleDown example: You found a noScaleDown event that contains a per-node reason for your node. The message ID is "no.scale.down.node.pod.has.local.storage" and there is a single parameter: "test-single-pod". After consulting the list of error messages, you discover this means that the "Pod is blocking scale down because it requests local storage". You consult the Kubernetes Cluster Autoscaler FAQ and find out that the solution is to add a "cluster-autoscaler.kubernetes.io/safe-to-evict": "true" annotation to the Pod. After applying the annotation, cluster autoscaler scales down the cluster correctly.

Steps to reproduce

Cordon a node with blobserve on it and observe that the autoscaler does not remove it.
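
For reference, the cordon step is just kubectl against the node from the autoscaler log above (the node name below is copied from that log):

kubectl cordon gke-prod--gitpod-io--workspace-pool-0-d1b16495-n6l1

After that, the autoscaler keeps reporting no.scale.down.node.pod.has.local.storage for the node instead of removing it.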

Expected behavior

The node should be scaled down.

Possible solution:

Add the annotation as suggested by the docs.
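
A minimal sketch of that fix, assuming blobserve runs as a Deployment named blobserve (the deployment name and namespace are assumptions; they may differ in the chart). The annotation should go on the pod template, e.g.:

# assumes a Deployment called "blobserve" in the current namespace
kubectl patch deployment blobserve --type merge \
  -p '{"spec":{"template":{"metadata":{"annotations":{"cluster-autoscaler.kubernetes.io/safe-to-evict":"true"}}}}}'

Putting the annotation on the pod template (rather than annotating the running pod directly) ensures it survives the next rollout.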

@fullmetalrooster
Contributor

That is a common message in the logs. image-builder also blocks scale-down in the same way.

pavan-tri pushed a commit to trilogy-group/gitpod that referenced this issue Apr 28, 2021