
external-resizer crashing in azuredisk-csi-driver and azurefile-csi-driver #130

@emiliodangelo

Description

From kubernetes-sigs/azurefile-csi-driver#495

What happened:
After installing azurefile-csi-driver and azuredisk-csi-driver in a Kubernetes cluster, the csi-resizer container inside the csi-azurefile-controller and csi-azuredisk-controller pods crashes every 1 or 2 minutes with the following message:

csi-resizer log:

...
I1211 12:27:26.339777       1 leaderelection.go:283] successfully renewed lease kube-system/external-resizer-file-csi-azure-com
I1211 12:27:31.349381       1 leaderelection.go:283] successfully renewed lease kube-system/external-resizer-file-csi-azure-com
runtime: mlock of signal stack failed: 12
runtime: increase the mlock limit (ulimit -l) or
runtime: update your kernel to 5.3.15+, 5.4.2+, or 5.5+
fatal error: mlock failed

runtime stack:
runtime.throw(0x15d27f3, 0xc)
    /usr/lib/go-1.14/src/runtime/panic.go:1112 +0x72
runtime.mlockGsignal(0xc000682a80)
    /usr/lib/go-1.14/src/runtime/os_linux_x86.go:72 +0x107
runtime.mpreinit(0xc000079180)
    /usr/lib/go-1.14/src/runtime/os_linux.go:341 +0x78
runtime.mcommoninit(0xc000079180)
    /usr/lib/go-1.14/src/runtime/proc.go:630 +0x108
runtime.allocm(0xc00004f800, 0x1672e98, 0x14f676dd7e26c)
    /usr/lib/go-1.14/src/runtime/proc.go:1390 +0x14e
runtime.newm(0x1672e98, 0xc00004f800)
    /usr/lib/go-1.14/src/runtime/proc.go:1704 +0x39
runtime.startm(0x0, 0xc000103201)
    /usr/lib/go-1.14/src/runtime/proc.go:1869 +0x12a
runtime.wakep(...)
    /usr/lib/go-1.14/src/runtime/proc.go:1953
runtime.resetspinning()
    /usr/lib/go-1.14/src/runtime/proc.go:2415 +0x93
runtime.schedule()
    /usr/lib/go-1.14/src/runtime/proc.go:2527 +0x2de
runtime.park_m(0xc000103200)
    /usr/lib/go-1.14/src/runtime/proc.go:2690 +0x9d
runtime.mcall(0x0)
    /usr/lib/go-1.14/src/runtime/asm_amd64.s:318 +0x5b

goroutine 1 [select, 2 minutes]:
k8s.io/client-go/tools/leaderelection.(*LeaderElector).renew.func1.1(0x13763e0, 0x0, 0xc0000eee40)
...
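For context, this looks like the Go 1.14 runtime's mlock workaround for a kernel signal-stack bug: on kernels its version check considers affected (anything below 5.3.15 / 5.4.2 / 5.5), the runtime mlocks every new signal stack and aborts with "mlock failed" once the locked-memory limit (commonly 64 KiB) is exhausted. The Ubuntu 5.4.0 azure kernel may already carry the backported fix, but the runtime only looks at the version string. A quick way to check both inputs to that decision on an affected node (run on the node itself or from a debug pod with host access; `<pid>` is a placeholder) might be:

    # kernel release string the Go runtime parses
    uname -r
    # max locked memory for new processes, in KiB (the default of 64 is easy to exhaust)
    ulimit -l
    # limit actually applied to a running csi-resizer process
    grep 'Max locked memory' /proc/<pid>/limits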

What you expected to happen:
The container should not fail so frequently.

How to reproduce it:
The failure started right after installing v0.7.0 of azurefile-csi-driver. I upgraded to v0.9.0 (for both azurefile and azuredisk) with the same results. The Kubernetes cluster is composed of 3 master nodes and 3 worker nodes running on Azure VMs (not AKS). A way to watch the crash loop is sketched below.
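To observe the restarts and capture the log of the previous (crashed) container instance, something like the following should work (the label selector and pod name are assumptions based on a default Helm install and may differ):

    kubectl -n kube-system get pods -l app=csi-azurefile-controller
    kubectl -n kube-system logs <csi-azurefile-controller-pod> -c csi-resizer --previous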

Anything else we need to know?:
Found a couple of issues in the golang/go repository that seem to be related.

Upgrading the Go version used to build the sidecars from 1.14 to 1.15 may solve the problem.
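Until the sidecar images are rebuilt with a newer Go (1.15 drops the mlock workaround entirely), a possible node-level mitigation is to raise the memlock limit that containers inherit. A sketch for Docker-based nodes (this assumes Docker is the container runtime on these VMs; values are illustrative and the daemon must be restarted, after which the affected pods need to be recreated):

    # /etc/docker/daemon.json -- merge into any existing configuration
    {
      "default-ulimits": {
        "memlock": { "Name": "memlock", "Hard": -1, "Soft": -1 }
      }
    }
    # apply the change
    sudo systemctl restart docker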

Environment:

  • CSI Driver version: v0.7.0 and v0.9.0
  • Kubernetes version (use kubectl version): v1.19.14
  • OS (e.g. from /etc/os-release): Ubuntu v20.04.1 LTS
  • Kernel (e.g. uname -a): 5.4.0-1032-azure #33-Ubuntu SMP Fri Nov 13 14:23:34 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
  • Install tools: Helm v3.4.2
  • Others:
    • Master node size: Standard D2ds_v4 (2 vcpus, 8 GiB memory)
    • Worker node size: Standard D16ds_v4 (16 vcpus, 64 GiB memory)

Complete log file: csi-resizer.log
