Skip to content

Inconsistencies between different installation methods #1061

Open
@eero-t

Description

@eero-t

I noticed some confusing inconsistencies when checking things under deployments folder and different installation methods...

GPU plugin is installed on different nodes depending on used method:

$ git grep feature.node.kubernetes.io | grep gpu
gpu_plugin/overlays/nfd_labeled_nodes/add-nodeselector-intel-gpu.yaml:        feature.node.kubernetes.io/pci-0300_8086.present: "true"
nfd/overlays/node-feature-rules/node-feature-rules-openshift.yaml:        "intel.feature.node.kubernetes.io/gpu": "true"
nfd/overlays/node-feature-rules/node-feature-rules.yaml:        "intel.feature.node.kubernetes.io/gpu": "true"
operator/samples/deviceplugin_v1_gpudeviceplugin.yaml:    intel.feature.node.kubernetes.io/gpu: "true"

IMHO at least the last two first options (relying on NFD), should be the same, not differ:

  • With operator, GPU plugins are deployed to all nodes with Intel GPUs
  • Using gpu_plugin/overlays/nfd_labeled_nodes, they go only to nodes with Intel GPUs supporting display output
  • Following GPU plugin README, plugin is deployed on all nodes regardless of whether they have GPU

Whereas inconsistencies in namespace usage between installation methods could mean that user ends accidentally with multiple instances of the same thing running in different namespaces:

$ cd deployments; git grep namespace: | grep -v -e { -e "'" -e % -e _test
dsa_plugin/overlays/dsa_initcontainer/dsa-config.yaml:  namespace: inteldeviceplugins-system
dsa_plugin/overlays/namespace_kube-system/add-namespace-kube-system.yaml:  namespace: kube-system
fpga_admissionwebhook/base/manager_webhook_patch.yaml:  namespace: system
fpga_admissionwebhook/certmanager/certificate.yaml:  namespace: system
fpga_admissionwebhook/certmanager/certificate.yaml:  namespace: system
fpga_admissionwebhook/default/kustomization.yaml:namespace: intelfpgawebhook-system
fpga_admissionwebhook/manager/manager.yaml:  namespace: system
fpga_admissionwebhook/rbac/role_binding.yaml:  namespace: system
fpga_admissionwebhook/webhook/kustomizeconfig.yaml:namespace:
fpga_admissionwebhook/webhook/manifests.yaml:      namespace: system
fpga_admissionwebhook/webhook/service.yaml:  namespace: system
fpga_plugin/base/intel-fpga-plugin-daemonset.yaml:  namespace: system
fpga_plugin/base/role_binding.yaml:  namespace: system
fpga_plugin/overlays/af/kustomization.yaml:namespace: intelfpgaplugin-system
fpga_plugin/overlays/region/kustomization.yaml:namespace: intelfpgaplugin-system
fpga_plugin/overlays/region/mode-region.yaml:  namespace: system
gpu_plugin/overlays/fractional_resources/resource-cluster-role-binding.yaml:  namespace: default
gpu_plugin/overlays/namespace_kube-system/add-namespace-kube-system.yaml:  namespace: kube-system
iaa_plugin/overlays/iaa_initcontainer/iaa-config.yaml:  namespace: inteldeviceplugins-system
nfd/overlays/node-feature-discovery/node-feature-discovery-openshift.yaml:  namespace: openshift-nfd
operator/certmanager/certificate.yaml:  namespace: system
operator/certmanager/certificate.yaml:  namespace: system
operator/crd/bases/deviceplugin.intel.com_dlbdeviceplugins.yaml:                  namespace:
operator/crd/bases/deviceplugin.intel.com_dsadeviceplugins.yaml:                  namespace:
operator/crd/bases/deviceplugin.intel.com_fpgadeviceplugins.yaml:                  namespace:
operator/crd/bases/deviceplugin.intel.com_gpudeviceplugins.yaml:                  namespace:
operator/crd/bases/deviceplugin.intel.com_iaadeviceplugins.yaml:                  namespace:
operator/crd/bases/deviceplugin.intel.com_qatdeviceplugins.yaml:                  namespace:
operator/crd/bases/deviceplugin.intel.com_sgxdeviceplugins.yaml:                  namespace:
operator/crd/kustomizeconfig.yaml:namespace:
operator/crd/patches/webhook_in_fpgadeviceplugins.yaml:        namespace: system
operator/crd/patches/webhook_in_gpudeviceplugins.yaml:        namespace: system
operator/crd/patches/webhook_in_qatdeviceplugins.yaml:        namespace: system
operator/default/kustomization.yaml:namespace: inteldeviceplugins-system
operator/default/manager_auth_proxy_patch.yaml:  namespace: system
operator/default/manager_webhook_patch.yaml:  namespace: system
operator/device/dlb/dlb.yaml:  namespace: inteldeviceplugins-system
operator/device/dsa/dsa.yaml:  namespace: inteldeviceplugins-system
operator/device/fpga/fpga.yaml:  namespace: inteldeviceplugins-system
operator/device/gpu/gpu.yaml:  namespace: inteldeviceplugins-system
operator/device/qat/qat.yaml:  namespace: inteldeviceplugins-system
operator/device/sgx/sgx.yaml:  namespace: inteldeviceplugins-system
operator/manager/manager.yaml:  namespace: system
operator/manifests/bases/intel-device-plugins-operator.clusterserviceversion.yaml:  namespace: placeholder
operator/rbac/auth_proxy_role_binding.yaml:  namespace: system
operator/rbac/auth_proxy_service.yaml:  namespace: system
operator/rbac/leader_election_role_binding.yaml:  namespace: system
operator/rbac/role_binding.yaml:  namespace: system
operator/webhook/kustomizeconfig.yaml:namespace:
operator/webhook/manifests.yaml:      namespace: system
operator/webhook/manifests.yaml:      namespace: system
operator/webhook/manifests.yaml:      namespace: system
operator/webhook/manifests.yaml:      namespace: system
operator/webhook/manifests.yaml:      namespace: system
operator/webhook/manifests.yaml:      namespace: system
operator/webhook/manifests.yaml:      namespace: system
operator/webhook/manifests.yaml:      namespace: system
operator/webhook/manifests.yaml:      namespace: system
operator/webhook/manifests.yaml:      namespace: system
operator/webhook/manifests.yaml:      namespace: system
operator/webhook/manifests.yaml:      namespace: system
operator/webhook/manifests.yaml:      namespace: system
operator/webhook/manifests.yaml:      namespace: system
operator/webhook/manifests.yaml:      namespace: system
operator/webhook/manifests.yaml:      namespace: system
operator/webhook/service.yaml:  namespace: system
sgx_admissionwebhook/base/manager_webhook_patch.yaml:  namespace: system
sgx_admissionwebhook/certmanager/certificate.yaml:  namespace: system
sgx_admissionwebhook/certmanager/certificate.yaml:  namespace: system
sgx_admissionwebhook/manager/manager.yaml:  namespace: system
sgx_admissionwebhook/overlays/default-with-certmanager/kustomization.yaml:namespace: intelsgxwebhook-system
sgx_admissionwebhook/webhook/kustomizeconfig.yaml:namespace:
sgx_admissionwebhook/webhook/manifests.yaml:      namespace: system
sgx_admissionwebhook/webhook/service.yaml:  namespace: system
sgx_plugin/overlays/epc-register/kustomization.yaml:namespace: kube-system
sgx_plugin/overlays/epc-register/service-account.yaml:  namespace: kube-system
sgx_plugin/overlays/epc-register/service-account.yaml:  namespace: kube-system
vpu_plugin/overlays/namespace_kube-system/add-namespace-kube-system.yaml:  namespace: kube-system

For example, when using operator, GPU plugin goes to inteldeviceplugins-system ns, when using GPU plugin README apply method, it goes to default ns, and when using gpu_plugin/overlays/namespace_kube-system overlay, it goes to kube-system ns. Some of the other plugins, use different namespaces, but that's not consistent either.

GPU plugin also uses different service account & roles depending on installation method (this is from repo root):

$ git grep gpu-manager
Makefile:       $(CONTROLLER_GEN) rbac:roleName=gpu-manager-role paths="./cmd/gpu_plugin/..." output:dir=deployments/operator/rbac
charts/operator/templates/operator.yaml:  name: inteldeviceplugins-gpu-manager-role
deployments/operator/rbac/gpu_manager_role.yaml:  name: gpu-manager-role
pkg/controllers/gpu/controller.go:      serviceAccountName = "gpu-manager-sa"
pkg/controllers/gpu/controller.go:                              Name:      "gpu-manager-sa",
pkg/controllers/gpu/controller.go:                              Name:      "gpu-manager-rolebinding",
pkg/controllers/gpu/controller.go:                                      Name:      "gpu-manager-sa",
pkg/controllers/gpu/controller.go:                              Name:     "inteldeviceplugins-gpu-manager-role",

$ git grep resource-reader
cmd/gpu_plugin/README.md:serviceaccount/resource-reader-sa created
cmd/gpu_plugin/README.md:clusterrole.rbac.authorization.k8s.io/resource-reader created
cmd/gpu_plugin/README.md:clusterrolebinding.rbac.authorization.k8s.io/resource-reader-rb created
deployments/gpu_plugin/overlays/fractional_resources/add-serviceaccount.yaml:      serviceAccountName: resource-reader-sa
deployments/gpu_plugin/overlays/fractional_resources/kustomization.yaml:  - resource-reader-sa.yaml
deployments/gpu_plugin/overlays/fractional_resources/resource-cluster-role-binding.yaml:  name: resource-reader-rb
deployments/gpu_plugin/overlays/fractional_resources/resource-cluster-role-binding.yaml:  name: resource-reader-sa
deployments/gpu_plugin/overlays/fractional_resources/resource-cluster-role-binding.yaml:  name: resource-reader
deployments/gpu_plugin/overlays/fractional_resources/resource-cluster-role.yaml:  name: resource-reader
deployments/gpu_plugin/overlays/fractional_resources/resource-reader-sa.yaml:  name: resource-reader-sa

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions