Skip to content

Conversation

@akutz
Copy link
Owner

@akutz akutz commented Nov 28, 2022

No description provided.

bryanv and others added 30 commits January 27, 2021 16:24
By generating v1 instead of v1beta1, kubectl explain now works.

preserveUnknownFields is needed for explain to work when upgrading
from CRD v1beta1 to v1 but controller-gen has a bug [1] that causes
it not be included in the generated CRD yaml. Use kustomize to add
it the CRDs.

Keep the webhooks at their prior version until we decide to or must
update them. The new controller-gen defaults to newer version.

[1] - kubernetes-sigs/controller-tools#476
k8s v1.15 moved to go modules which makes it hard for us to keep
importing "k8s.io/kubernetes/pkg/api/v1/endpoints" directly, so
copy in the function we need to sort the Service EP the same as
they are done in k8s.
VirtualMachineImages created before a certain version do not have OwnerReference
to the ContentLibraryProvider resource. VM operator uses this OwnerReference to
find out which vSphere content library it should Clone the VM from. This caused
TKG upgrade failures when we wanted to Clone new VMs from old images.

After upgrade, we should update these images with the OwnerReference.
VM Operator does not support several configuration updates. The reconcile code
is designed such that it only consumes the certain spec updates when the VM is
powered on. Even though we do not reconcile these config updates, a developer is
allowed to update a VM spec. The spec updates takes affect when the VM is power
cycled. This is a poor UX because we do not given any feedback to users that a
config spec is not reconciled, nor do we prevent them from modifying the config.

This change modifies the virtualmachine validating webhooks to deny any config
updates that VM operator does not reconcile. We also give a helpful message that
the config update is not allowed in the current power state.

Summary of config field changes:

*Immutable Fields*
- ResourcePolicyName
This is only consumed once when the VM is d eployed from the OVF. This should not
be updated.

*Not allowed when VM is powered on*
- Ports
- NetworkInterfaces
- VmMetadata
- Volumes - referencing a vSphere volumes
- AdvancedOptions.DefaultVolumeProvisioningOptions
All these options are not being reconciled when the VM is powered on, so we
deny the Update requests using the webhook.

We are not doing any validations for the Update operations (even basic checks
like validting that the names are DNS compliant). This change also modifies the
webhooks so that we perform these checks during Updates.
With the introduction of VM service, we enforce namespace scoped content
sources. This means a developer can only use VM images from the content
libraries that the VI admin has associated with the developer's namespace.

This change introduces the code to enforce the namespace scoped association.

We also do not reconcile a VirtualMachine if the ContentSources associated with
namespaces change. This means that if the reconciliation queue is in
exponentially backed off state, and the VI admin associates the content library
to a namespace, the VM will not be reconciled even if the bindings are
available.

This change adds code to watch the ContentSourceBindings and queue reconcile
requests for the VMs using any images from those content sources.
CLN 8372fed32cced4103944dc0914f41cce3b8b3e61 on the vm-operator-api repo
introduced validations for the VM power state and VM metadata transport
mechanisms. This change bumps up the API dependency in go.mod and includes the
updated CRDs.

- Also remove the webhook checks since they are no longer needed.

Testing Done:
- Manually deployed a VM with invalid spec.powerState and
spec.vmMetadata.Transport. Verified that the request was denied.

```
$ cat <<EOF | kubectl apply -f -
pipe heredoc> apiVersion: vmoperator.vmware.com/v1alpha1
kind: VirtualMachine
metadata:
  name: vmsvc-ubuntu-vm-2
  namespace: parunesh-ubuntu-ns
spec:
  imageName: photon-3-k8s-v1.18.10---vmware.1-tkg.1.729f636
  className: best-effort-xsmall
  powerState: invalid-powerState
  storageClass: wcpglobal-storage-profile
  vmMetadata:
    transport: invalid-transport
pipe heredoc> EOF
The VirtualMachine "vmsvc-ubuntu-vm-2" is invalid:
* spec.powerState: Unsupported value: "invalid-powerState": supported values: "poweredOff", "poweredOn"
* spec.vmMetadata.transport: Unsupported value: "invalid-transport": supported values: "ExtraConfig", "OvfEnv"
```

- Verified that the valid values work
```
$ cat << EOF | kubectl apply -f -
apiVersion: vmoperator.vmware.com/v1alpha1
kind: VirtualMachine
metadata:
  name: vmsvc-ubuntu-vm-2
  namespace: parunesh-ubuntu-ns
spec:
  imageName: photon-3-k8s-v1.18.10---vmware.1-tkg.1.729f636
  className: best-effort-xsmall
  powerState: poweredOn
  storageClass: wcpglobal-storage-profile
  vmMetadata:
    transport: OvfEnv
EOF
virtualmachine.vmoperator.vmware.com/vmsvc-ubuntu-vm-2 configured
```
VM operator watches the provider ConfigMap for any changes to the content
sources from wcpsvc. This is how we create various custom resources including
ContentSourceBindings in the workload namespaces. Since we do not watch
namespaces, we do not create bindings in any new namespace that is created on
wcpsvc.
Our predicate functions for the provider ConfigMap are also strict which
means we only trigger the reconcile when there is a change in the configured
content source. This means that if somehow the system deviates from the desired
state, there is no reconciliation until the next change in content source.

This change fixes these issues. Specifically, this change:
- Adds a watch for namespaces that queues a reconcile when a namespace is
  created.
- Relaxes the predicates so we always reconcile the ConfigMap. The reconcile
  loop ensures that the current state is the same as the desired state.
- Protect the CotentSourceBinding creation in VMService FSS, since these are
  only enforced if the FSS is enabled.
- Fix the deploy-wcp script to not remove the ContentSource key from the
  provider ConfigMap. We have abandoned that approach now.
- Add a new integration test. Refactor the existing test under the FSS.

Testing Done:
…pecial property

Due to the raciness between GOSC and cloud-init customization, we need a workflow that ensures VMs are created
successfully with customization using the VM Operator workflow.
This enforces VI admin to prepare the OVA using the instructions provided here 
The preparation steps  also involves adding a new vApp option "vmware-system.vmsvc.compatibility" that denotes whether the image is being  prepared or not for VMService

This MR adds capability to:
- Add a new field for ImageSupportedState in VirtualMachineImageStatus that denotes if the image is compatible or not. True if compatible, false otherwise
- Updates the VirtualMachineImageStatus.ImageSupportedState if it is a TKG image or is made compatible by adding guestinfo.vmservice.defer-cloud-init->ready extraconfig key
- VirtualMachine_Validator is updated to error out if the VirtualMachineImageStatus.ImageCompatibility is set to false
- Adds conditions to VirtualMachineImageStatus to denote more information regarding the image incompatibility
- Reconfigures the extraconfig guestinfo.vmservice.defer-cloud-init->ready to guestinfo.vmservice.defer-cloud-init->enabled in the virtual machine creation workflow to ensure cloud init is deferred in the first boot for compatible images

Unit tests added
Validated on a real testbed

Non-Compatible image:

root@4220964932a3ed994e8e1b8acec25864 [ ~ ]# kubectl create -f noncompat.yaml
configmap/non-compat-configmap created
Error from server (GuestOS not supported for osType ubuntu64Guest on image focal-server-cloudimg-amd64 or VM image does not have guestinfo.vmservice.defer-cloud-init extraConfig key set to ready or is not a TKG Image): error when creating "noncompat.yaml": admission webhook "default.validating.virtualmachine.vmoperator.vmware.com" denied the request: GuestOS not supported for osType ubuntu64Guest on image focal-server-cloudimg-amd64 or VM image does not have guestinfo.vmservice.defer-cloud-init extraConfig key set to ready or is not a TKG Image

Compatible Image

Status:
  Conditions:
    Last Transition Time:  2021-02-11T04:48:18Z
    Status:                True
    Type:                  VirtualMachineImageOSTypeSupported
    Last Transition Time:  2021-02-11T04:48:18Z
    Status:                True
    Type:                  VirtualMachineImageV1Alpha1Compatible
  Image Supported:         true
  Internal Id:             ubuntu-20.04-vmservice-v1alpha1.20210210
  Uuid:                    9748cc2b-c5c4-471f-ba25-5a84c5f8a190
Events:                    <none>

API changes: srajashekar-vmw/vm-operator-api@d1dc57b
Move the clone up and remove usages of a stale cloned vm from earlier
tests to fix the mismatched keys.

Testing Done:
make test-integration passes consistently
This includes changes to introduce a new FSS feature flag - WCP_VMService_v1alpha2. This feature is used for enhancing vm-operator support for customization. This involves supporting opt-in VM Service customizations with multiple selections allowing fine-grained control of type of customization specific to the OVF virtualmachineimage requirements.
Proposal for WCP_VMService_v1alpha2: 

Testing done:
Builds:
vcenter-all sandbox: 
esxall sandox: 
vm-operator sandbox: 
cayman-stateless-photon sandbox: 

dev-integ-vds: 
Verified thes WCP_VMService_v1alpha2 is enabled on SV VM and value in reflected in vmop yaml.
   - root@sc2-10-186-6-175 [ ~ ]# cat /etc/vmware/wcp/features.yaml | grep WCP_VMService_v1alpha2
     WCP_VMService_v1alpha2: true
   - Even vmop yaml shows that feature is enabled: FSS_WCP_VMSERVICE_V1ALPHA2 = true
This change adds ovfEnv properties in VirtualMachineImage spec.
A developer can do a describe on the virtualmachineimage and
see the userConfigurable ovf key, type and default value.

Test done:
  - make test
…t in the cluster

The GCE2E run against a kind cluster does not set gOSIdsToFamily to contain
valid guest OS descriptors. Because of this, the validation webhook was always
failing to create CAPI VMs.
This change bypasses isSupportedGuestOS validation if gOSIdsToFamily is not set

GC E2E run:
- Upstream govmomi has the cluster module changes now, so we do not need to
maintain a duplicate copy in our repo.

- Remove utils from VirtualMachineService which housed single func being used in a
single file.

- Fix the copyright header formatting of files in this repo.

- I am skipping the gobuild and load-k8s-master on purpose since we do not have
clarity as to whether they would live in this repo or not.
Compare the current state of the VM from VC with the desired state
specified by k8s, and reconfigure the VM to move towards the desired
state.

Start to split apart the big session.go file into various files.

Still needs to be smarter around Network reconfigure by comparing the
NIC backings.
This means less client to have to deal with when reworking the LB
provider tests. The NCP VirtualNetworkInterfaces CRs are pretty
small, so caching them shouldn't cause an issue, since we already
do for NetOP NetworkInterfaces.

In a follow on commit we should really be watching the NCP and NetOP
network interfaces owned by VMs.
ThunderPciDevices FSS is used to enable vGPU support for VM Service workflow.
This change plumbs ThunderPciDevices FSS into vm-operator
* for local kind env, ThunderPciDevices is set to false
* for wcp env, ThunderPciDevices is set based on the FSS value in wcp config.

test done:
* make test-integration
* vm-operator sandbox build:
It is failing too much in the build pipelines. All the integration
readiness checks should be rewritten as unit tests, as controller
unit tests weren't available when the readiness stuff was added.
Update the code flow so that we check the network interfaces
just once as a part of the VM power on flow. This means we
stop creating NetworkProviders all over the place, and we
collect our network interface device and customization just
once up front.
Consolidate VM properties calls that get the VM config, runtime,
and/or summary into one call. Previously we would make repeated
calls to fetch a single property - this is not ideal because of having
to go to VC introduces more errors to handle.

Remove some long commented out VM annotations that are now a
part of the VM Status.

Add more unit tests to cover the update virtual machine
config spec

Add missing tests for delete VM

Remove ListVirtualMachines since its only use was in testing
The CNI controller sometimes includes raw SOAP errors from VC in the
Error field, which can cause the VM object churn.
As part of features in v1alpha2, we are adding support for specifying
templatized values for guest customization key/value pairs. As a first
step, we are introducing this support for specifying networking
configuration, but support for more configurations can be added
relatively easily in the future. For e.g.: In order to
to specify the networkprovider-assigned IP address for the
first network interface, (i.e., IP address corresponding to
spec.networkInterface[0]), we can specify {{ (index .Nic 0).IP }} in the
guest customization value. VM Operator will pick this up and populate
it with the actual IP before invoking update.for guest customization key/value pairs. As a first
step, we are introducing this support for specifying networking
configuration, but support for more configurations can be added
relatively easily in the future. For e.g.: In order to
to specify the networkprovider-assigned IP address for the
first network interface, (i.e., IP address corresponding to
spec.networkInterface[0]), we can specify {{ (index .Nic 0).IP }} in the
guest customization value. VM Operator will pick this up and populate
it with the actual IP before invoking update.

Testing Done:
  * make test
configMap:
---
apiVersion: v1
kind: ConfigMap
metadata:
    name: test-vm
    namespace: my-podvm-ns
data:
  ip: "{{ (index .Nic 0).IP }}"
  subnetMask: "{{ (index .Nic 0).SubnetMask }}"
  gateway: "{{ (index .Nic 0).Gateway }}"
  dnsNameserver: "{{ (index .DnsNameServers 0) }}"

  * vds:
root@423894ef1a38d0422ffbdabb5767092b [ ~ ]# kubectl logs vmware-system-vmop-controller-manager-df46db746-vj9vn -n vmware-system-vmop manager | grep "template"
I0329 23:14:04.362119       1 session_vm_update.go:553] vsphere "msg"="template data"  "dnsNameserver"="10.20.145.1"
I0329 23:14:04.362508       1 session_vm_update.go:553] vsphere "msg"="template data"  "gateway"="192.168.1.1"
I0329 23:14:04.362576       1 session_vm_update.go:553] vsphere "msg"="template data"  "ip"="192.168.128.14"
I0329 23:14:04.362708       1 session_vm_update.go:553] vsphere "msg"="template data"  "subnetMask"="255.255.0.0"

  * nsxt:
root@4236ae36182bdee17dca987895037091 [ ~ ]# kubectl logs vmware-system-vmop-controller-manager-74dfcffd98-t5g78 -n vmware-system-vmop manager | grep "template"
I0330 00:22:32.956676       1 session_vm_update.go:555] vsphere "msg"="template data"  "ip"="172.26.1.226"
I0330 00:22:32.957650       1 session_vm_update.go:555] vsphere "msg"="template data"  "subnetMask"="255.255.255.240"
I0330 00:22:32.958328       1 session_vm_update.go:555] vsphere "msg"="template data"  "dnsNameserver"="10.20.145.1"
I0330 00:22:32.958776       1 session_vm_update.go:555] vsphere "msg"="template data"  "gateway"="172.26.1.225"
VirtualMachines with Persistant volume claims depend on the BiosUUID to
be populated on the VM status before the volume controller takes over
and creates cnsnodeattachments objects. We need to wait for the volume
status to update to "Attached" (done by the volume_controller) before
poweringOn the VM.

Testing Done:

Manual Tests on a wcp testbed.
VM powers ON only after volume attachement

status:
  biosUUID: 4217456e-3062-f67c-d20d-51920da1bb0a
  changeBlockTracking: false
  conditions:
  - lastTransitionTime: "2021-04-05T22:53:04Z"
    status: "True"
    type: VirtualMachinePrereqReady
  host: 10.161.111.180
  instanceUUID: 5017e39d-a980-e1e4-5d40-7b98a6f9a862
  phase: Created
  powerState: poweredOn
  uniqueID: vm-159
  vmIp: 192.168.128.4
  volumes:
  - attached: true
    diskUUID: 6000C29d-805a-c338-fda3-71f3a492e682
    error: ""
    name: my-pvc

integration tests: make test-integration
….16 of boringcrypto.

Testing
========
- VM Operator SB Build - 
- CSP Build - 

- [x] dev-integ-vds - 
- [x] dev-integ-nsxt -
All the controller tests have been converted over so these all use the
same new env test suite.

Remove controllers from the old-style tests since including them there
meant we were running the unit tests again.
akutz and others added 4 commits November 28, 2022 13:17
This patch updates the Image Registry APIs that are vendored when
pushing the project upstream.
@github-actions
Copy link

Code Coverage

Package Line Rate Health
github.com/vmware-tanzu/vm-operator/controllers 0%
github.com/vmware-tanzu/vm-operator/controllers/contentlibrary/clustercontentlibraryitem 77%
github.com/vmware-tanzu/vm-operator/controllers/contentlibrary/contentsource 83%
github.com/vmware-tanzu/vm-operator/controllers/contentlibrary/utils 95%
github.com/vmware-tanzu/vm-operator/controllers/infracluster 75%
github.com/vmware-tanzu/vm-operator/controllers/infraprovider 75%
github.com/vmware-tanzu/vm-operator/controllers/providerconfigmap 75%
github.com/vmware-tanzu/vm-operator/controllers/util/encoding 73%
github.com/vmware-tanzu/vm-operator/controllers/util/remote 41%
github.com/vmware-tanzu/vm-operator/controllers/virtualmachine 56%
github.com/vmware-tanzu/vm-operator/controllers/virtualmachineclass 31%
github.com/vmware-tanzu/vm-operator/controllers/virtualmachinepublishrequest 79%
github.com/vmware-tanzu/vm-operator/controllers/virtualmachineservice 83%
github.com/vmware-tanzu/vm-operator/controllers/virtualmachineservice/providers 96%
github.com/vmware-tanzu/vm-operator/controllers/virtualmachineservice/providers/simplelb 66%
github.com/vmware-tanzu/vm-operator/controllers/virtualmachineservice/utils 82%
github.com/vmware-tanzu/vm-operator/controllers/virtualmachinesetresourcepolicy 80%
github.com/vmware-tanzu/vm-operator/controllers/volume 86%
github.com/vmware-tanzu/vm-operator/controllers/webconsolerequest 71%
github.com/vmware-tanzu/vm-operator/pkg/builder 70%
github.com/vmware-tanzu/vm-operator/pkg/conditions 90%
github.com/vmware-tanzu/vm-operator/pkg/context 0%
github.com/vmware-tanzu/vm-operator/pkg/context/fake 100%
github.com/vmware-tanzu/vm-operator/pkg/lib 79%
github.com/vmware-tanzu/vm-operator/pkg/manager 81%
github.com/vmware-tanzu/vm-operator/pkg/metrics 85%
github.com/vmware-tanzu/vm-operator/pkg/patch 78%
github.com/vmware-tanzu/vm-operator/pkg/prober 92%
github.com/vmware-tanzu/vm-operator/pkg/prober/context 100%
github.com/vmware-tanzu/vm-operator/pkg/prober/fake 85%
github.com/vmware-tanzu/vm-operator/pkg/prober/fake/probe 83%
github.com/vmware-tanzu/vm-operator/pkg/prober/fake/worker 88%
github.com/vmware-tanzu/vm-operator/pkg/prober/probe 83%
github.com/vmware-tanzu/vm-operator/pkg/prober/worker 86%
github.com/vmware-tanzu/vm-operator/pkg/record 89%
github.com/vmware-tanzu/vm-operator/pkg/topology 85%
github.com/vmware-tanzu/vm-operator/pkg/util 88%
github.com/vmware-tanzu/vm-operator/pkg/vmprovider/fake 79%
github.com/vmware-tanzu/vm-operator/pkg/vmprovider/providers/vsphere 74%
github.com/vmware-tanzu/vm-operator/pkg/vmprovider/providers/vsphere/client 73%
github.com/vmware-tanzu/vm-operator/pkg/vmprovider/providers/vsphere/clustermodules 85%
github.com/vmware-tanzu/vm-operator/pkg/vmprovider/providers/vsphere/config 83%
github.com/vmware-tanzu/vm-operator/pkg/vmprovider/providers/vsphere/contentlibrary 73%
github.com/vmware-tanzu/vm-operator/pkg/vmprovider/providers/vsphere/credentials 100%
github.com/vmware-tanzu/vm-operator/pkg/vmprovider/providers/vsphere/instancestorage 92%
github.com/vmware-tanzu/vm-operator/pkg/vmprovider/providers/vsphere/internal 0%
github.com/vmware-tanzu/vm-operator/pkg/vmprovider/providers/vsphere/network 85%
github.com/vmware-tanzu/vm-operator/pkg/vmprovider/providers/vsphere/placement 82%
github.com/vmware-tanzu/vm-operator/pkg/vmprovider/providers/vsphere/resources 41%
github.com/vmware-tanzu/vm-operator/pkg/vmprovider/providers/vsphere/session 84%
github.com/vmware-tanzu/vm-operator/pkg/vmprovider/providers/vsphere/storage 77%
github.com/vmware-tanzu/vm-operator/pkg/vmprovider/providers/vsphere/test 98%
github.com/vmware-tanzu/vm-operator/pkg/vmprovider/providers/vsphere/vcenter 84%
github.com/vmware-tanzu/vm-operator/pkg/vmprovider/providers/vsphere/virtualmachine 84%
github.com/vmware-tanzu/vm-operator/pkg/webconsolevalidation 53%
github.com/vmware-tanzu/vm-operator/webhooks 0%
github.com/vmware-tanzu/vm-operator/webhooks/common 100%
github.com/vmware-tanzu/vm-operator/webhooks/persistentvolumeclaim 0%
github.com/vmware-tanzu/vm-operator/webhooks/persistentvolumeclaim/validation 95%
github.com/vmware-tanzu/vm-operator/webhooks/virtualmachine 0%
github.com/vmware-tanzu/vm-operator/webhooks/virtualmachine/mutation 83%
github.com/vmware-tanzu/vm-operator/webhooks/virtualmachine/validation 93%
github.com/vmware-tanzu/vm-operator/webhooks/virtualmachineclass 0%
github.com/vmware-tanzu/vm-operator/webhooks/virtualmachineclass/mutation 62%
github.com/vmware-tanzu/vm-operator/webhooks/virtualmachineclass/validation 89%
github.com/vmware-tanzu/vm-operator/webhooks/virtualmachinepublishrequest 0%
github.com/vmware-tanzu/vm-operator/webhooks/virtualmachinepublishrequest/validation 93%
github.com/vmware-tanzu/vm-operator/webhooks/virtualmachineservice 0%
github.com/vmware-tanzu/vm-operator/webhooks/virtualmachineservice/mutation 62%
github.com/vmware-tanzu/vm-operator/webhooks/virtualmachineservice/validation 91%
github.com/vmware-tanzu/vm-operator/webhooks/virtualmachinesetresourcepolicy 0%
github.com/vmware-tanzu/vm-operator/webhooks/virtualmachinesetresourcepolicy/mutation 62%
github.com/vmware-tanzu/vm-operator/webhooks/virtualmachinesetresourcepolicy/validation 89%
github.com/vmware-tanzu/vm-operator/webhooks/webconsolerequest 0%
github.com/vmware-tanzu/vm-operator/webhooks/webconsolerequest/validation 92%
Summary 80% (5335 / 6705)

Minimum allowed line rate is 60%

@akutz akutz force-pushed the main branch 8 times, most recently from c112e57 to 17d9b78 Compare December 4, 2022 20:54
@akutz akutz force-pushed the main branch 6 times, most recently from fb3ece3 to 647a2a7 Compare January 30, 2023 18:48
@akutz akutz force-pushed the main branch 4 times, most recently from b8c2258 to fa1e541 Compare March 15, 2023 07:36
@akutz akutz force-pushed the main branch 5 times, most recently from 648c46b to 3e0f001 Compare July 29, 2025 17:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.