forked from vmware-tanzu/vm-operator
-
Notifications
You must be signed in to change notification settings - Fork 0
Feature/oss 2 #4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
akutz
wants to merge
593
commits into
main
Choose a base branch
from
feature/oss-2
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
By generating v1 instead of v1beta1, kubectl explain now works. preserveUnknownFields is needed for explain to work when upgrading from CRD v1beta1 to v1 but controller-gen has a bug [1] that causes it not be included in the generated CRD yaml. Use kustomize to add it the CRDs. Keep the webhooks at their prior version until we decide to or must update them. The new controller-gen defaults to newer version. [1] - kubernetes-sigs/controller-tools#476
k8s v1.15 moved to go modules which makes it hard for us to keep importing "k8s.io/kubernetes/pkg/api/v1/endpoints" directly, so copy in the function we need to sort the Service EP the same as they are done in k8s.
VirtualMachineImages created before a certain version do not have OwnerReference to the ContentLibraryProvider resource. VM operator uses this OwnerReference to find out which vSphere content library it should Clone the VM from. This caused TKG upgrade failures when we wanted to Clone new VMs from old images. After upgrade, we should update these images with the OwnerReference.
VM Operator does not support several configuration updates. The reconcile code is designed such that it only consumes the certain spec updates when the VM is powered on. Even though we do not reconcile these config updates, a developer is allowed to update a VM spec. The spec updates takes affect when the VM is power cycled. This is a poor UX because we do not given any feedback to users that a config spec is not reconciled, nor do we prevent them from modifying the config. This change modifies the virtualmachine validating webhooks to deny any config updates that VM operator does not reconcile. We also give a helpful message that the config update is not allowed in the current power state. Summary of config field changes: *Immutable Fields* - ResourcePolicyName This is only consumed once when the VM is d eployed from the OVF. This should not be updated. *Not allowed when VM is powered on* - Ports - NetworkInterfaces - VmMetadata - Volumes - referencing a vSphere volumes - AdvancedOptions.DefaultVolumeProvisioningOptions All these options are not being reconciled when the VM is powered on, so we deny the Update requests using the webhook. We are not doing any validations for the Update operations (even basic checks like validting that the names are DNS compliant). This change also modifies the webhooks so that we perform these checks during Updates.
With the introduction of VM service, we enforce namespace scoped content sources. This means a developer can only use VM images from the content libraries that the VI admin has associated with the developer's namespace. This change introduces the code to enforce the namespace scoped association. We also do not reconcile a VirtualMachine if the ContentSources associated with namespaces change. This means that if the reconciliation queue is in exponentially backed off state, and the VI admin associates the content library to a namespace, the VM will not be reconciled even if the bindings are available. This change adds code to watch the ContentSourceBindings and queue reconcile requests for the VMs using any images from those content sources.
CLN 8372fed32cced4103944dc0914f41cce3b8b3e61 on the vm-operator-api repo
introduced validations for the VM power state and VM metadata transport
mechanisms. This change bumps up the API dependency in go.mod and includes the
updated CRDs.
- Also remove the webhook checks since they are no longer needed.
Testing Done:
- Manually deployed a VM with invalid spec.powerState and
spec.vmMetadata.Transport. Verified that the request was denied.
```
$ cat <<EOF | kubectl apply -f -
pipe heredoc> apiVersion: vmoperator.vmware.com/v1alpha1
kind: VirtualMachine
metadata:
name: vmsvc-ubuntu-vm-2
namespace: parunesh-ubuntu-ns
spec:
imageName: photon-3-k8s-v1.18.10---vmware.1-tkg.1.729f636
className: best-effort-xsmall
powerState: invalid-powerState
storageClass: wcpglobal-storage-profile
vmMetadata:
transport: invalid-transport
pipe heredoc> EOF
The VirtualMachine "vmsvc-ubuntu-vm-2" is invalid:
* spec.powerState: Unsupported value: "invalid-powerState": supported values: "poweredOff", "poweredOn"
* spec.vmMetadata.transport: Unsupported value: "invalid-transport": supported values: "ExtraConfig", "OvfEnv"
```
- Verified that the valid values work
```
$ cat << EOF | kubectl apply -f -
apiVersion: vmoperator.vmware.com/v1alpha1
kind: VirtualMachine
metadata:
name: vmsvc-ubuntu-vm-2
namespace: parunesh-ubuntu-ns
spec:
imageName: photon-3-k8s-v1.18.10---vmware.1-tkg.1.729f636
className: best-effort-xsmall
powerState: poweredOn
storageClass: wcpglobal-storage-profile
vmMetadata:
transport: OvfEnv
EOF
virtualmachine.vmoperator.vmware.com/vmsvc-ubuntu-vm-2 configured
```
VM operator watches the provider ConfigMap for any changes to the content sources from wcpsvc. This is how we create various custom resources including ContentSourceBindings in the workload namespaces. Since we do not watch namespaces, we do not create bindings in any new namespace that is created on wcpsvc. Our predicate functions for the provider ConfigMap are also strict which means we only trigger the reconcile when there is a change in the configured content source. This means that if somehow the system deviates from the desired state, there is no reconciliation until the next change in content source. This change fixes these issues. Specifically, this change: - Adds a watch for namespaces that queues a reconcile when a namespace is created. - Relaxes the predicates so we always reconcile the ConfigMap. The reconcile loop ensures that the current state is the same as the desired state. - Protect the CotentSourceBinding creation in VMService FSS, since these are only enforced if the FSS is enabled. - Fix the deploy-wcp script to not remove the ContentSource key from the provider ConfigMap. We have abandoned that approach now. - Add a new integration test. Refactor the existing test under the FSS. Testing Done:
…pecial property
Due to the raciness between GOSC and cloud-init customization, we need a workflow that ensures VMs are created
successfully with customization using the VM Operator workflow.
This enforces VI admin to prepare the OVA using the instructions provided here
The preparation steps also involves adding a new vApp option "vmware-system.vmsvc.compatibility" that denotes whether the image is being prepared or not for VMService
This MR adds capability to:
- Add a new field for ImageSupportedState in VirtualMachineImageStatus that denotes if the image is compatible or not. True if compatible, false otherwise
- Updates the VirtualMachineImageStatus.ImageSupportedState if it is a TKG image or is made compatible by adding guestinfo.vmservice.defer-cloud-init->ready extraconfig key
- VirtualMachine_Validator is updated to error out if the VirtualMachineImageStatus.ImageCompatibility is set to false
- Adds conditions to VirtualMachineImageStatus to denote more information regarding the image incompatibility
- Reconfigures the extraconfig guestinfo.vmservice.defer-cloud-init->ready to guestinfo.vmservice.defer-cloud-init->enabled in the virtual machine creation workflow to ensure cloud init is deferred in the first boot for compatible images
Unit tests added
Validated on a real testbed
Non-Compatible image:
root@4220964932a3ed994e8e1b8acec25864 [ ~ ]# kubectl create -f noncompat.yaml
configmap/non-compat-configmap created
Error from server (GuestOS not supported for osType ubuntu64Guest on image focal-server-cloudimg-amd64 or VM image does not have guestinfo.vmservice.defer-cloud-init extraConfig key set to ready or is not a TKG Image): error when creating "noncompat.yaml": admission webhook "default.validating.virtualmachine.vmoperator.vmware.com" denied the request: GuestOS not supported for osType ubuntu64Guest on image focal-server-cloudimg-amd64 or VM image does not have guestinfo.vmservice.defer-cloud-init extraConfig key set to ready or is not a TKG Image
Compatible Image
Status:
Conditions:
Last Transition Time: 2021-02-11T04:48:18Z
Status: True
Type: VirtualMachineImageOSTypeSupported
Last Transition Time: 2021-02-11T04:48:18Z
Status: True
Type: VirtualMachineImageV1Alpha1Compatible
Image Supported: true
Internal Id: ubuntu-20.04-vmservice-v1alpha1.20210210
Uuid: 9748cc2b-c5c4-471f-ba25-5a84c5f8a190
Events: <none>
API changes: srajashekar-vmw/vm-operator-api@d1dc57b
Move the clone up and remove usages of a stale cloned vm from earlier tests to fix the mismatched keys. Testing Done: make test-integration passes consistently
This includes changes to introduce a new FSS feature flag - WCP_VMService_v1alpha2. This feature is used for enhancing vm-operator support for customization. This involves supporting opt-in VM Service customizations with multiple selections allowing fine-grained control of type of customization specific to the OVF virtualmachineimage requirements.
Proposal for WCP_VMService_v1alpha2:
Testing done:
Builds:
vcenter-all sandbox:
esxall sandox:
vm-operator sandbox:
cayman-stateless-photon sandbox:
dev-integ-vds:
Verified thes WCP_VMService_v1alpha2 is enabled on SV VM and value in reflected in vmop yaml.
- root@sc2-10-186-6-175 [ ~ ]# cat /etc/vmware/wcp/features.yaml | grep WCP_VMService_v1alpha2
WCP_VMService_v1alpha2: true
- Even vmop yaml shows that feature is enabled: FSS_WCP_VMSERVICE_V1ALPHA2 = true
This change adds ovfEnv properties in VirtualMachineImage spec. A developer can do a describe on the virtualmachineimage and see the userConfigurable ovf key, type and default value. Test done: - make test
…t in the cluster The GCE2E run against a kind cluster does not set gOSIdsToFamily to contain valid guest OS descriptors. Because of this, the validation webhook was always failing to create CAPI VMs. This change bypasses isSupportedGuestOS validation if gOSIdsToFamily is not set GC E2E run:
- Upstream govmomi has the cluster module changes now, so we do not need to maintain a duplicate copy in our repo. - Remove utils from VirtualMachineService which housed single func being used in a single file. - Fix the copyright header formatting of files in this repo. - I am skipping the gobuild and load-k8s-master on purpose since we do not have clarity as to whether they would live in this repo or not.
Compare the current state of the VM from VC with the desired state specified by k8s, and reconfigure the VM to move towards the desired state. Start to split apart the big session.go file into various files. Still needs to be smarter around Network reconfigure by comparing the NIC backings.
This means less client to have to deal with when reworking the LB provider tests. The NCP VirtualNetworkInterfaces CRs are pretty small, so caching them shouldn't cause an issue, since we already do for NetOP NetworkInterfaces. In a follow on commit we should really be watching the NCP and NetOP network interfaces owned by VMs.
ThunderPciDevices FSS is used to enable vGPU support for VM Service workflow. This change plumbs ThunderPciDevices FSS into vm-operator * for local kind env, ThunderPciDevices is set to false * for wcp env, ThunderPciDevices is set based on the FSS value in wcp config. test done: * make test-integration * vm-operator sandbox build:
It is failing too much in the build pipelines. All the integration readiness checks should be rewritten as unit tests, as controller unit tests weren't available when the readiness stuff was added.
Update the code flow so that we check the network interfaces just once as a part of the VM power on flow. This means we stop creating NetworkProviders all over the place, and we collect our network interface device and customization just once up front.
Consolidate VM properties calls that get the VM config, runtime, and/or summary into one call. Previously we would make repeated calls to fetch a single property - this is not ideal because of having to go to VC introduces more errors to handle. Remove some long commented out VM annotations that are now a part of the VM Status. Add more unit tests to cover the update virtual machine config spec Add missing tests for delete VM Remove ListVirtualMachines since its only use was in testing
The CNI controller sometimes includes raw SOAP errors from VC in the Error field, which can cause the VM object churn.
As part of features in v1alpha2, we are adding support for specifying
templatized values for guest customization key/value pairs. As a first
step, we are introducing this support for specifying networking
configuration, but support for more configurations can be added
relatively easily in the future. For e.g.: In order to
to specify the networkprovider-assigned IP address for the
first network interface, (i.e., IP address corresponding to
spec.networkInterface[0]), we can specify {{ (index .Nic 0).IP }} in the
guest customization value. VM Operator will pick this up and populate
it with the actual IP before invoking update.for guest customization key/value pairs. As a first
step, we are introducing this support for specifying networking
configuration, but support for more configurations can be added
relatively easily in the future. For e.g.: In order to
to specify the networkprovider-assigned IP address for the
first network interface, (i.e., IP address corresponding to
spec.networkInterface[0]), we can specify {{ (index .Nic 0).IP }} in the
guest customization value. VM Operator will pick this up and populate
it with the actual IP before invoking update.
Testing Done:
* make test
configMap:
---
apiVersion: v1
kind: ConfigMap
metadata:
name: test-vm
namespace: my-podvm-ns
data:
ip: "{{ (index .Nic 0).IP }}"
subnetMask: "{{ (index .Nic 0).SubnetMask }}"
gateway: "{{ (index .Nic 0).Gateway }}"
dnsNameserver: "{{ (index .DnsNameServers 0) }}"
* vds:
root@423894ef1a38d0422ffbdabb5767092b [ ~ ]# kubectl logs vmware-system-vmop-controller-manager-df46db746-vj9vn -n vmware-system-vmop manager | grep "template"
I0329 23:14:04.362119 1 session_vm_update.go:553] vsphere "msg"="template data" "dnsNameserver"="10.20.145.1"
I0329 23:14:04.362508 1 session_vm_update.go:553] vsphere "msg"="template data" "gateway"="192.168.1.1"
I0329 23:14:04.362576 1 session_vm_update.go:553] vsphere "msg"="template data" "ip"="192.168.128.14"
I0329 23:14:04.362708 1 session_vm_update.go:553] vsphere "msg"="template data" "subnetMask"="255.255.0.0"
* nsxt:
root@4236ae36182bdee17dca987895037091 [ ~ ]# kubectl logs vmware-system-vmop-controller-manager-74dfcffd98-t5g78 -n vmware-system-vmop manager | grep "template"
I0330 00:22:32.956676 1 session_vm_update.go:555] vsphere "msg"="template data" "ip"="172.26.1.226"
I0330 00:22:32.957650 1 session_vm_update.go:555] vsphere "msg"="template data" "subnetMask"="255.255.255.240"
I0330 00:22:32.958328 1 session_vm_update.go:555] vsphere "msg"="template data" "dnsNameserver"="10.20.145.1"
I0330 00:22:32.958776 1 session_vm_update.go:555] vsphere "msg"="template data" "gateway"="172.26.1.225"
VirtualMachines with Persistant volume claims depend on the BiosUUID to
be populated on the VM status before the volume controller takes over
and creates cnsnodeattachments objects. We need to wait for the volume
status to update to "Attached" (done by the volume_controller) before
poweringOn the VM.
Testing Done:
Manual Tests on a wcp testbed.
VM powers ON only after volume attachement
status:
biosUUID: 4217456e-3062-f67c-d20d-51920da1bb0a
changeBlockTracking: false
conditions:
- lastTransitionTime: "2021-04-05T22:53:04Z"
status: "True"
type: VirtualMachinePrereqReady
host: 10.161.111.180
instanceUUID: 5017e39d-a980-e1e4-5d40-7b98a6f9a862
phase: Created
powerState: poweredOn
uniqueID: vm-159
vmIp: 192.168.128.4
volumes:
- attached: true
diskUUID: 6000C29d-805a-c338-fda3-71f3a492e682
error: ""
name: my-pvc
integration tests: make test-integration
….16 of boringcrypto. Testing ======== - VM Operator SB Build - - CSP Build - - [x] dev-integ-vds - - [x] dev-integ-nsxt -
All the controller tests have been converted over so these all use the same new env test suite. Remove controllers from the old-style tests since including them there meant we were running the unit tests again.
This patch updates the Image Registry APIs that are vendored when pushing the project upstream.
Minimum allowed line rate is |
c112e57 to
17d9b78
Compare
fb3ece3 to
647a2a7
Compare
b8c2258 to
fa1e541
Compare
648c46b to
3e0f001
Compare
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
No description provided.