[ci] fix k8s integration tests flakiness #8575
Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)
…to minimise flakiness due to transient errors
One question and a small nitpick on a test path (both non-blocking).
LGTM otherwise
💛 Build succeeded, but was flaky
After some testing with @pkoutsovasilis, it seems that we can vendor our helm dependencies in uncompressed form: instead of including deploy/helm/elastic-agent/charts/kube-state-metrics-5.30.1.tgz, we can include the uncompressed chart. This has the benefit of not having to include a binary file in our git changes, and we can see more clearly what changes when the chart version gets bumped. @pkoutsovasilis, could you please add a commit with this change?
Even if it increases the number of new files, I prefer the exploded charts to having a .tgz committed in git.
It's a shame that the lint GH action doesn't support PR diffs over 20k lines, so it cannot restrict linter violations to the modified files, but this should not be a recurring problem.
LGTM. It's unfortunate that we need to vendor all this code, and in an ideal world I'd prefer if these artifacts were cached locally on our CI runners instead. But this should work and probably won't be too burdensome to maintain.
Left some questions and nitpicks that shouldn't block merging.
@Mergifyio backport 8.17 8.18 8.19 9.0
✅ Backports have been created
* feat: vendor all necessary test artifacts for kubernetes integration to minimise flakiness due to transient errors
* fix: correct decode api key
* fix: clear CA_TRUSTED env var for kustomize
* fix: bump memory limits for kustomize
* fix: fabricate paths leveraging filepath
* fix: remove redundant file moving when downloading kube stack helm chart
* feat: vendor expanded archives
* fix: use filepath.Join
* doc: update BuildDependencies godoc

(cherry picked from commit 7259e54)

# Conflicts:
# NOTICE-fips.txt
# NOTICE.txt
# go.mod
# magefile.go
# testing/integration/k8s/journald_test.go
# testing/integration/k8s/kubernetes_agent_standalone_test.go

* feat: vendor all necessary test artifacts for kubernetes integration to minimise flakiness due to transient errors
* fix: correct decode api key
* fix: clear CA_TRUSTED env var for kustomize
* fix: bump memory limits for kustomize
* fix: fabricate paths leveraging filepath
* fix: remove redundant file moving when downloading kube stack helm chart
* feat: vendor expanded archives
* fix: use filepath.Join
* doc: update BuildDependencies godoc

(cherry picked from commit 7259e54)

# Conflicts:
# NOTICE-fips.txt
# NOTICE.txt
# go.mod
# testing/integration/k8s/journald_test.go
# testing/integration/k8s/kubernetes_agent_standalone_test.go

* feat: vendor all necessary test artifacts for kubernetes integration to minimise flakiness due to transient errors
* fix: correct decode api key
* fix: clear CA_TRUSTED env var for kustomize
* fix: bump memory limits for kustomize
* fix: fabricate paths leveraging filepath
* fix: remove redundant file moving when downloading kube stack helm chart
* feat: vendor expanded archives
* fix: use filepath.Join
* doc: update BuildDependencies godoc

(cherry picked from commit 7259e54)

# Conflicts:
# testing/integration/k8s/journald_test.go
# testing/integration/k8s/kubernetes_agent_standalone_test.go

* feat: vendor all necessary test artifacts for kubernetes integration to minimise flakiness due to transient errors
* fix: correct decode api key
* fix: clear CA_TRUSTED env var for kustomize
* fix: bump memory limits for kustomize
* fix: fabricate paths leveraging filepath
* fix: remove redundant file moving when downloading kube stack helm chart
* feat: vendor expanded archives
* fix: use filepath.Join
* doc: update BuildDependencies godoc

(cherry picked from commit 7259e54)

# Conflicts:
# NOTICE-fips.txt
# NOTICE.txt
# go.mod
# testing/integration/k8s/journald_test.go

* [ci] fix k8s integration tests flakiness (#8575)
  * feat: vendor all necessary test artifacts for kubernetes integration to minimise flakiness due to transient errors
  * fix: correct decode api key
  * fix: clear CA_TRUSTED env var for kustomize
  * fix: bump memory limits for kustomize
  * fix: fabricate paths leveraging filepath
  * fix: remove redundant file moving when downloading kube stack helm chart
  * feat: vendor expanded archives
  * fix: use filepath.Join
  * doc: update BuildDependencies godoc

  (cherry picked from commit 7259e54)

  # Conflicts:
  # testing/integration/k8s/journald_test.go
  # testing/integration/k8s/kubernetes_agent_standalone_test.go
* fix: resolve conflicts

---------

Co-authored-by: Panos Koutsovasilis <[email protected]>

* [ci] fix k8s integration tests flakiness (#8575)
  * feat: vendor all necessary test artifacts for kubernetes integration to minimise flakiness due to transient errors
  * fix: correct decode api key
  * fix: clear CA_TRUSTED env var for kustomize
  * fix: bump memory limits for kustomize
  * fix: fabricate paths leveraging filepath
  * fix: remove redundant file moving when downloading kube stack helm chart
  * feat: vendor expanded archives
  * fix: use filepath.Join
  * doc: update BuildDependencies godoc

  (cherry picked from commit 7259e54)

  # Conflicts:
  # NOTICE-fips.txt
  # NOTICE.txt
  # go.mod
  # magefile.go
  # testing/integration/k8s/journald_test.go
  # testing/integration/k8s/kubernetes_agent_standalone_test.go
* fix: resolve conflicts
* fix: rework CA_TRUSTED elimination
* fix: add ELASTIC_AGENT_OTEL in TestKubernetesAgentOtel

---------

Co-authored-by: Panos Koutsovasilis <[email protected]>

* [ci] fix k8s integration tests flakiness (#8575)
  * feat: vendor all necessary test artifacts for kubernetes integration to minimise flakiness due to transient errors
  * fix: correct decode api key
  * fix: clear CA_TRUSTED env var for kustomize
  * fix: bump memory limits for kustomize
  * fix: fabricate paths leveraging filepath
  * fix: remove redundant file moving when downloading kube stack helm chart
  * feat: vendor expanded archives
  * fix: use filepath.Join
  * doc: update BuildDependencies godoc

  (cherry picked from commit 7259e54)

  # Conflicts:
  # NOTICE-fips.txt
  # NOTICE.txt
  # go.mod
  # testing/integration/k8s/journald_test.go
  # testing/integration/k8s/kubernetes_agent_standalone_test.go
* fix: resolve conflicts
* fix: update NOTICE.txt
* fix: rework CA_TRUSTED elimination
* fix: add ELASTIC_AGENT_OTEL in TestKubernetesAgentOtel

---------

Co-authored-by: Panos Koutsovasilis <[email protected]>

* [ci] fix k8s integration tests flakiness (#8575)
  * feat: vendor all necessary test artifacts for kubernetes integration to minimise flakiness due to transient errors
  * fix: correct decode api key
  * fix: clear CA_TRUSTED env var for kustomize
  * fix: bump memory limits for kustomize
  * fix: fabricate paths leveraging filepath
  * fix: remove redundant file moving when downloading kube stack helm chart
  * feat: vendor expanded archives
  * fix: use filepath.Join
  * doc: update BuildDependencies godoc

  (cherry picked from commit 7259e54)

  # Conflicts:
  # NOTICE-fips.txt
  # NOTICE.txt
  # go.mod
  # testing/integration/k8s/journald_test.go
* fix: resolve conflicts

---------

Co-authored-by: Panos Koutsovasilis <[email protected]>
What does this PR do?
This PR introduces the following changes:
Vendor external Kubernetes artifacts for testing:
kustomize configuration are vendored to remove the runtime dependency on kustomize. To support this, a new mage target called integration:buildKubernetesTestData was added. This target is now invoked as a prerequisite by:
  - integration:testKubernetes
  - integration:testKubernetesMatrix
  - integration:testKubernetesSingle
This aims to prevent CI failures caused by GitHub/network issues and addresses #8319.
Fix decoding of Beats-style API Keys in K8s tests:
Fix %CA_TRUSTED% environment variable injection in Kustomize tests:
  - ca_trusted_fingerprint values caused by %CA_TRUSTED% placeholders not being overridden.
  - Set CA_TRUSTED to an empty value in the relevant test environments to ensure expected TLS behavior.
Increase memory limits for Elastic Agent in Kustomize-based tests:
  - OOMKilled errors were observed (example 1, example 2).

Why is it important?
Checklist
./changelog/fragments using the changelog tool

Disruptive User Impact
None expected. All changes are isolated to the Kubernetes integration test setup and do not affect runtime or user configurations.
How to test this PR locally
Related issues