Conversation

pkoutsovasilis
Contributor

@pkoutsovasilis pkoutsovasilis commented Jun 23, 2025

What does this PR do?

This PR fixes a regression introduced by #6907, which updated the RPM/DEB preinstall script to stop the ElasticEndpoint service during agent upgrades to work around tamper protection restrictions. That change stopped the service as intended, but it restarted the endpoint before restarting the agent. With that ordering, the endpoint most of the time has to try to reconnect to elastic-agent, with no guarantee of how long it will take for a reconnect to succeed.

To address this, the PR:

  • Restarts the ElasticEndpoint service after the elastic-agent service has been restarted, to guarantee that elastic-endpoint can connect to elastic-agent.
  • Enhances integration tests to:
    • Use locally built artifacts when testing same-version upgrades.
    • Improve error messages and fixture preparation robustness.
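
The restart ordering described above can be sketched roughly as follows. This is a minimal illustration, not the contents of the actual postinstall template: the function name `restart_services`, the `SYSTEMCTL` override variable, and the `true`/`false` argument signalling that the endpoint was stopped earlier are all assumptions made for the example. Only the service names `elastic-agent` and `ElasticEndpoint` come from the description.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the restart ordering; not the real postinstall.sh.tmpl.

# SYSTEMCTL is an assumed override hook so the function can be dry-run with a stub.
SYSTEMCTL="${SYSTEMCTL:-systemctl}"

# restart_services <endpoint_was_stopped>
#   $1 is an assumed flag: "true" if the endpoint was stopped before the upgrade
#   (for example because tamper protection was enabled), anything else otherwise.
restart_services() {
  # 1. Restart the agent first so it is up before the endpoint starts.
  "$SYSTEMCTL" restart elastic-agent || return 1

  # 2. Only afterwards start the endpoint, and only if it was stopped earlier.
  if [ "$1" = "true" ]; then
    "$SYSTEMCTL" start ElasticEndpoint || return 1
  fi
}
```

Calling `restart_services true` would restart the agent and then start the endpoint, while `restart_services false` would touch only the agent. How the real scripts decide whether the endpoint needs to be started is not shown in this description, so the flag here is purely illustrative.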

Why is it important?

Improper ordering of service restarts during DEB/RPM upgrades with endpoint tamper protection enabled caused the endpoint to start independently of the agent, leaving it in an "always-retrying" state and causing sporadic degraded operation. This fix ensures the services are brought up in the correct order to maintain endpoint health.

Checklist

  • I have read and understood the pull request guidelines of this project.
  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in ./changelog/fragments using the changelog tool
  • I have added an integration test or an E2E test

Disruptive User Impact

How to test this PR locally

mage integration:auth
STACK_PROVISIONER=stateful mage integration:single TestUpgradeAgentWithTamperProtectedEndpoint_RPM

Related issues

@elastic-sonarqube

@elasticmachine
Collaborator

💚 Build Succeeded

History

cc @pkoutsovasilis

@pkoutsovasilis pkoutsovasilis changed the title from "[rpm] experimenting" to "[deb/rpm] restart endpoint with tamper protection after elastic-agent" Jun 24, 2025
@pkoutsovasilis pkoutsovasilis added backport-8.19 (Automated backport to the 8.19 branch) and removed backport-skip labels Jun 24, 2025
@pkoutsovasilis pkoutsovasilis marked this pull request as ready for review June 24, 2025 07:21
@pkoutsovasilis pkoutsovasilis requested a review from a team as a code owner June 24, 2025 07:21
@elasticmachine
Collaborator

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

@pkoutsovasilis pkoutsovasilis requested a review from kaanyalti June 24, 2025 07:21

@kaanyalti kaanyalti left a comment

This can be backported to 8.18 and 9.0 as well. That being said, the changes look good to me. Thank you very much for helping out.

Edit: This should be backported to 9.0 because the initial PR was backported

@pchila
Member

pchila commented Jun 24, 2025

> Edit: This should be backported to 9.0 because the initial PR was backported

The 9.0 backport of #6907 has already been reverted with #8638, so the 9.0 backport label can be omitted.

Member

@pchila pchila left a comment

LGTM
Thank you for quickly fixing this issue

@pkoutsovasilis pkoutsovasilis merged commit 249885f into elastic:main Jun 24, 2025
21 checks passed
mergify bot pushed a commit that referenced this pull request Jun 24, 2025
…#8637)

* fix: use rpm from local build

(cherry picked from commit 249885f)

# Conflicts:
#	dev-tools/packaging/templates/linux/postinstall.sh.tmpl
#	testing/integration/endpoint_security_test.go
pkoutsovasilis added a commit that referenced this pull request Jun 24, 2025
…tion after elastic-agent (#8646)

* [deb/rpm] restart endpoint with tamper protection after elastic-agent  (#8637)

* fix: use rpm from local build

(cherry picked from commit 249885f)

# Conflicts:
#	dev-tools/packaging/templates/linux/postinstall.sh.tmpl
#	testing/integration/endpoint_security_test.go

* Enhancement/6394 allow deb rpm to upgrade with endpoint tamper protection (#6907)

* Update pkg/testing/tools/tools.go

Co-authored-by: Paolo Chilà <[email protected]>

* enhancement(6394): updated preinstall script, updated service to use uninstall token

* enhancement(6394): updated the preinstall script

* enhancement(6394): started adding integration tests

* enhancement(6394): updated fixture install, updated endpoint security tests

* enhancement(6394): cleaned up fixture_install, added function that exposes fixture's uninstall tokens, updated tests

* enhancement(6394): refactored test code so that I can use it with rpm

* enhancement(6394): added tests to assert that tamper protection works

* enhancement(6394): updated the endpoint testing tools, fixture install functions and the deb rpm upgrade tests

* enhancement(6394): added test logs, updated rpm installation to set agent socket path

* enhancement(6394): remove commented code

* enhancement(6394): remove print statements

* enhancement(6394): remove unnecessary comments, refactor unused function

* enhancement(6394): revert var name change

* enhancement(6394): added changelog

* enchancement(6394): update test logs, add non integrative config to deb installation

* enhancement(6394): updated the endpoint version comparison and assertion

* enhancement(6394): added log in tests

* enhancement(6394): resorted to using previous major instead of minor in upgrade test

* enhancement(6394): updated endpoint version function in the tests, updated function name in testing tools

* enhancement(6394): use previous minor, fix log

* enhancement(6394): added comment explaining motive behind simple install functions

* enhancement(6394): updated return in tools

* Update changelog/fragments/1740166208-allow-deb-rpm-upgrade-with-tamper-protected-endpoint.yaml

Co-authored-by: Craig MacKenzie <[email protected]>

* enhancement(6394): fixed function call in tests

* enhancement(6394): added systemctl start in postinstall, refactored preinstall and added condition to make same version installations work

* enhancement(6394): updated the preinstall and postinstall scripts to troubleshoot

* enhancement(6394): updated preinstall and postinstall script templates

- Updated preinstall to stop endpoint if it is an available service regardless of the version of endpoint that's installed
- Updated postinstall to start endpoint if the old endpoint version and the new version match.

* enhancement(6394): removed error exit from postinstall

* enhancement(6394): updated postinstall and preinstall templates

- Preinstall now does not use a state file. Recovery from failure starts ElasticEndpoint if it is not running
- Preinstall does not stop endpoint if tamper protection is not enabled
- Postinstall does not print an error if service is still running

* enhancement(6394): removed debug logs

* enhancement(6394): removed unnecessary comment

* enhancement(6394): store uninstall token as local var, uninstall through the agent

* enhancement(6394): added setclient function

* enhancement(6394): added getInstallCommand and replaced SimpleInstall

* enhancement(6394): added test case for error recovery. removed unused fixture functions

* enhancement(6394): refactored tests, consolidated test scenarios into one function

* enhancement(6394): remove unnecessary test functions

* enhancement(6394): remove unused fixture function

* enhancement(6394): revert unwanted installDeb changes

* enhancement(6394): remove unwanted changes in testing tools

* enhancement(6394): remove unused function call

* enhancement(6394): replacing systemctl instead of adding new one to path

* enhancement(6394): update real systemctl path in mock systemctl script

* enhancement(6394): fix linting errors

* Update changelog/fragments/1740166208-allow-deb-rpm-upgrade-with-tamper-protected-endpoint.yaml

Co-authored-by: Paolo Chilà <[email protected]>

* Update dev-tools/packaging/templates/linux/postinstall.sh.tmpl

Co-authored-by: Paolo Chilà <[email protected]>

* Update pkg/testing/tools/tools.go

Co-authored-by: Paolo Chilà <[email protected]>

* Update dev-tools/packaging/templates/linux/postinstall.sh.tmpl

Co-authored-by: Paolo Chilà <[email protected]>

* Update dev-tools/packaging/templates/linux/postinstall.sh.tmpl

Co-authored-by: Paolo Chilà <[email protected]>

* Update pkg/testing/tools/tools.go

Co-authored-by: Paolo Chilà <[email protected]>

* enhancement(6394): updated print statement

* enhancement(6394): remove unnecessary command

* enhancement(6394): use addressFromPath and SetClient

* enhancement(6394): using service name, fixed indentation

* test(debug): add detailed logging to Fixture.SetClient and installDeb for agent client setup debugging

* Revert "test(debug): add detailed logging to Fixture.SetClient and installDeb for agent client setup debugging"

This reverts commit 390c561.

* enhancement(6394): renamed SetClient to SetDebRpmClient. Using hardcoded working dir as fixture working dir does not work for determining socket path

* enhancement(6394): consolidated same version upgrade and regular upgrade test functions

* enhancement(6394): simplify preinstall script and enhance upgrade tests for tamper protection
- Removed unnecessary endpoint handling logic from preinstall script.
- Improved checks for service installation and status before upgrade.
- Updated upgrade test functions to handle stopping the endpoint service before upgrades.

* enhancement(6394): remove mock systemctl script for tamper protection tests

* enhancement(6394): remove unused import

* enhancement(6394): fixed order of execution in preinstall

* enhancement(6394): added tests to make sure deb/rpm upgrades work when endpoint is not tamper protected

---------

Co-authored-by: Paolo Chilà <[email protected]>
Co-authored-by: Craig MacKenzie <[email protected]>
(cherry picked from commit 8a6531f)

# Conflicts:
#	dev-tools/packaging/templates/linux/preinstall.sh.tmpl

# Conflicts:
#	dev-tools/packaging/templates/linux/postinstall.sh.tmpl
#	testing/integration/endpoint_security_test.go

* fix: resolve conflicts

* fix: use --force-confold for deb tests in TestUpgradeAgentWithTamperProtectedEndpoint_DEB

---------

Co-authored-by: Panos Koutsovasilis <[email protected]>
Co-authored-by: Kaan Yalti <[email protected]>
v1v added a commit that referenced this pull request Jun 25, 2025
…-hosted

* feature/hosted-stack-using-oblt-cli: (26 commits)
  Use the current official docker image for oblt-cli
  Mark the elasticinframetrics processor as deprecated and schedule for removal (#8659)
  [main][Automation] Update versions (#8668)
  chore: Update create_deployment_csp_configuration.yaml (#8669)
  Attempt to make test more reliable by querying ES directly (#8422)
  [test] split up ess and beats serverless integration tests (#8551)
  Remove resource/k8s processor and use k8sattributes processor for service attributes (#8599)
  fix: use --force-confold for deb tests in TestUpgradeAgentWithTamperProtectedEndpoint_DEB (#8649)
  [main][Automation] Bump stack images versions to 9.1.0-ea0b7542 (#8612)
  chore: Update to elastic/beats@f6594fb72670 (#8640)
  [deb/rpm] restart endpoint with tamper protection after elastic-agent  (#8637)
  ci: don't preinstall fleet packages on retried CI steps (#8636)
  chore: Update to elastic/beats@6b6941eed496 (#8619)
  [main][Automation] Bump VM Image version to 1750467641 (#8617)
  flaky: skip TestUpgradeAgentWithTamperProtectedEndpoint_RPM (#8626)
  Add skip-changelog PR label for bump VM PRs (#8627)
  build(deps): bump github.com/elastic/go-seccomp-bpf from 1.5.0 to 1.6.0 (#8611)
  [ci] fix k8s integration tests flakiness (#8575)
  bump apmconfig Otel extension to v0.3.0 (#8600)
  Enhancement/6394 allow deb rpm to upgrade with endpoint tamper protection (#6907)
  ...

Labels

  • backport-8.19: Automated backport to the 8.19 branch
  • skip-changelog
  • Team:Elastic-Agent-Control-Plane: Label for the Agent Control Plane team

Development

Successfully merging this pull request may close these issues.

[Flaky Test]: TestUpgradeAgentWithTamperProtectedEndpoint_RPM – Condition never satisfied

4 participants