CI: Fix random failures. Merge workflows #787
Merged
Do not run the log export if the job was cancelled or not set up. Always attempt the proper cleanup. Print more debug information when the droplet IPv4 cannot be parsed.
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
## main #787 +/- ##
=======================================
Coverage 63.33% 63.33%
=======================================
Files 77 77
Lines 6854 6854
Branches 576 576
=======================================
Hits 4341 4341
Misses 2325 2325
Partials 188 188
The dependencies were not always installed, depending on where the failure occurred. Since this is a debug step, we don't want it to fail. Do not run it if the workflow was cancelled.
This allows the packages built in the previous workflow to be reused for the droplet test, reducing the number of package builds from 12 to 3 in these workflows. This:
* Fixes the issue of packages failing to build because of rate limiting on other resources.
* Reduces the chance of random errors.
* Reduces the total CI time required.
* Does not attempt to run the droplet test if the package building phase fails. (Previously all the package builds were launched in parallel, which meant they all failed unnecessarily.)
* Makes it cheaper and faster to rerun the failed jobs.
With these changes the number of CI failures drops considerably, and the cause of a failure is clearer.
This might be a change in the Digital Ocean API: the call returns before the network is set up, with empty setup information. It did not seem to occur before, so it may be an API change on their side.
Digital Ocean droplets always have a private IP in addition to the public one. The API returns them in random order, so the CI job occasionally tried to use the internal one and failed.
Same operation as for moving the Droplet workflow: we reuse the already built packages. The resilience and speed advantages are the same and add up.
Rename the main workflow field. Add more documentation.
In the run_on_droplet job we only require the .github/scripts dir
nesitor
approved these changes
Mar 31, 2025
This PR fixes the failures that have recently been happening in CI on the Digital Ocean Droplet VM and that prevented PRs from passing and being merged.
This is accomplished in two ways:
Some minor improvements and documentation are also included.
Related ClickUp, GitHub or Jira tickets : Jira ALEPH-499
Changes
Unified workflow
This PR introduces a big architectural change, as it merges the following workflows into one:
This merge allows the packages built in the previous steps to be reused for the droplet test, reducing the number of packages built from 13 to 3.
Advantages:
Mainly, it reduces the chance of CI failures due to external causes and makes the failing step more apparent.
Digital Ocean Changes
It seems Digital Ocean changed the way Droplets are provisioned, which affects the command
doctl compute get
which we use to retrieve the newly created droplet's IP. The command sometimes returned without the droplet info or without the network information set, presumably before the setup was finished.
Solution: a workaround was added for this. We now wait and rerun the command until it returns the proper info.
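The wait-and-retry workaround can be sketched roughly as follows. This is a minimal illustration, not the script used in this PR: the function names, the retry count and the delay are hypothetical, and the `doctl compute droplet get <name> --output json` invocation and the `networks.v4` JSON layout are assumptions about the CLI output.

```python
import json
import subprocess
import sys
import time


def has_network_info(droplets):
    """True once the first droplet reports at least one IPv4 network."""
    if not droplets:
        return False
    networks = droplets[0].get("networks") or {}
    return bool(networks.get("v4"))


def fetch_with_doctl(name):
    """Query the droplet; return [] when the output is empty or unparsable.

    Assumption: `doctl compute droplet get <name> --output json` prints a
    JSON list of droplets. Adjust to the command the workflow really uses.
    """
    result = subprocess.run(
        ["doctl", "compute", "droplet", "get", name, "--output", "json"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0 or not result.stdout.strip():
        return []
    try:
        parsed = json.loads(result.stdout)
    except json.JSONDecodeError:
        return []
    return parsed if isinstance(parsed, list) else [parsed]


def wait_for_droplet_info(fetch, attempts=30, delay=5.0, sleep=time.sleep):
    """Call `fetch` until the droplet has network info, or give up."""
    for attempt in range(1, attempts + 1):
        droplets = fetch()
        if has_network_info(droplets):
            return droplets
        # Extra debug output makes a later failure easier to diagnose.
        print(
            f"Attempt {attempt}/{attempts}: droplet not ready, got {droplets!r}",
            file=sys.stderr,
        )
        if attempt < attempts:
            sleep(delay)
    raise TimeoutError(f"droplet info still incomplete after {attempts} attempts")
```

In a CI step this could be called as `wait_for_droplet_info(lambda: fetch_with_doctl("my-droplet"))`, where the droplet name is a placeholder. Passing the fetch function as a parameter keeps the retry logic testable without network access.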
The second issue is that we sometimes got another IP, the one on the internal network, which could not be reached from the GitHub runner.
As the IP was recomputed between some steps, some steps failed while others worked.
Solution: 1. The IP is computed only once and saved in a GitHub CI env var.
Solution: 2. The code ensures the public one is used.
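The two solutions above can be illustrated with a short sketch. It is not the code from this PR: the function names and the `DROPLET_IPV4` variable name are hypothetical, and the droplet JSON layout (`networks.v4` entries carrying `ip_address` and `type`) is assumed from the Digital Ocean API format. Writing `NAME=value` to the file pointed to by `GITHUB_ENV` is the standard GitHub Actions way to expose a variable to later steps of the same job.

```python
import os


def public_ipv4(droplet):
    """Return the public IPv4 of a droplet, ignoring the private one.

    The entries may come in any order, so select on the `type` field
    instead of taking the first element.
    """
    v4 = (droplet.get("networks") or {}).get("v4") or []
    for net in v4:
        if net.get("type") == "public":
            return net["ip_address"]
    raise ValueError(f"no public IPv4 in droplet networks: {v4!r}")


def save_to_github_env(name, value, env_file=None):
    """Append NAME=value to the GITHUB_ENV file so later steps can read it."""
    path = env_file or os.environ["GITHUB_ENV"]
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(f"{name}={value}\n")
```

A step would then run something like `save_to_github_env("DROPLET_IPV4", public_ipv4(droplet))` once, and every later step reads the same value instead of querying the API again.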
Self proofreading checklist
packaging/Makefile
How to test
It is not relevant to test this manually or locally.
Rebase your Pull Request on top of this branch or master once it is merged.
Print screen / video
Yay a pretty arrow

Notes
Do not squash this branch!
I reworked each commit to make it documented and self-explanatory. The GitHub workflow file format is quite unreadable, and the commit history helps.
[1] e.g.
That error happened randomly and was determined to be caused by rate limiting.
We also had previous failures with the Docker registry that prevented the use of the vm-connector Docker image.