-
Notifications
You must be signed in to change notification settings - Fork 1.3k
[ws-manager] fix incorrect handling of failure state for workspaces #11489
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
started the job as gitpod-build-pavel-10315-1.1 because the annotations in the pull request description changed |
/werft run with-clean-slate-deployment=true 👍 started the job as gitpod-build-pavel-10315-1.2 |
/hold |
return "", nil | ||
} | ||
return fmt.Sprintf("container %s completed; containers of a workspace pod are not supposed to do that. Reason: %s", cs.Name, terminationState.Message), nil | ||
// container terminated successfully - this is not a failure |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original intent with this branch was to cover the case that a regular workspace stops without being deleted. An unintentional stop, so to speak. That said, the failure mode is actually nicer if your workspace just stops compared to get getting an inactionable error message ("containers of a workspace pod are not supposed to do that" 🤷).
The failed
condition on workspace status indeed isn't stable (but should be). ws-manager-bridge should protect us against that by not resetting that condition if it ever was set. Because we have nothing like an at-least-once delivery semantic, we might just miss the "failed" message though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this will do more good than harm.
We could consider adding a metric to count unintentional pod termination - either in this PR (hence the hold), or in a separate one, or not at all :)
/hold
Added metric to track unintentional stops |
/unhold |
Description
This PR fixes two issues:
This two fixes need to go hand in hand.
For more info, see the link above for the comment describing the issue in greater detail.
Related Issue(s)
Fixes #10315
How to test
Open workspace in preview env
Close it
It should close normally.
Release Notes
Documentation
Werft options: