Closed
Description
Summary
The goal of this epic is to focus on two things:
- fix the most glaring issues we currently have with prebuilds
- ensure quality beyond that by adding tests and metrics as we see fit
Context
Historically, the "prebuild experience" has been really shaky (cf. #7812, for instance), mostly because:
- it's a rather high-level feature that involves a lot of moving parts that have to mesh ⚙️ with each other
- it has close to zero test coverage
- it's opaque to users a) why and b) whether a prebuild is executed after they "triggered" it: some of the perceived instability is actually the system working as intended, but a) we're doing a bad job of explaining it and b) it's not observable for users (not exactly in scope for this epic, but it touches parts of it)
- we do not have proper metrics + alerts set up (SLI, anyone?)
As one of our quarterly goals is to improve perceived reliability, and we're driving usage-based pricing now, it feels like the right time to step up our game in this area. 💪
We already have two other related epics that have a certain overlap:
We might tackle those as well if there's time. But we start out with the issues listed here and work our way towards those. Also, I expect that the individual issues are a) outdated and b) overlap with each other, so we'll have to draw and move the line as we go.
Value
- we fix some immediate issues that plague our customers
- we ensure those do not happen again, or are detected earlier if they do
Acceptance Criteria
- all issues referenced in this epic are done ✔️
- we have metrics from which we can derive a "success rate" (bonus points for having an SLI dashboard)
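As a rough illustration of the "success rate" criterion, the sketch below derives the rate from a list of prebuild states. The state names and the choice to count aborted and timed-out prebuilds as failures are assumptions made for this example, not definitions taken from this epic.

```python
from collections import Counter
from typing import Iterable, Optional

# Hypothetical terminal states; the real prebuild states may differ.
SUCCESS = "available"
FAILURES = {"failed", "aborted", "timeout"}


def prebuild_success_rate(states: Iterable[str]) -> Optional[float]:
    """Return succeeded / (succeeded + failed), ignoring non-terminal states.

    Returns None when no prebuild has reached a terminal state yet,
    so that an empty window is not reported as 0% or 100%.
    """
    counts = Counter(states)
    succeeded = counts[SUCCESS]
    failed = sum(counts[s] for s in FAILURES)
    finished = succeeded + failed
    if finished == 0:
        return None
    return succeeded / finished
```

Prebuilds that are still queued or building are left out of the denominator here; whether they should count against the rate after some deadline is a design decision this sketch does not make.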
Measurement
User-perceived reliability has increased.
Tasks:
Issues
- [prebuild] Not starting workspace once the prebuild is done #8195
  - (@AlexTugarev Fix hanging "Prebuild in Progress" page #10357)
  - (@AlexTugarev Listen on instance updates of a running prebuild #10646)
- Prebuilds are stuck in 'queued' #9395
  - (@geropl)
- Add metrics to be able to identify leaking prebuild #10383
  - (@laushinka)
- Prebuild only run when first time of project add into self-host Gitpod #10024
  - (@laushinka and @jankeromnes – [server] Don't skip prebuilds if .gitpod.yml has a 'before' task but no 'init' task #10352)
- UX: Dead end when image-build fails during prebuild #4879
- [prebuild] Not able to consistently view logs and start related workspaces #8324
- Prebuild logs stop streaming #8684
- Various prebuild issues #6391
- Prebuild status indicator on "Branches" can be misleading #5908
- [bridge] Ensure we do not override "failed" instance states (affects prebuilds!) #8596
- Use the same prebuild view for Workspace start as for Prebuild Details #9132
deferring
- [server] add a metric for received webhook events #9170
- Improve observability of webhooks #10341 (JL)
- Show prebuild logs for Admin prebuild detail #8452 (JL)
Observability/Metrics
Status
Done