Closed
Labels
meta: stale, team: workspace, type: bug
Description
Bug description
Looking at the logs, we're seeing a surprising number of OTS download failures during workspace content initialisation: https://console.cloud.google.com/logs/query;query=%22cannot%20download%20OTS%22%0A;timeRange=PT24H;cursorTimestamp=2022-02-08T14:43:32Z?project=workspace-clusters
Each of those failures is likely to yield a failed workspace - at least if the repo was private.
Possible contributing factors:
(edited by Sven)
- This seems to happen only in prebuilds (only the US cluster is affected, and there are lots of these error messages in d_b_prebuild_workspace). It could be that, for some reason, the time between the OTS being created and being requested is longer than 30 minutes (the lifetime of a token). Prebuild clusters are sometimes heavily packed, so maybe scaling up etc. simply takes too long.
- We attempt to download the OTS multiple times for some reason. That's most likely a bug in the initializer. Checking the server logs and/or adding metrics would help identify this.
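To tell the two hypotheses apart when going through the logs, something like the following could be used. It is only a sketch: `OTSRequest`, its fields, and `classify` are hypothetical names, not part of the existing code, and it assumes that an OTS is consumed by its first download, so any later request for the same OTS counts as a repeat. The 30-minute lifetime is the one mentioned above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Token lifetime as stated in this issue.
OTS_LIFETIME = timedelta(minutes=30)


@dataclass
class OTSRequest:
    # All fields are hypothetical; they stand in for whatever the logs provide.
    ots_id: str
    created_at: datetime
    requested_at: datetime


def classify(requests):
    """Label each request as 'ok', 'expired', or 'repeat', in request order."""
    seen = set()
    labels = []
    for req in sorted(requests, key=lambda r: r.requested_at):
        if req.ots_id in seen:
            # Assumption: an OTS can only be downloaded once.
            labels.append((req, "repeat"))
        elif req.requested_at - req.created_at > OTS_LIFETIME:
            labels.append((req, "expired"))
        else:
            labels.append((req, "ok"))
        seen.add(req.ots_id)
    return labels
```

A majority of `expired` labels would point at the scale-up delay, while a majority of `repeat` labels would point at the initializer.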
As part of a fix, we should introduce OTS download failure metrics and keep an eye on them.
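A rough idea of what such a metric could track, written as a standard-library stand-in rather than against any real metrics client. The class `OTSDownloadMetrics`, its methods, and the label values (`"prebuild"`, `"expired"`) are all assumptions made for illustration.

```python
from collections import Counter


class OTSDownloadMetrics:
    """In-memory stand-in for a pair of counters (attempts and failures)."""

    def __init__(self):
        self._attempts = Counter()   # keyed by workspace type
        self._failures = Counter()   # keyed by (workspace type, error reason)

    def record(self, workspace_type, error=None):
        """Record one download attempt; pass an error reason if it failed."""
        self._attempts[workspace_type] += 1
        if error is not None:
            self._failures[(workspace_type, error)] += 1

    def failure_rate(self, workspace_type):
        """Return failed / attempted downloads for one workspace type."""
        attempts = self._attempts[workspace_type]
        if attempts == 0:
            return 0.0
        failed = sum(
            count
            for (wtype, _reason), count in self._failures.items()
            if wtype == workspace_type
        )
        return failed / attempts
```

Splitting by workspace type would show whether the failures really are limited to prebuilds, and the error reason would separate expired tokens from repeated downloads.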
Steps to reproduce
Check the logs using the query linked above.