Allow restarting workspaces with forceDefaultImage=true even when another instance is already running #3993

jankeromnes · 2021-04-19T08:40:57Z

How to test:

Open https://jx-restart-w-default-image.staging.gitpod-dev.com/#https://github.com/jankeromnes/gitpod-hanging-build (and log in if needed)
The Docker build will call sleep 1h
But you should still be able to click on "Continue with Default Image" at any time and get a running workspace.

jankeromnes · 2021-04-19T08:43:13Z

@csweichel Could you please take a careful look? (I really don't want to introduce new bugs in the workspace start logic.)

Note: This change seems to work as expected. However, I don't get any Docker build logs in core-dev, and I'm not sure why.

jankeromnes · 2021-04-21T06:26:21Z

/werft run

👍 started the job as gitpod-build-jx-restart-w-default-image.1

jankeromnes · 2021-04-21T12:31:56Z

/werft run

👍 started the job as gitpod-build-jx-restart-w-default-image.2

jankeromnes

Adding review comments from a call with @csweichel here for the record (many thanks Chris!)


jankeromnes · 2021-04-21T13:58:15Z

components/server/src/workspace/gitpod-server-impl.ts

+                // We already have a running workspace. This may happen if we're forcing the default image.
+                // In that case, we stop the previous first.
+                await this.internalStopWorkspace({ span }, workspaceId, workspace.ownerId).catch(err => {
+                    log.error(logCtx, "stopWorkspace error: ", err);
+                });


This call only blocks until the stop request is sent.

In order to completely prevent multiple instances from running at the same time, we should probably also listen for instance updates here (e.g. via messageBusIntegration.listenForWorkspaceInstanceUpdates()) and only move on when the previous instance is definitely stopped.
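A minimal sketch of the "stop, then wait until definitely stopped" idea. It assumes a hypothetical `InstanceUpdates` interface with a `subscribe` callback that returns an unsubscribe function; the real `messageBusIntegration.listenForWorkspaceInstanceUpdates()` signature may differ. `sendStop` is a hypothetical stand-in for the stop request (e.g. `internalStopWorkspace`). The listener is registered before the stop is sent so that a fast `stopped` update cannot be missed.

```typescript
// Hypothetical shapes for illustration; the real Gitpod types differ.
type Phase = "preparing" | "pending" | "running" | "stopping" | "stopped";

interface InstanceUpdate {
  instanceId: string;
  phase: Phase;
}

interface InstanceUpdates {
  // Registers a listener and returns a function that removes it (assumed API).
  subscribe(listener: (update: InstanceUpdate) => void): () => void;
}

// Sends a stop request, then resolves only once the instance reports "stopped".
// Rejects if that does not happen within timeoutMs.
async function stopAndWaitForStopped(
  updates: InstanceUpdates,
  instanceId: string,
  sendStop: () => Promise<void>,
  timeoutMs: number,
): Promise<void> {
  let unsubscribe: () => void = () => {};
  let timer: ReturnType<typeof setTimeout> | undefined;
  const stopped = new Promise<void>((resolve, reject) => {
    // Subscribe first so a quick "stopped" update cannot slip past us.
    unsubscribe = updates.subscribe((u) => {
      if (u.instanceId === instanceId && u.phase === "stopped") {
        resolve();
      }
    });
    timer = setTimeout(
      () => reject(new Error(`instance ${instanceId} did not stop within ${timeoutMs}ms`)),
      timeoutMs,
    );
  });
  try {
    // sendStop only blocks until the request is sent; `stopped` waits for the real outcome.
    await Promise.all([sendStop(), stopped]);
  } finally {
    clearTimeout(timer);
    unsubscribe();
  }
}
```

Awaiting both promises with `Promise.all` attaches a handler to the timeout rejection right away, so a slow stop request cannot leave an unhandled rejection behind.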

There is also the case where the previous instance was in a different region. To be fully pedantic, we should then also poll the DB, but it's probably okay not to do that here.
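For the cross-region case, a DB poll could look roughly like the sketch below. `FindPhase` is a hypothetical stand-in for a DB lookup of the instance's current phase; the interval and timeout are assumptions, not values from this PR.

```typescript
// Hypothetical DB lookup: returns the instance's phase, or undefined if not found.
type FindPhase = (instanceId: string) => Promise<string | undefined>;

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Polls until the instance is stopped (or gone) or timeoutMs elapses.
// Returns true if the instance stopped in time, false otherwise.
async function pollUntilStopped(
  findPhase: FindPhase,
  instanceId: string,
  intervalMs: number,
  timeoutMs: number,
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    const phase = await findPhase(instanceId);
    if (phase === undefined || phase === "stopped") {
      return true;
    }
    if (Date.now() >= deadline) {
      return false;
    }
    await sleep(intervalMs);
  }
}
```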

jankeromnes · 2021-04-23T08:56:05Z

Code review feedback from @csweichel implemented ✅ and still works as expected. 🎉

Two questions:

Shouldn't this access guard fail:

gitpod/components/server/src/workspace/gitpod-server-impl.ts

Lines 426 to 427 in 27056fe

    // no matter if the workspace is shared or not, you cannot create a new instance
    await this.guardAccess({ kind: "workspaceInstance", subject: undefined, workspaceOwnerID: workspace.ownerId, workspaceIsShared: false }, "create");

given that the previously running instance is only stopped afterwards?

gitpod/components/server/src/workspace/gitpod-server-impl.ts

Lines 440 to 446 in 27056fe

    if (runningInstance) {
        // We already had a running workspace. This may happen if we're forcing the default image.
        // In that case, we first stop the previous workspace, and wait for it to be completely gone.
        await this.internalStopWorkspaceAndWaitForInstance({ span }, workspaceId, workspace.ownerId).catch(err => {
            log.error(logCtx, "internalStopWorkspaceAndWaitForInstance error: ", err);
        });
    }

Also, an observed error logged by server:

{"@type":"type.googleapis.com/google.devtools.clouderrorreporting.v1beta1.ReportedErrorEvent","serviceContext":{"service":"server","version":"jx-restart-w-default-image.3"},"stack_trace":"Error: Unknown workspace manager \"\"\n    at WorkspaceManagerClientProvider.<anonymous> (/app/node_modules/@gitpod/ws-manager/lib/client-provider.js:123:51)\n    at step (/app/node_modules/@gitpod/ws-manager/lib/client-provider.js:58:23)\n    at Object.next (/app/node_modules/@gitpod/ws-manager/lib/client-provider.js:39:53)\n    at fulfilled (/app/node_modules/@gitpod/ws-manager/lib/client-provider.js:30:58)\n    at runMicrotasks (<anonymous>)\n    at processTicksAndRejections (internal/process/task_queues.js:97:5)","component":"server","severity":"ERROR","time":"2021-04-23T08:46:17.766Z","environment":"devstaging","region":"europe-west1","context":{"userId":"7792a3aa-7416-4dee-a899-e132467125d4","workspaceId":"black-goldfish-rtllylsu"},"message":"internalStopWorkspaceAndWaitForInstance error: ","error":"Error: Unknown workspace manager \"\"\n    at WorkspaceManagerClientProvider.<anonymous> (/app/node_modules/@gitpod/ws-manager/lib/client-provider.js:123:51)\n    at step (/app/node_modules/@gitpod/ws-manager/lib/client-provider.js:58:23)\n    at Object.next (/app/node_modules/@gitpod/ws-manager/lib/client-provider.js:39:53)\n    at fulfilled (/app/node_modules/@gitpod/ws-manager/lib/client-provider.js:30:58)\n    at runMicrotasks (<anonymous>)\n    at processTicksAndRejections (internal/process/task_queues.js:97:5)"}

Prettified error stack:

Error: Unknown workspace manager ""
    at WorkspaceManagerClientProvider.<anonymous> (/app/node_modules/@gitpod/ws-manager/lib/client-provider.js:123:51)
    at step (/app/node_modules/@gitpod/ws-manager/lib/client-provider.js:58:23)
    at Object.next (/app/node_modules/@gitpod/ws-manager/lib/client-provider.js:39:53)
    at fulfilled (/app/node_modules/@gitpod/ws-manager/lib/client-provider.js:30:58)
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (internal/process/task_queues.js:97:5)

Unsure if related or serious. (Could it be because we have no GCP region information in core-dev?)

csweichel · 2021-04-26T14:57:59Z

> Code review feedback from @csweichel implemented ✅ and still works as expected. 🎉
>
> Two questions:
>
> Shouldn't this access guard fail (gitpod-server-impl.ts, lines 426 to 427 in 27056fe):

No. It only checks if the user has the permission to execute that operation, but not if the operation makes sense.
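To illustrate the distinction, here is a hypothetical sketch (not Gitpod's actual `guardAccess`) in which the permission check only looks at who is asking, while a separate state check decides whether creating another instance makes sense right now. All names and types are assumptions for illustration.

```typescript
// Hypothetical minimal types for illustration only.
interface Workspace {
  ownerId: string;
}
interface Instance {
  phase: "running" | "stopped";
}

// Permission check: may this user create an instance for this workspace at all?
// It deliberately ignores the workspace's current state.
function mayCreateInstance(userId: string, workspace: Workspace): boolean {
  return userId === workspace.ownerId;
}

// State check: is another instance already running (so a new one would overlap)?
function hasRunningInstance(instances: Instance[]): boolean {
  return instances.some((i) => i.phase === "running");
}
```

Under this split, the owner passes the permission check even while an instance is running; only the state check reveals that the old instance has to be stopped first.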

> given that the previously running instance is only stopped afterwards (gitpod-server-impl.ts, lines 440 to 446 in 27056fe)?
>
> Also, an observed error logged by server:
>
>     Error: Unknown workspace manager ""
>         at WorkspaceManagerClientProvider.<anonymous> (/app/node_modules/@gitpod/ws-manager/lib/client-provider.js:123:51)
>
> Unsure if related or serious. (Could it be because we have no GCP region information in core-dev?)

The problem here is that the instance wasn't actually started yet, i.e. it didn't go through actuallyStartWorkspace yet. In internalStopWorkspaceAndWaitForInstance you could check whether the region is set; if it isn't, then instead of calling ws-manager, just wait for some time, since you wouldn't know which workspace manager to talk to anyway. There's a chance of a race condition here (another server instance could be starting the instance at that moment), so in lieu of proper cross-server-instance locking, waiting for, say, 10 seconds (please add a comment) should prevent that race.
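A minimal sketch of the suggested region check, assuming a hypothetical `WorkspaceInstance` shape with an optional `region` and a hypothetical `stopViaManager` callback standing in for the ws-manager stop call. Treating an empty string as "no region" also covers the `Unknown workspace manager ""` error above. The 10-second default follows the suggestion; everything else is illustrative.

```typescript
// Hypothetical instance shape: region is only set once the instance
// has gone through actuallyStartWorkspace.
interface WorkspaceInstance {
  id: string;
  region?: string;
}

// Hypothetical stand-in for the ws-manager stop call, addressed by region.
type StopViaManager = (region: string, instanceId: string) => Promise<void>;

const delay = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Without a region we cannot know which workspace manager to talk to, so we wait
// instead of calling ws-manager. The wait is a stopgap for missing
// cross-server-instance locking: another server instance might be starting this
// instance right now, and waiting (e.g. 10 seconds) lets that race settle.
async function stopOrWait(
  instance: WorkspaceInstance,
  stopViaManager: StopViaManager,
  waitMs = 10_000,
): Promise<"stopped-via-manager" | "waited"> {
  if (!instance.region) {
    await delay(waitMs);
    return "waited";
  }
  await stopViaManager(instance.region, instance.id);
  return "stopped-via-manager";
}
```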


jankeromnes · 2021-04-26T20:07:55Z

> No. It only checks if the user has the permission to execute that operation, but not if the operation makes sense.

Aha, that makes sense. Thanks!

> The problem here is that the instance wasn't actually started yet, i.e. it didn't go through actuallyStartWorkspace yet.

Thanks for the pointer! Looking into actuallyStartWorkspace, I now think it's expected that instances don't get assigned a region until after their Docker build is complete, as:

gitpod/components/server/src/workspace/workspace-starter.ts

Lines 132 to 133 in 2e7cba9

    // build workspace image
    instance = await this.buildWorkspaceImage({ span }, user, workspace, instance);

happens before:

gitpod/components/server/src/workspace/workspace-starter.ts

Lines 156 to 159 in 2e7cba9

    // tell the world we're starting this instance
    const { manager, installation } = await this.clientProvider.getStartManager();
    instance.status.phase = "pending";
    instance.region = installation;

So in our case (long-running Docker build that we want to interrupt/skip), we should expect not to have a region.

How should we stop "preparing" instances that don't have a region yet? Should we make imageBuilder interruptible? Or "detach" the running instance by deleting its workspace ID? 🤔
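One possible shape for the "make it interruptible" option, sketched with the standard `AbortController`/`AbortSignal` API. `waitForImage` and the build promise are hypothetical stand-ins, not imageBuilder's real interface. Note that this only stops the server from *waiting* on the build; actually cancelling the Docker build would also need support on the image-builder side.

```typescript
// Resolves with the built image reference, or rejects as soon as `signal` is aborted.
function waitForImage(build: Promise<string>, signal: AbortSignal): Promise<string> {
  return new Promise<string>((resolve, reject) => {
    if (signal.aborted) {
      reject(new Error("image build wait aborted"));
      return;
    }
    const onAbort = () => reject(new Error("image build wait aborted"));
    signal.addEventListener("abort", onAbort, { once: true });
    build.then(
      (ref) => {
        signal.removeEventListener("abort", onAbort);
        resolve(ref);
      },
      (err) => {
        signal.removeEventListener("abort", onAbort);
        reject(err);
      },
    );
  });
}
```

A "Continue with Default Image" request could then call `abort()` on the controller tied to the pending build before starting the new instance.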

jankeromnes · 2021-04-29T06:03:40Z

After discussing this further, we've decided to not make Docker builds interruptible just yet, and instead remove the button when builds are in progress (to align with the previous dashboard design).
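The resulting dashboard rule can be sketched as a simple predicate. The `BuildState` type is a hypothetical simplification; the actual condition used in #4104 may differ.

```typescript
// Hypothetical build states for illustration.
type BuildState = "running" | "failed" | "succeeded";

// Hide "Continue with Default Image" while the image build is still in progress,
// so the user cannot start a second instance next to the one being prepared.
function showContinueWithDefaultImage(build: BuildState): boolean {
  return build !== "running";
}
```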

Superseded by #4104

jankeromnes requested a review from csweichel April 19, 2021 08:43

jankeromnes marked this pull request as ready for review April 19, 2021 08:53

jankeromnes commented Apr 21, 2021


jankeromnes removed the request for review from csweichel April 21, 2021 14:11

jankeromnes force-pushed the jx/restart-w-default-image branch from f477ac3 to 27056fe April 23, 2021 08:15

jankeromnes requested review from aledbf and csweichel April 23, 2021 10:22

jankeromnes added 2 commits April 26, 2021 19:22

[dashboard] Disable 'Continue with Default Image' button on click to prevent restart spam (f1c815a)

[server] Allow restarting workspaces with forceDefaultImage=true even when another instance is already running. Fixes #3777 (de1eff9)

jankeromnes force-pushed the jx/restart-w-default-image branch from 27056fe to de1eff9 April 26, 2021 19:22

jankeromnes marked this pull request as draft April 26, 2021 20:59

jankeromnes mentioned this pull request Apr 29, 2021

[dashboard] Don't show 'Continue with Default Image' button while the build is still running #4104

Merged

jankeromnes closed this Apr 29, 2021

jankeromnes deleted the jx/restart-w-default-image branch May 19, 2021 13:08
