supervisor: Ignore the terminated signal #13790

utam0k · 2022-10-12T06:48:45Z

Description

Until now, workspace start-up has usually taken longer than IDE start-up, but sometimes the order is reversed. As a result, if you request an exit immediately after starting the workspace, as the integration tests do, the IDE is sometimes not ready yet.

Related Issue(s)

None

How to test

Pass the integration test

Release Notes

Ignore a noisy error in the supervisor

Documentation

Werft options:

/werft with-local-preview
If enabled this will build install/preview
/werft with-preview
/werft with-integration-tests=all
Valid options are all, workspace, webapp, ide

werft-gitpod-dev-com · 2022-10-12T06:48:52Z

started the job as gitpod-build-to-ch-inte.5 because the annotations in the pull request description changed
(with .werft/ from main)

utam0k · 2022-10-12T07:07:49Z

/werft run with-large-vm with-preview with-integration-tests=workspace

👍 started the job as gitpod-build-to-ch-inte.6
(with .werft/ from main)

iQQBot · 2022-10-12T07:58:18Z

/werft run

👍 started the job as gitpod-build-to-ch-inte.7
(with .werft/ from main)

kylos101 · 2022-10-13T00:03:41Z

@iQQBot are there IDE integration tests that we could run against this, too?

iQQBot · 2022-10-13T05:42:20Z

Why could this happen?

gitpod/test/pkg/integration/workspace.go

Lines 216 to 217 in 7d06687

    
    lastStatus, err := WaitForWorkspaceStart(ctx, instanceID.String(), api, options.WaitForOpts...)
    if err != nil {

We wait for the workspace to be ready, and ready means the IDE is ready.

gitpod/components/ws-manager/pkg/manager/probe.go

Lines 41 to 49 in 7d06687

    
    workspaceURL.Path += "/_supervisor/v1/status/ide"
    readyURL := workspaceURL.String()
    return WorkspaceReadyProbe{
    	Timeout:     5 * time.Second,
    	RetryDelay:  500 * time.Millisecond,
    	readyURL:    readyURL,
    	workspaceID: workspaceID,
    }

utam0k · 2022-10-13T05:43:58Z

Perhaps this is due to the IDE starting later than the time it takes to complete workspace startup.

iQQBot · 2022-10-13T05:57:08Z

Perhaps this is due to the IDE starting later than the time it takes to complete workspace startup.

This should not happen: workspace startup only counts as complete once the IDE is ready. See the code I pointed to above.

iQQBot · 2022-10-13T05:59:47Z

The gitpod/never-ready annotation stays on the workspace pod until the IDE ready probe returns success.

utam0k · 2022-10-13T06:08:06Z

@iQQBot Is it impossible for this code to mark the IDE as Ready?

gitpod/components/supervisor/pkg/supervisor/supervisor.go

Lines 818 to 821 in 7d06687

    
    go func() {
    	IDEStatus := runIDEReadinessProbe(cfg, ideConfig, ide)
    	ideReady.Set(true, IDEStatus)
    }()

iQQBot · 2022-10-13T06:15:13Z

This is the supervisor's internal probe of the IDE. Only once it passes does the supervisor report the IDE as ready to the outside; the supervisor does not directly change the state of any workspace. ws-manager keeps probing the supervisor's API until it reports success, and only then deletes the annotation.

gitpod/components/supervisor/pkg/supervisor/services.go

Lines 148 to 161 in 7d06687

    
    	ok, _ := s.ideReady.Get()
    	desktopStatus := &api.IDEStatusResponse_DesktopStatus{}
    	if s.desktopIdeReady != nil {
    		okR, i := s.desktopIdeReady.Get()
    		if i != nil {
    			desktopStatus.Link = i.Link
    			desktopStatus.Label = i.Label
    			desktopStatus.ClientID = i.ClientID
    			desktopStatus.Kind = i.Kind
    		}
    		ok = ok && okR
    	}
    	return &api.IDEStatusResponse{Ok: ok, Desktop: desktopStatus}, nil
    }

utam0k · 2022-10-13T06:21:53Z

@iQQBot ws-manager should only return RUNNING under the following conditions. Is there anything you can think of?

gitpod/components/ws-manager/pkg/manager/status.go

Lines 485 to 496 in 36beceb

    
    if wso.IsWorkspaceHeadless() && tpe != api.WorkspaceType_PREBUILD {
    	// headless workspaces (except prebuilds) don't expose a public service and thus cannot be asked about their status.
    	// once kubernetes reports the workspace running, so do we.
    	result.Phase = api.WorkspacePhase_RUNNING
    	return nil
    }
    if _, neverReady := pod.Annotations[workspaceNeverReadyAnnotation]; !neverReady {
    	// workspcae has been marked ready by a workspace-ready probe of the monitor
    	result.Phase = api.WorkspacePhase_RUNNING
    	return nil
    }

OR

gitpod/components/ws-manager/pkg/manager/status.go

Lines 576 to 592 in 36beceb

    
    if terminationState.ExitCode != 0 && terminationState.Message != "" {
    	var phase *api.WorkspacePhase
    	if !isPodBeingDeleted(pod) {
    		// If the wrote a termination message and is not currently being deleted,
    		// then it must have been/be running. If we did not force the phase here,
    		// we'd be in unknown.
    		c := api.WorkspacePhase_RUNNING
    		phase = &c
    	}
    	// the container itself told us why it was terminated - use that as failure reason
    	return extractFailureFromLogs([]byte(terminationState.Message)), phase
    } else if terminationState.Reason == "Error" {
    	if !isPodBeingDeleted(pod) && terminationState.ExitCode != containerKilledExitCode {
    		phase := api.WorkspacePhase_RUNNING
    		return fmt.Sprintf("container %s ran with an error: exit code %d", cs.Name, terminationState.ExitCode), &phase
    	}

iQQBot · 2022-10-13T06:30:14Z

ws-manager should only return RUNNING under the following conditions. Is there anything you can think of?

That's what I find strange. Your test case uses a regular workspace, so it is not headless and the first condition doesn't hold.

The second condition requires that the never-ready annotation does not exist, and for that to be the case the IDE must be ready.

iQQBot · 2022-10-13T14:54:58Z

Hi @utam0k, I found the root cause. It is a combination of three things:

1. `ws-manager` follows redirects when it runs the IDE ready probe.
2. The integration test uses the wrong servicePrefix, so `ws-proxy` redirects the probe to the `dashboard` "workspace not found" page, and this page returns status code 200. As a result, `ws-manager` incorrectly assumes that the IDE is ready.
3. `ws-manager` does not check the IDE ready response itself.

Another small thing: `ws-manager` requests the IDE ready probe too frequently (every 500ms, and without using the /wait/true API).

The fix is in #13828.

iQQBot · 2022-10-13T16:02:26Z

/hold

utam0k · 2022-10-14T05:25:57Z

👀 Thanks @iQQBot

supervisor: Ignore the terminated signal

7d06687

utam0k requested a review from a team October 12, 2022 06:48

roboquat added the release-note label Oct 12, 2022

roboquat added the size/XS label Oct 12, 2022

github-actions bot added the team: IDE label Oct 12, 2022

utam0k marked this pull request as draft October 12, 2022 06:49

roboquat added the do-not-merge/work-in-progress label Oct 12, 2022

utam0k marked this pull request as ready for review October 12, 2022 08:00

roboquat removed the do-not-merge/work-in-progress label Oct 12, 2022

iQQBot self-requested a review October 12, 2022 09:50

utam0k self-assigned this Oct 12, 2022

jenting approved these changes Oct 13, 2022

roboquat added the do-not-merge/hold label Oct 13, 2022

utam0k closed this Oct 14, 2022
