Skip to content

supervisor: Ignore the terminated signal #13790

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed

supervisor: Ignore the terminated signal #13790

wants to merge 1 commit into from

Conversation

utam0k
Copy link
Contributor

@utam0k utam0k commented Oct 12, 2022

Description

Until now, the start-up time of a workspace is slower than the start-up of an IDE but reserved. Therefore, if you requested to exit immediately after starting the workspace such as the integration test, the IDE was sometimes not ready!

Related Issue(s)

None

How to test

Pass the integration test

Release Notes

Ignore a noisy error in the supervisor

Documentation

Werft options:

  • /werft with-local-preview
    If enabled this will build install/preview
  • /werft with-preview
  • /werft with-integration-tests=all
    Valid options are all, workspace, webapp, ide

@utam0k utam0k requested a review from a team October 12, 2022 06:48
@werft-gitpod-dev-com
Copy link

started the job as gitpod-build-to-ch-inte.5 because the annotations in the pull request description changed
(with .werft/ from main)

@utam0k
Copy link
Contributor Author

utam0k commented Oct 12, 2022

/werft run with-large-vm with-preview with-integration-tests=workspace

👍 started the job as gitpod-build-to-ch-inte.6
(with .werft/ from main)

@iQQBot
Copy link
Contributor

iQQBot commented Oct 12, 2022

/werft run

👍 started the job as gitpod-build-to-ch-inte.7
(with .werft/ from main)

@utam0k utam0k marked this pull request as ready for review October 12, 2022 08:00
@iQQBot iQQBot self-requested a review October 12, 2022 09:50
@utam0k utam0k self-assigned this Oct 12, 2022
@kylos101
Copy link
Contributor

@iQQBot are there IDE integration tests that we could run against this, too?

@iQQBot
Copy link
Contributor

iQQBot commented Oct 13, 2022

Why could this happen?

lastStatus, err := WaitForWorkspaceStart(ctx, instanceID.String(), api, options.WaitForOpts...)
if err != nil {

we waiting the workspace to be ready, and ready means ide is ready

workspaceURL.Path += "/_supervisor/v1/status/ide"
readyURL := workspaceURL.String()
return WorkspaceReadyProbe{
Timeout: 5 * time.Second,
RetryDelay: 500 * time.Millisecond,
readyURL: readyURL,
workspaceID: workspaceID,
}

@utam0k
Copy link
Contributor Author

utam0k commented Oct 13, 2022

Why could this happen?

lastStatus, err := WaitForWorkspaceStart(ctx, instanceID.String(), api, options.WaitForOpts...)
if err != nil {

we waiting the workspace to be ready, and ready means ide is ready

workspaceURL.Path += "/_supervisor/v1/status/ide"
readyURL := workspaceURL.String()
return WorkspaceReadyProbe{
Timeout: 5 * time.Second,
RetryDelay: 500 * time.Millisecond,
readyURL: readyURL,
workspaceID: workspaceID,
}

Perhaps this is due to the IDE starting later than the time it takes to complete workspace startup.

@iQQBot
Copy link
Contributor

iQQBot commented Oct 13, 2022

Perhaps this is due to the IDE starting later than the time it takes to complete workspace startup.

This should not happen, the workspace startup complete includes IDE ready, see my code point

@iQQBot
Copy link
Contributor

iQQBot commented Oct 13, 2022

The gitpod/never-ready annotation will exist on the workspace pod until the ide ready probe returns success

@utam0k
Copy link
Contributor Author

utam0k commented Oct 13, 2022

@iQQBot Is it impossible for this code to mark the IDE as Ready?

go func() {
IDEStatus := runIDEReadinessProbe(cfg, ideConfig, ide)
ideReady.Set(true, IDEStatus)
}()

@iQQBot
Copy link
Contributor

iQQBot commented Oct 13, 2022

@iQQBot Is it impossible for this code to mark the IDE as Ready?

go func() {
IDEStatus := runIDEReadinessProbe(cfg, ideConfig, ide)
ideReady.Set(true, IDEStatus)
}()

This is the internal probe of the IDE, and only if it passes here successfully will the supervisor report the IDE ready to the outside, the supervisor does not directly change the state of any workspace. ws-manager continues to probe the supervisor's API until it reports a success before deleting the annotation.

ok, _ := s.ideReady.Get()
desktopStatus := &api.IDEStatusResponse_DesktopStatus{}
if s.desktopIdeReady != nil {
okR, i := s.desktopIdeReady.Get()
if i != nil {
desktopStatus.Link = i.Link
desktopStatus.Label = i.Label
desktopStatus.ClientID = i.ClientID
desktopStatus.Kind = i.Kind
}
ok = ok && okR
}
return &api.IDEStatusResponse{Ok: ok, Desktop: desktopStatus}, nil
}

@utam0k
Copy link
Contributor Author

utam0k commented Oct 13, 2022

@iQQBot ws-manager should only return RUNNING under the following conditions. Is there anything you can think of?

if wso.IsWorkspaceHeadless() && tpe != api.WorkspaceType_PREBUILD {
// headless workspaces (except prebuilds) don't expose a public service and thus cannot be asked about their status.
// once kubernetes reports the workspace running, so do we.
result.Phase = api.WorkspacePhase_RUNNING
return nil
}
if _, neverReady := pod.Annotations[workspaceNeverReadyAnnotation]; !neverReady {
// workspcae has been marked ready by a workspace-ready probe of the monitor
result.Phase = api.WorkspacePhase_RUNNING
return nil
}

OR
if terminationState.ExitCode != 0 && terminationState.Message != "" {
var phase *api.WorkspacePhase
if !isPodBeingDeleted(pod) {
// If the wrote a termination message and is not currently being deleted,
// then it must have been/be running. If we did not force the phase here,
// we'd be in unknown.
c := api.WorkspacePhase_RUNNING
phase = &c
}
// the container itself told us why it was terminated - use that as failure reason
return extractFailureFromLogs([]byte(terminationState.Message)), phase
} else if terminationState.Reason == "Error" {
if !isPodBeingDeleted(pod) && terminationState.ExitCode != containerKilledExitCode {
phase := api.WorkspacePhase_RUNNING
return fmt.Sprintf("container %s ran with an error: exit code %d", cs.Name, terminationState.ExitCode), &phase
}

@iQQBot
Copy link
Contributor

iQQBot commented Oct 13, 2022

ws-manager should only return RUNNING under the following conditions. Is there anything you can think of?

That's what I find strange, your test case is the regular workspace, so it's not headless and the first condition doesn't hold

And the second condition requires that the neverready annotation does not exist, and to achieve this condition, ide is must ready

@iQQBot
Copy link
Contributor

iQQBot commented Oct 13, 2022

Hi @utam0k I found the root cause, it is because

  1. ws-manager will follow redirect when it run ide ready probe
  2. integration test uses the wrong servicePrefix, so ws-proxy will redirect it to dashboard workspace not found page, and this page status code is 200... This way ws-manager will incorrectly assume that ide is ready
  3. ws-manager didn't check ide ready response

and another small thing is ws-manager request ide ready probe too frequently (every 500ms, and didn't use /wait/true API)

I fix this in #13828 this PR

@iQQBot
Copy link
Contributor

iQQBot commented Oct 13, 2022

/hold

@utam0k
Copy link
Contributor Author

utam0k commented Oct 14, 2022

Hi @utam0k I found the root cause, it is because

1. `ws-manager` will follow redirect when it run ide ready probe

2. integration test uses the wrong servicePrefix, so `ws-proxy` will redirect it to `dashboard` workspace not found page, and this page status code is 200... This way `ws-manager` will incorrectly assume that ide is ready

3. `ws-manager` didn't check ide ready response

and another small thing is ws-manager request ide ready probe too frequently (every 500ms, and didn't use /wait/true API)

I fix this in #13828 this PR

👀 Thanks @iQQBot

@utam0k utam0k closed this Oct 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants