Skip to content

Solve "cannot download OTS" - by removing OTS for secrets #11112

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Jul 6, 2022

Conversation

csweichel
Copy link
Contributor

@csweichel csweichel commented Jul 4, 2022

Description

This PR works towards removing OTS from Gitpod. To this end it makes ws-manager use Kubernetes secrets to store sensitive workspace data (e.g. Git or Gitpod tokens, but also user env vars).

Note This change is feature flagged and fully backwards compatible. The order of deployment does not matter, because the new secrets behaviour is transparent to users of the ws-manager API. We are not logging the event trace log by default and hence don't push secrets into our logs, once OTS isn't used anymore but secrets aren't used yet.

This PR touches on two teams:

  1. workspace: with the addition of the new feature-flagged protected secrets behaviour, where ws-manager creates one secret per workspace, and uses this secret to feed the pod environment variables as well as the InitializeWorkspace call. Relevant commits are 4ca3b87, 69c5e82 and 53a064c.
  2. webapp: with the removal of the OTS use for env vars, Git and Gitpod token. This change is backwards compatible and can be deployed independently. Relevant commit is 0833bd4

fixes #10134

How it works

When the protected_secrets workspace feature flag is enabled, ws-manager creates a secret with the same name as workspace pod. In this secret are the user provided environment variables (this includes team env vars), as well as secrets extracted from the content initializer.

For the former (user provided env vars), the env var entries in the secret use the sha256 hash of the env var name, because we have no guarantee that the env var name is actually a valid secret data name.

For the latter, this PR contains code which can extract and re-inject secrets from/to a content initializer. There are unit tests for this behaviour.

The secret is removed once the pod reaches the running phase, but latest when it's stopping or stopped. Once the pod is running, there is no reason to keep the secret around. The secret also isn't part of the workspaceObjects to avoid dumping it in the event trace log.

How to test

  1. Enable the protected_secrets feature flag for your user in the admin UI.
  2. Add an environment variable for your user. Best use */* as repository pattern to make sure you get that env var when starting a workspace.
  3. Open a workspace (best on a private repo) and notice that it starts
  4. Inspect the Kubernetes objects

    Note because the secret is removed once the pod is running, you'll want to be quick to inspect it

    # list all workspace secrets
    $ kubectl get secret -l component=workspace -o yaml
    apiVersion: v1
    items:
    - apiVersion: v1
      data:
        2c26b46b68ffc68ff99b453c1d30413413422d706483bfa0f98a5e886266e7ae: YmFy
        3a3b612491764cccf7a5bccafc73031bb965cad1592bf600f2c3edca7c6e41bb: UFJPR1JFU1MgSVMgQk9STiBPRiBET1VCVCBBTkQgSU5RVUlSWS4K
        6271376496bf05a98eb6f9231ef39c87d25a936de821916483e5755bf4dfbe1a: aHR0cHM6Ly9vcGVuLXZzeC5jdy13c21hbi1zZWNyZXRzLnByZXZpZXcuZ2l0cG9kLWRldi5jb20=
        initializer.git: Q1VSSU9TSVRZIElTIFRIRSBXSUNLIElOIFRIRSBDQU5ETEUgT0YgTEVBUk5JTkcuCg==
      kind: Secret
      metadata:
        creationTimestamp: "2022-07-04T16:56:45Z"
        labels:
          app: gitpod
          component: workspace
          gitpod.io/workspaceClass: default
          gpwsman: "true"
          headless: "false"
          metaID: gitpodio-templatetypesc-ej938iqktpk
          owner: 47e7ada0-1f00-4a58-aefb-5ecfcdc988fb
          project: ""
          team: ""
          workspaceID: 10a0d3a8-9d32-48aa-9994-559ed7f8dd17
          workspaceType: regular
        name: ws-10a0d3a8-9d32-48aa-9994-559ed7f8dd17
        namespace: default
        resourceVersion: "65835"
        uid: 3a036038-4a4a-415b-be27-e19a6e77d64a
      type: Opaque
    kind: List
    metadata:
      resourceVersion: ""
    # inspect the initializer annotation of a workspace pod which uses secrets
    $ cd components/ws-manager
    $ go run main.go debug decode-initalizer $(kubectl get pod -o yaml -l component=workspace | grep gitpod/contentInitializer | cut -d ':' -f 2 )
    {
      "git": {
        "remoteUri": "https://github.com/gitpod-io/template-typescript-node.git",
        "upstreamRemoteUri": "https://github.com/microsoft/TypeScript-Node-Starter.git",
        "targetMode": "REMOTE_BRANCH",
        "cloneTaget": "master",
        "checkoutLocation": "template-typescript-node",
        "config": {
          "authentication": "BASIC_AUTH",
          "authUser": "oauth2",
          "authPassword": "extracted-secret/initializer.git"
        }
      }
    }
    # inspect the workspace container's environment variables
    $ kubectl get pod -l component=workspace -o json | jq '.items[0].spec.containers[0].env[] | select(.valueFrom)'
    {
      "name": "foo",
      "valueFrom": {
        "secretKeyRef": {
          "key": "2c26b46b68ffc68ff99b453c1d30413413422d706483bfa0f98a5e886266e7ae",
          "name": "ws-10a0d3a8-9d32-48aa-9994-559ed7f8dd17"
        }
      }
    }
    {
      "name": "VSX_REGISTRY_URL",
      "valueFrom": {
        "secretKeyRef": {
          "key": "6271376496bf05a98eb6f9231ef39c87d25a936de821916483e5755bf4dfbe1a",
          "name": "ws-10a0d3a8-9d32-48aa-9994-559ed7f8dd17"
        }
      }
    }
    {
      "name": "THEIA_SUPERVISOR_TOKENS",
      "valueFrom": {
        "secretKeyRef": {
          "key": "3a3b612491764cccf7a5bccafc73031bb965cad1592bf600f2c3edca7c6e41bb",
          "name": "ws-10a0d3a8-9d32-48aa-9994-559ed7f8dd17"
        }
      }
    }

FAQ

  • Why is @corneliusludmann a requested reviewer:
    Because of the content-service-api change and the outdated CODEOWNERS file
  • Can this be split into multiple PRs?:
    Yes, but then it is harder to try out. I hope that the commits are reviewable by themselves.
  • Why doesn't this PR remove all the OTS code in server?
    Because it's still used for the additional content, e.g. for server generated .gitpod.yml files. That's for another day.

Release Notes

NONE

Werft options:

  • /werft with-preview

@csweichel
Copy link
Contributor Author

csweichel commented Jul 4, 2022

/werft run

👍 started the job as gitpod-build-cw-wsman-secrets.2
(with .werft/ from main)

@csweichel csweichel force-pushed the cw/wsman-secrets branch 8 times, most recently from 3d89372 to 25ea837 Compare July 4, 2022 15:14
@csweichel csweichel marked this pull request as ready for review July 4, 2022 17:13
@csweichel csweichel requested a review from a team July 4, 2022 17:13
@csweichel csweichel requested a review from a team July 4, 2022 17:13
@github-actions github-actions bot added team: webapp Issue belongs to the WebApp team team: workspace Issue belongs to the Workspace team labels Jul 4, 2022
@csweichel csweichel changed the title [will-be-a-pr-series] Add secrets to ws-manager Solve "cannot download OTS" Jul 5, 2022
@csweichel csweichel changed the title Solve "cannot download OTS" Solve "cannot download OTS" - by removing OTS for secrets Jul 5, 2022
@aledbf
Copy link
Member

aledbf commented Jul 5, 2022

@csweichel just one question, do we have limits for the length of the secrets?
(I am thinking of the 1MB limit for secrets and configmaps)

@csweichel
Copy link
Contributor Author

@csweichel just one question, do we have limits for the length of the secrets? (I am thinking of the 1MB limit for secrets and configmaps)

We do - env vars are limited to 32767 bytes. There is no limit on the total number of env vars though. I doubt this is a problem in practice, but will add validation on the ws-manager side.

to introduce a benign failure mode compared to the
Kubernetes provided message.
Copy link
Contributor

@corneliusludmann corneliusludmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving in case it helps to unblock the PR. Haven't looked at the PR in its entirety but changes in the content-service-api are looking fabulous 😍 !

@csweichel
Copy link
Contributor Author

csweichel commented Jul 5, 2022

/werft run

👍 started the job as gitpod-build-cw-wsman-secrets.14
(with .werft/ from main)

@sagor999
Copy link
Contributor

sagor999 commented Jul 5, 2022

Enable the protected_secrets feature flag for your user in the admin UI.

@csweichel can you please add admin flag to my user? I cannot enable it myself as by default you are the only admin.

@@ -577,6 +590,28 @@ func (m *Manager) createDefiniteWorkspacePod(startContext *startWorkspaceContext
gitpodGUID := int64(133332)
pod.Spec.SecurityContext.FSGroup = &gitpodGUID

case api.WorkspaceFeatureFlag_PROTECTED_SECRETS:
for _, c := range pod.Spec.Containers {
if c.Name != "workspace" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this check necessary? this implies we have more then one container in the spec. Do we now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's right - today we don't have multiple container, and it's not that likely we will in the future (we have had multiple containers in the past).

I'd be ok with removing the check, and equally with leaving it in.
IMHO it's a tad more complete, because the env vars of other containers (should we have them) would likely not be part of the secret.

@geropl
Copy link
Member

geropl commented Jul 6, 2022

The secret is removed once the pod reaches the running phase, but latest when it's stopping or stopped. Once the pod is running, there is no reason to keep the secret around.

I bet you thought about it, but: Are there any intermittent states (e.g., unknown) where this makes a difference?
And: what's the motivation for removing it so early? Improving performance for "StopWorkspace"? 🤔

Copy link
Member

@geropl geropl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WebApp code LGTM, tested and works. ✔️

@roboquat roboquat merged commit 04b9a64 into main Jul 6, 2022
@roboquat roboquat deleted the cw/wsman-secrets branch July 6, 2022 13:52
@roboquat roboquat added the deployed: workspace Workspace team change is running in production label Jul 6, 2022
@kylos101
Copy link
Contributor

kylos101 commented Jul 6, 2022

@csweichel merging this closed the epic, was that intentional? I ask because my understanding is that protected_secrets is disabled by default. In other words, I'm thinking we should leave the epic open, test the feature internally, then with a subset of users, and eventually make it a default.

@csweichel
Copy link
Contributor Author

csweichel commented Jul 7, 2022

I bet you thought about it, but: Are there any intermittent states (e.g., unknown) where this makes a difference?

Indeed - I don't believe such intermittent state exists because:

  • the secret gets removed once the pod is running. At this point the env vars are populated and we don't expect the pod to restart (we delete it before that). If we wanted to support pod restarts, we'd need to keep the secret around.
  • any error state leads to pod deletion.

And: what's the motivation for removing it so early? Improving performance for "StopWorkspace"? 🤔

Mainly security, i.e. keeping that data around only when it's needed. One can still go to the pod and look at the env vars, but at least it's gone from the Kubernetes API server.

@csweichel
Copy link
Contributor Author

@csweichel merging this closed the epic, was that intentional? I ask because my understanding is that protected_secrets is disabled by default. In other words, I'm thinking we should leave the epic open, test the feature internally, then with a subset of users, and eventually make it a default.

That was not intentional - my bad for adding it as "fixes".
I've reopened it. We should plan how we make this the default.

@roboquat roboquat added deployed: webapp Meta team change is running in production deployed Change is completely running in production labels Jul 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
deployed: webapp Meta team change is running in production deployed: workspace Workspace team change is running in production deployed Change is completely running in production release-note-none size/XXL team: webapp Issue belongs to the WebApp team team: workspace Issue belongs to the Workspace team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Epic: Get rid of OTS (One-Time Secret)
7 participants