
[content-service] Add support to use existing S3 bucket #10073


Merged: 5 commits, merged May 25, 2022


@aledbf (Member) commented May 17, 2022

Description

This PR introduces a new "single bucket mode" to the S3 storage implementation.

In some scenarios, we may not have enough permissions to create buckets. In those cases, the "one bucket per user" implementation fails.

If a bucket name is configured (see the installer/content-service config change), that bucket is used and the user ID is prefixed to the object name. If no bucket name is configured, the old "one bucket per user" behaviour kicks in. This way, existing installations remain backwards compatible for now. There is no automatic migration and no "hybrid mode" that falls back to the old behaviour: there are not enough installations in the wild that were actually able to use the old behaviour to justify introducing such a mode.
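A minimal Go sketch of the bucket/key selection described above, for illustration only. The function name `location`, the `gitpod-user-` prefix for per-user buckets, and the `/` separator between user ID and object name are assumptions, not the actual content-service code.

```go
package main

import (
	"fmt"
	"path"
)

// legacyBucketPrefix is a HYPOTHETICAL naming scheme for the old
// "one bucket per user" mode; the real prefix may differ.
const legacyBucketPrefix = "gitpod-user-"

// location picks the bucket and object key for a user's object.
// configuredBucket stands in for the optional bucket setting from the
// installer config (assumption: empty string means "not configured").
func location(configuredBucket, userID, objectName string) (bucket, key string) {
	if configuredBucket != "" {
		// Single bucket mode: one shared bucket, user ID prefixed to the object name.
		return configuredBucket, path.Join(userID, objectName)
	}
	// Legacy mode: one bucket per user, object name used unchanged.
	return legacyBucketPrefix + userID, objectName
}

func main() {
	b, k := location("some-valid-bucket", "u1", "workspaces/ws1/full.tar")
	fmt.Println(b, k) // some-valid-bucket u1/workspaces/ws1/full.tar

	b, k = location("", "u1", "workspaces/ws1/full.tar")
	fmt.Println(b, k) // gitpod-user-u1 workspaces/ws1/full.tar
}
```

Because the branch is decided purely by whether a bucket name is present, an installation that never sets it keeps the legacy layout without any extra flag.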

We maintain the old mode in GCP (and maybe even S3) for now because

  • it maintains backwards compatibility
  • it allows for region-local optimisations. That said, we could just as well introduce single regional buckets instead.
  • moving away from it would require migration for gitpod.io

How to test

Prerequisite: ensure the installation uses MinIO/S3 and that the bucket is configured.

  • start/stop a workspace, make sure the backup happens and is restored
  • take a snapshot and open a workspace from it
  • run a prebuild and open a workspace from it
  • use the VS Code settings sync, make sure it still syncs
  • check out prebuild logs in the dashboard

Release Notes

[content-service] Add support to use a single S3 bucket

Fixes #10070

@aledbf (Member Author) commented May 17, 2022

/werft run

👍 started the job as gitpod-build-aledbf-bucket.7
(with .werft/ from main)

@aledbf (Member Author) commented May 17, 2022

Configuration example:

objectStorage:
  inCluster: false
  kind: minio
  s3:
    endpoint: minio.default.svc.cluster.local:9000
    accessKey: GlrlMt5e40dnW0GhveAX
    secretKey: 0L6XW0.GWcD8CBMseV31
    bucket: some-valid-bucket

@aledbf marked this pull request as ready for review May 17, 2022 23:47
@aledbf requested review from a team May 17, 2022 23:47
@aledbf requested review from csweichel and geropl as code owners May 17, 2022 23:47
@github-actions bot added the team: IDE, team: delivery, and team: workspace labels May 17, 2022
@corneliusludmann (Contributor) left a comment

Awesome to see this change! 🚀

Added a few comments regarding backward compatibility. 🤔

@corneliusludmann (Contributor)

Follow-up issue: Add config for S3 bucket to KOTS (Replicated) installer.

@corneliusludmann (Contributor)

Follow-up issue: Document S3 bucket config in self-hosted docs

 }
 
 func (rs *DirectMinIOStorage) objectName(name string) string {
-	return minioWorkspaceBackupObjectName(rs.WorkspaceName, name)
+	return minioWorkspaceBackupObjectName(rs.Username, rs.WorkspaceName, name)
Contributor

Suggested change
-	return minioWorkspaceBackupObjectName(rs.Username, rs.WorkspaceName, name)
+	var username string
+	if rs.MinIOConfig.BucketName != "" {
+		username = rs.Username
+	}
+	return minioWorkspaceBackupObjectName(username, rs.WorkspaceName, name)

That way, if the bucket name is set, we default to the new way of doing things; otherwise, we maintain backwards compatibility with the old way of accessing buckets.
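The suggested change can be illustrated with a small self-contained sketch. `directStorage`, `storageConfig` and `backupObjectName` are hypothetical stand-ins for `DirectMinIOStorage`, `MinIOConfig` and `minioWorkspaceBackupObjectName`, and the `workspaces/<workspace>/<name>` layout is an assumption. If the helper joins segments with `path.Join`, an empty username is simply dropped, so legacy installations keep their original object names.

```go
package main

import (
	"fmt"
	"path"
)

// storageConfig is a HYPOTHETICAL stand-in for MinIOConfig; only BucketName matters here.
type storageConfig struct {
	BucketName string
}

// directStorage is a HYPOTHETICAL, simplified stand-in for DirectMinIOStorage.
type directStorage struct {
	Config        storageConfig
	Username      string
	WorkspaceName string
}

// backupObjectName is a HYPOTHETICAL helper with an assumed layout.
// path.Join ignores empty elements, so an empty owner yields the legacy name.
func backupObjectName(owner, workspace, name string) string {
	return path.Join(owner, "workspaces", workspace, name)
}

// objectName mirrors the suggested change: prefix the username only when a
// single shared bucket is configured.
func (s *directStorage) objectName(name string) string {
	var owner string
	if s.Config.BucketName != "" {
		owner = s.Username
	}
	return backupObjectName(owner, s.WorkspaceName, name)
}

func main() {
	legacy := &directStorage{Username: "u1", WorkspaceName: "ws1"}
	shared := &directStorage{
		Config:        storageConfig{BucketName: "some-valid-bucket"},
		Username:      "u1",
		WorkspaceName: "ws1",
	}
	fmt.Println(legacy.objectName("full.tar")) // workspaces/ws1/full.tar
	fmt.Println(shared.objectName("full.tar")) // u1/workspaces/ws1/full.tar
}
```

Passing an empty username, rather than branching inside the naming helper, keeps a single naming function for both modes.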

@sagor999 (Contributor) left a comment

Can you please add unit tests that ensure the old code still works correctly and that the new code generates the correct bucket and path?

@sagor999 (Contributor)
@corneliusludmann @aledbf I added tests and fixed directMinio and presignedMinio access.
I tested it in a preview environment and made sure that all content is now written into the dedicated bucket if one is set in the installer config.

Please double-check the tests I added to ensure they are correct.

This also ensures backward compatibility (rolling out this update should not break any existing self hosted installations).
However, this feature cannot be enabled on existing installations without either losing all data or manually moving the contents of the old buckets into the new bucket.

@sagor999 (Contributor) left a comment

With tests, it looks good now (feels weird approving my own changes to the PR 😃)

@corneliusludmann (Contributor) left a comment

Looks good. Thanks, @aledbf and @sagor999! 🚀

@@ -105,6 +105,8 @@ type MinIOConfig struct {
 
 	Region         string `json:"region"`
 	ParallelUpload uint   `json:"parallelUpload,omitempty"`
+
+	BucketName string `json:"bucket"`
Contributor

Suggested change
-	BucketName string `json:"bucket"`
+	BucketName string `json:"bucket,omitempty"`

Do we need omitempty here in case the value is not given, or for backward compatibility reasons?

I don't think so but wanted to raise this question here just in case.

Contributor

@corneliusludmann good catch! Since that param is optional, I added omitempty to it.

@akosyakov (Member) left a comment

lgtm

@mustard-mh (Contributor) left a comment

LGTM in supervisor mod 🙈

@roboquat merged commit b053dae into main May 25, 2022
@roboquat deleted the aledbf/bucket branch May 25, 2022 07:48
@roboquat added the deployed: IDE label (IDE change is running in production) May 25, 2022
Labels: deployed: IDE, release-note, size/XL, team: delivery, team: IDE, team: workspace
Development: successfully merging this pull request may close the issue "Use a single object storage bucket per Gitpod installation".
6 participants