Allow for Replicated Snapshot, Rollback, & Restore #9926

Closed
lucasvaltl opened this issue May 11, 2022 · 4 comments · Fixed by #11971
Comments


lucasvaltl commented May 11, 2022

Is your feature request related to a problem? Please describe

As a customer, I want to be able to roll back after I upgrade my Gitpod installation using Replicated. I also want to be able to periodically back up my application to help with disaster recovery.

Describe the behaviour you'd like

  • Fully implement Replicated's Snapshot, Rollback, & Restore feature to work with and for Gitpod

Describe alternatives you've considered

  • none

Additional context

Dependencies

@mrsimonemms

@lucasvaltl what's the expectation about what we back up here?

My assumption is that the in-cluster dependencies (MySQL, Minio, Registry) will be backed up and any external dependencies are excluded. My reasoning is that the external resources will often have a simpler native backup method, and any backup we'd provide for them would have to:

  1. download all the data to the k8s cluster
  2. upload the data to the backup resource

This would put unnecessary strain on the cluster.

We also don't really have much in the way of persistent volumes. We have:

  • MySQL in-cluster *
  • Minio *
  • Registry *
  • KOTS resources
  • Redis

The starred (*) resources will already be backed up, Redis doesn't require storage, and it's arguable whether the KOTS resources need it.
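
For context, here's roughly how that inclusion works: a hedged sketch assuming KOTS's standard Velero-based snapshot mechanism, with illustrative names rather than our actual manifests. KOTS only snapshots resources carrying the kots.io/backup: velero label, and Velero's restic integration only backs up volumes that the pod opts in via an annotation.

```yaml
# Hypothetical Minio pod excerpt; pod, volume, and claim names are illustrative.
apiVersion: v1
kind: Pod
metadata:
  name: minio-0
  labels:
    kots.io/backup: velero                # KOTS/Velero only snapshots labelled resources
  annotations:
    backup.velero.io/backup-volumes: data # opt this PVC-backed volume into the restic backup
spec:
  containers:
    - name: minio
      image: minio/minio
      args: ["server", "/data"]
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: minio-data
```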


lucasvaltl commented Jun 9, 2022

Thanks for raising this, @mrsimonemms, and for being critical. I actually agree:

  • For external dependencies, we should rely on and point to the best-practice backup method of said external dependency
  • For in-cluster dependencies, we generally have a lower reliability expectation: these are meant for PoCs etc., not really for a production setting.

Given that most if not all of the relevant data is stored in the above-mentioned dependencies, what remains to back up is the configuration of your Gitpod instance. This is actually worth backing up (to minimise time to recovery), but for now it is not worth the cost of implementing Replicated's entire snapshot/rollback/restore feature. The added downside is that this Replicated feature assumes a full backup (and this is reflected in the UX); when in reality we only back up the configuration, that can lead to misunderstandings with users, with severe consequences.

We should still think about how we can help persist the installation configuration of Gitpod.

Given the above, I will close this ticket. In spirit, it will be replaced by #10515. Creating that piece of documentation should still allow us to help our users with their backup strategy, without creating the impression that we are backing up everything for them, while relying on existing best practices for backups of the external dependencies.


mrsimonemms commented Aug 8, 2022

Reopening issue after customer requests for this feature.

@mrsimonemms

I've had a bit of a play with getting the in-cluster DB and storage backed up (see #12076 and #12045), but this is proving quite difficult due to using the Installer to do the deployment. Truthfully, the problem is a little nebulous and hard to explain, but I'll do my best. It's not impossible to solve; it's just that in our current implementation it would be a bit hacky.

  1. To back up a resource, it must have certain labels applied to it - easy to do with our customisation in the Installer config (see the first sketch after this list)
  2. To back up the DB, it must have an annotation applied to it with the backup command - easy again
  3. At the point of restoration, the DB pod fails to be restored because the service account doesn't exist - as the customisation doesn't extend to service accounts, this would require the use of yq to post-process the YAML (see the second sketch below)
  4. If we're post-processing the YAML, we may as well apply it to everything. Easy to do for the main resource, but certain things (i.e., anything with a pod) need it on the pod template as well. This is where it starts getting a bit hacky
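
For reference, steps 1 and 2 roughly amount to the following. This is a minimal sketch assuming KOTS's Velero-based snapshots and Velero's standard backup-hook annotations; the resource names and the dump command are illustrative, not our actual manifests:

```yaml
# Hypothetical in-cluster MySQL StatefulSet excerpt.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
  labels:
    kots.io/backup: velero          # step 1: label so the snapshot includes this resource
spec:
  serviceName: mysql
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
        kots.io/backup: velero      # the pod template needs the label as well
      annotations:
        # step 2: Velero pre-backup hook that dumps the DB before the snapshot is taken
        pre.hook.backup.velero.io/container: mysql
        pre.hook.backup.velero.io/command: '["/bin/sh", "-c", "mysqldump -u root --all-databases > /var/lib/mysql/dump.sql"]'
    spec:
      containers:
        - name: mysql
          image: mysql:5.7
```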

The problem is not that it's impossible (or even THAT hard); it's that it would introduce a lot of post-processing and customisation to the Installer job that would be hard to test and may introduce weird regressions. To my mind, the risk/cost of adding it far outweighs the benefits of having it: as this would only be available for in-cluster dependencies, any enterprise deployment will likely use cloud dependencies for their data persistence, which have more sophisticated native backup solutions.
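
And the post-processing from steps 3 and 4 would look something like this. A hedged sketch assuming yq v4 and the Installer's render command; the expressions are illustrative and untested against our real manifests:

```sh
# Render the manifests, then post-process them so that every resource,
# including ServiceAccounts (which the Installer customisation doesn't
# cover), carries the KOTS/Velero backup label.
gitpod-installer render --config gitpod.config.yaml > gitpod.yaml

# Label every top-level resource...
yq eval '.metadata.labels."kots.io/backup" = "velero"' -i gitpod.yaml

# ...and the pod template of anything that has one.
yq eval '(select(.spec.template != null) | .spec.template.metadata.labels."kots.io/backup") = "velero"' -i gitpod.yaml
```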

Repository owner moved this from ⚒In Progress to ✨Done in 🚚 Security, Infrastructure, and Delivery Team (SID) Aug 16, 2022