feat: Add bendsave which can backup and restore databend data #17503
Signed-off-by: Xuanwo <[email protected]>
Before conducting a thorough review, I have a question:
Yes, exactly the same.
I'm trying to run a new test on a recovered metadata/data set to ensure that all the data has been restored correctly. I chose the TPC-H test for this because it only executes queries; inserting new data and other DML operations do not need to be tested again.
Reviewed 24 of 24 files at r1, 4 of 4 files at r2, all commit messages.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @dantengsky, @everpcpc, and @sundy-li)
An excellent PR! Clean and concise!
70% of me thanks you, and the remaining 30% (mostly on docs) comes from Claude 3.5 Sonnet.
Looks good to me. As we continue improving this feature, here are a few considerations:
- Given that the table data could grow quite large, we should probably give some extra thought to the efficiency of backup and restore operations.
- If I understand correctly, the current approach involves periodically backing up data from A to B, then restoring it to C, with B potentially storing multiple versions. Even if we avoid redundant data objects across those versions at B, this could still lead to significant additional storage costs.
- The efficiency of restoring from B to C might also negatively impact the RTO.
- Additionally, since the backed-up data already includes time travel data, this might raise questions about the necessity of multi-version backups.
- The vacuum operation on backups could introduce complexity in both implementation and maintenance.
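To make the storage-cost concern concrete, here is a rough sketch that compares keeping a full copy per backup version with storing each object once across versions. The `Version` type and both function names are hypothetical, and the model assumes objects are immutable, so the same key always refers to the same content; nothing here reflects how bendsave actually lays out its backups.

```rust
use std::collections::HashMap;

/// One backup version, modeled as object key -> size in bytes.
/// Hypothetical model for illustration only.
pub type Version = HashMap<String, u64>;

/// Total bytes at B if every version stores a full copy of its objects.
pub fn full_copy_bytes(versions: &[Version]) -> u64 {
    versions.iter().map(|v| v.values().sum::<u64>()).sum()
}

/// Total bytes at B if an object key is stored once across all versions.
/// Assumes objects are immutable: the same key always has the same content.
pub fn deduplicated_bytes(versions: &[Version]) -> u64 {
    let mut unique: HashMap<&str, u64> = HashMap::new();
    for version in versions {
        for (key, size) in version {
            // Keep the first size seen for each key; later versions reuse it.
            unique.entry(key.as_str()).or_insert(*size);
        }
    }
    unique.values().sum()
}
```

Even in the deduplicated case, B still holds at least one full copy of the data, which is the extra cost the comment above refers to.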
The commit switches from using epochfs to a new direct storage-to-storage backup implementation for bendsave, removing the checkpoint-based approach.

The main changes:
- Remove epochfs dependency and checkpoint-based backup/restore
- Add direct storage copy util function
- Update command line interface to remove checkpoint flag
- Rename storage functions for clarity
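The idea of a direct storage-to-storage copy can be sketched as follows. This is only an illustration under stated assumptions: local directories stand in for the source and destination storage backends, and `copy_storage` is a hypothetical name, not the util function added in this PR.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Recursively copies every file under `src` into `dst`, preserving
/// relative paths, and returns the number of files copied.
/// Hypothetical helper: local directories stand in for the two storages.
pub fn copy_storage(src: &Path, dst: &Path) -> io::Result<u64> {
    fs::create_dir_all(dst)?;
    let mut copied = 0;
    for entry in fs::read_dir(src)? {
        let entry = entry?;
        let target = dst.join(entry.file_name());
        if entry.file_type()?.is_dir() {
            // Descend into subdirectories and add their file counts.
            copied += copy_storage(&entry.path(), &target)?;
        } else {
            fs::copy(entry.path(), &target)?;
            copied += 1;
        }
    }
    Ok(copied)
}
```

Under this model, a backup is a copy from the live storage to the backup location and a restore is the same copy in the opposite direction, with no intermediate checkpoint format.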
All CI passed. I'm going to merge this PR now.
…ndlabs#17503)

* squash commits
  Signed-off-by: Xuanwo <[email protected]>
* Fix force load
  Signed-off-by: Xuanwo <[email protected]>
* backup works!
  Signed-off-by: Xuanwo <[email protected]>
* Fully test
  Signed-off-by: Xuanwo <[email protected]>
* Fix test
  Signed-off-by: Xuanwo <[email protected]>
* Fix actions
  Signed-off-by: Xuanwo <[email protected]>
* Try fix ci
  Signed-off-by: Xuanwo <[email protected]>
* Fix insert in epochfs
  Signed-off-by: Xuanwo <[email protected]>
* allow more time
  Signed-off-by: Xuanwo <[email protected]>
* Add readme
  Signed-off-by: Xuanwo <[email protected]>
* fix typo
  Signed-off-by: Xuanwo <[email protected]>
* Update cargo.lock
  Signed-off-by: Xuanwo <[email protected]>
* remove unneeded changes
  Signed-off-by: Xuanwo <[email protected]>
* Add license check to backup tool bendsave
* Replace epochfs with new backup implementation
  The commit switches from using epochfs to a new direct storage-to-storage backup implementation for bendsave, removing the checkpoint-based approach.
  The main changes:
  - Remove epochfs dependency and checkpoint-based backup/restore
  - Add direct storage copy util function
  - Update command line interface to remove checkpoint flag
  - Rename storage functions for clarity
* Remove checkpoint flag from bendsave restore
* Fix typo
  Signed-off-by: Xuanwo <[email protected]>
* Avoid check license while restore
  Signed-off-by: Xuanwo <[email protected]>
* Init query first
  Signed-off-by: Xuanwo <[email protected]>
* Fix build
  Signed-off-by: Xuanwo <[email protected]>
* Fix init query
  Signed-off-by: Xuanwo <[email protected]>
* Fix clippy
  Signed-off-by: Xuanwo <[email protected]>

---------

Signed-off-by: Xuanwo <[email protected]>
Signed-off-by: Xuanwo [email protected]
I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/
Summary
This PR adds bendsave, the DR tool for databend that can back up and restore databend data.
The RFC is available at: https://docs.databend.com/guides/community/rfcs/disaster-recovery
This PR was tested in this way:
Tests
Type of change