Skip to content

feat: add ChecksumAlgorithm option to decides which algorithm calculate checksums. #2197

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

LiuQhahah
Copy link
Contributor

Description

add ChecksumAlgorithm option to decides which algorithm calculate checksums.
Checklist

  • Code compiles correctly and linting passes locally

Instructions

  • The PR title should follow the Conventional Commits
    syntax, leading with fix:, feat:, chore:, ci:, etc.
  • The description should briefly explain what the PR is about. In the case of a bugfix, describe or
    link to the bug.
  • In the checklist section, check the boxes in that are applicable, using [x] syntax.
    • If not applicable, remove the entire line. Only leave the box unchecked if you intend to come
      back and check the box later.
  • Delete the Instructions line and everything below it, to indicate you have read and are
    following these instructions. 🙂

Thank you for your contribution to Badger!

@LiuQhahah LiuQhahah requested a review from a team as a code owner May 8, 2025 09:56
@mangalaman93
Copy link
Member

@LiuQhahah Thank you for the PR. Could you help us understand the rationale behind the PR? I will review the PR as soon as I can. Thanks

@LiuQhahah
Copy link
Contributor Author

Hi @mangalaman93
I saw in the code that there was a TODO tag and looked at this feature that hadn't been implemented yet, so I decided to implement it.
Below is the related code : https://github.com/hypermodeinc/badger/blob/main/table/builder.go#L457

Thanks
Qiang

@mangalaman93 mangalaman93 enabled auto-merge (squash) May 11, 2025 20:39
auto-merge was automatically disabled May 12, 2025 03:05

Head branch was pushed to by a user without write access

Copy link
Member

@mangalaman93 mangalaman93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@LiuQhahah There is one issue I see here. That is, if I update the checksum algorithm on an existing badger directory, it is going to fail in that case. Can we handle that case so that for existing data we use the right checksum algorithm whereas for all the new data we use the new checksum algorithm?

@@ -515,6 +529,19 @@ func pluralFiles(count int) string {
return "files"
}

// When the checkSum Algorithm is invalid, func strToChecksumAlgorithm will return the default checkSum Algorithm
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is benefit of returning the default checksum when we are returning the error along with it? I think we should return default checksum with nil error

@LiuQhahah
Copy link
Contributor Author

@LiuQhahah There is one issue I see here. That is, if I update the checksum algorithm on an existing badger directory, it is going to fail in that case. Can we handle that case so that for existing data we use the right checksum algorithm whereas for all the new data we use the new checksum algorithm?

Following the current design, I'm unsure about how to proceed.

Currently, the options only provide a single value for checksumAlgorithm. If we want to support multiple checksum algorithms, it might mislead users since they wouldn’t know which algorithm was actually used.

I suggest sticking with the current design: one Badger directory should use only one checksum algorithm. If users need multi-checksum support, we can discuss it separately.

What do you think?

Thanks
Qiang

@mangalaman93
Copy link
Member

@LiuQhahah There is one issue I see here. That is, if I update the checksum algorithm on an existing badger directory, it is going to fail in that case. Can we handle that case so that for existing data we use the right checksum algorithm whereas for all the new data we use the new checksum algorithm?

Following the current design, I'm unsure about how to proceed.

Currently, the options only provide a single value for checksumAlgorithm. If we want to support multiple checksum algorithms, it might mislead users since they wouldn’t know which algorithm was actually used.

I suggest sticking with the current design: one Badger directory should use only one checksum algorithm. If users need multi-checksum support, we can discuss it separately.

What do you think?

Thanks Qiang

Seems like we already store checksum algorithm in the block. We should use that when reading, whereas when writing, we should use the checksum algorithm provided in the option.

We should also add a test for this scenario where we open badger with a algo1 for checksum, write and read some data, close badger. Open again with algo2 for checksum, read the old data, write some more new data and read new data. This ensures that we are using the right checksum algorithm.

@LiuQhahah
Copy link
Contributor Author

@LiuQhahah There is one issue I see here. That is, if I update the checksum algorithm on an existing badger directory, it is going to fail in that case. Can we handle that case so that for existing data we use the right checksum algorithm whereas for all the new data we use the new checksum algorithm?

Following the current design, I'm unsure about how to proceed.
Currently, the options only provide a single value for checksumAlgorithm. If we want to support multiple checksum algorithms, it might mislead users since they wouldn’t know which algorithm was actually used.
I suggest sticking with the current design: one Badger directory should use only one checksum algorithm. If users need multi-checksum support, we can discuss it separately.
What do you think?
Thanks Qiang

Seems like we already store checksum algorithm in the block. We should use that when reading, whereas when writing, we should use the checksum algorithm provided in the option.

We should also add a test for this scenario where we open badger with a algo1 for checksum, write and read some data, close badger. Open again with algo2 for checksum, read the old data, write some more new data and read new data. This ensures that we are using the right checksum algorithm.

Thank you for your information.
I agree with you.

I will follow this task.
Thanks,
Qiang

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants