Add LegacyMd5Plugin for MD5 checksum calculations in S3 operations requiring checksums #6055
32af381 to aae6a51
architecture-tests fails because HttpChecksumUtils (an internal API in sdk-core) is used in another package (s3). We should either mark this utils class as a protected API or update the ArchUnit store.
services/s3/src/main/java/software/amazon/awssdk/services/s3/LegacyMd5Plugin.java
@Override
public void configureClient(SdkServiceClientConfiguration.Builder config) {
    S3ServiceClientConfiguration.Builder s3Config = (S3ServiceClientConfiguration.Builder) config;
    s3Config.responseChecksumValidation(ResponseChecksumValidation.WHEN_REQUIRED);
Should we skip setting it if responseChecksumValidation is already configured? Same for RequestChecksumCalculation (I missed it when I wrote the sample code 😅).
Can you please elaborate? The default value of both RequestChecksumCalculation and ResponseChecksumValidation is WHEN_SUPPORTED, so these settings will always have a value.
Can you help me with a scenario where this additional check would be helpful?
On second thought, I believe we should not be doing:
s3Config.responseChecksumValidation(ResponseChecksumValidation.WHEN_REQUIRED);
s3Config.requestChecksumCalculation(RequestChecksumCalculation.WHEN_REQUIRED);
This keeps LegacyMd5Plugin to a single responsibility: it exists solely to add the MD5 header.
The responseChecksumValidation and requestChecksumCalculation should instead be set on the client or in the environment variables. What do you think?
*
* <p>Use this plugin only when you need to maintain compatibility with applications that depend on the
* legacy MD5 checksum behavior, particularly for operations that previously calculated MD5 checksums
* automatically.
Can we add sample code showing how customers can configure it on the client, using @snippet?
Done
1a2bace to a7587ea
…egacyMd5Plugin.java Co-authored-by: Olivier L Applin <[email protected]>
…java-v2 into joviegas/legacy_md5_pluggin
Done. I removed the HttpChecksumUtils API usage and inlined the logic. Let's see if it passes.
*
* <p>This plugin configures the S3 client to:
* <ul>
* <li>Set request checksum calculation to WHEN_REQUIRED mode, which calculates default checksums only when
Looks like we need to update this documentation, since we removed the configuration.
done
*
* // For asynchronous S3 client
* S3AsyncClient asyncClient = S3AsyncClient.builder()
*         .addPlugin(create())
LegacyMd5Plugin.create(). Can we configure the request checksum calculation and response checksum validation to WHEN_REQUIRED in the code snippet?
Done
Can we add test cases that use the plugin when the user manually sets MD5 on the request body?
Added test cases for both putObject and deleteObjects.
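For readers unfamiliar with the header under discussion: a Content-MD5 value is the Base64 encoding of the raw 16-byte MD5 digest of the body. The sketch below uses only the JDK and is not taken from the PR's tests; the class and method names (ContentMd5Sketch, contentMd5) are hypothetical and chosen for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

// Hypothetical helper, not part of the SDK: derives a Content-MD5 header value.
class ContentMd5Sketch {

    // Returns the Base64-encoded MD5 digest of the given body bytes.
    static String contentMd5(byte[] body) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            return Base64.getEncoder().encodeToString(md5.digest(body));
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every Java platform, so this is unexpected.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        byte[] body = "example payload".getBytes(StandardCharsets.UTF_8);
        System.out.println("Content-MD5: " + contentMd5(body));
    }
}
```

A value computed this way is what a caller would presumably pass when setting MD5 manually on a request; how the plugin treats a request that already carries such a value is what the test cases above are meant to cover.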
Thank you for this! Have you tried testing it against a third-party store? Google Cloud is easy to set up and is documented in the S3A third-party and qualification docs. Even if you only run it weekly, it would restore my confidence in the SDK changes. Thanks.
Hi Steve,
Has somebody already successfully tested this change against GCP? I have enabled the new plugin, but You can also see some sample code in the related issue: #5987
Hi @uwolfer, along with the plugin, could you please make sure the ChecksumCalculation options are set as below
If the issue still persists, can you please check the wire logs and compare the good and bad cases?
@joviegas Is it really the idea to disable optional checksum checks? If I remember correctly, it did even work without adding the new
Here you can find a simple code snippet to test it against GCP (just insert the access key id and secret and pass some random content): #5987 (comment)
Motivation and Context
#5802 (comment)
The recent S3 changes introduced challenges for customers using S3A (Apache Spark, Iceberg) with various third-party storage providers.
While some providers like MinIO have fixes available, many customers use different storage solutions or cannot easily upgrade their third-party stores.
This plugin provides a backwards-compatible solution allowing customers to maintain MD5 checksum calculations where required while adopting S3 Motorcade features.
Note: For operations that require checksums, this plugin adds an MD5 checksum in addition to the default checksum that the SDK provides for those operations.
If you want MD5 checksums on operations that require checksums, but want to skip the SDK default checksums on operations that support checksums without requiring them, set the client builder options requestChecksumCalculation and responseChecksumValidation to WHEN_REQUIRED. The SDK then adds its default checksums only to operations that require them.
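As a rough illustration of the configuration described in the note, a client could be built along these lines. This is a sketch, not the snippet from the PR's Javadoc: it assumes an AWS SDK for Java v2 release that includes LegacyMd5Plugin on the classpath, and that region and credentials come from the default provider chains. The class name LegacyMd5ClientExample is hypothetical.

```java
import software.amazon.awssdk.core.checksums.RequestChecksumCalculation;
import software.amazon.awssdk.core.checksums.ResponseChecksumValidation;
import software.amazon.awssdk.services.s3.LegacyMd5Plugin;
import software.amazon.awssdk.services.s3.S3Client;

// Hypothetical example class; requires the AWS SDK for Java v2 S3 module.
public class LegacyMd5ClientExample {
    public static void main(String[] args) {
        // Assumption: region and credentials are resolved from the default provider chains.
        try (S3Client s3 = S3Client.builder()
                // Adds MD5 checksums to operations that require a checksum.
                .addPlugin(LegacyMd5Plugin.create())
                // Limit SDK default checksums to operations that require them.
                .requestChecksumCalculation(RequestChecksumCalculation.WHEN_REQUIRED)
                .responseChecksumValidation(ResponseChecksumValidation.WHEN_REQUIRED)
                .build()) {
            System.out.println("Configured client for service: " + s3.serviceName());
        }
    }
}
```

Leaving out the two WHEN_REQUIRED calls keeps the SDK defaults (WHEN_SUPPORTED, per the review discussion), in which case the plugin still adds MD5 where a checksum is required.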
Modifications
Testing
Junits
One-off integs
A test class was created with the following APIs
Tested with
minio version RELEASE.2025-01-18T00-31-37Z (commit-id=4b6eadbd80313711c01039bfa9a05167291a8c50)
where the issue existed, and all tests passed. The following configurations were tested:
**With LegacyMd5Plugin**
**With LegacyMd5Plugin, with ChecksumCalculation and ChecksumValidation set to WHEN_REQUIRED**
**Also tested the above test cases and configurations with a real bucket in S3, and the tests passed**
Screenshots (if appropriate)
Types of changes
License