There is no possibility to omit double file reading caused by checksum calculation #2968

Closed as not planned
@ThatEmbeddedGuy

Description

Describe the bug

Even when the checksum algorithm is explicitly set to Aws::S3::Model::ChecksumAlgorithm::NOT_SET:

object_request.WithBucket(bucket).WithKey(s3key).WithChecksumAlgorithm(Aws::S3::Model::ChecksumAlgorithm::NOT_SET);
object_request.SetBody(stdstream);

the SDK rewrites it to MD5:

Aws::String PutObjectRequest::GetChecksumAlgorithmName() const
{
  if (m_checksumAlgorithm == ChecksumAlgorithm::NOT_SET)
  {
    return "md5";
  }
  else
  {
    return ChecksumAlgorithmMapper::GetNameForChecksumAlgorithm(m_checksumAlgorithm);
  }
} 

https://github.com/aws/aws-sdk-cpp/blob/main/generated/src/aws-cpp-sdk-s3/source/model/PutObjectRequest.cpp#L329

As a result, an MD5 checksum of the body is calculated and sent in the content-md5 header:

content-md5: aOEJ8PQMpyoV4FzCJ4b45g==

Full log from curl:

[DEBUG] CURL: (HeaderOut) PUT /XXXXXXXX HTTP/1.1
Host: XXXXX.s3.ap-southeast-1.amazonaws.com
Accept: */*
amz-sdk-invocation-id: E01AF2D1-E5F1-4469-B02F-76304CFB417F
amz-sdk-request: attempt=1
authorization: AWS4-HMAC-SHA256 Credential=XXX/XXX/ap-southeast-1/s3/aws4_request, SignedHeaders=amz-sdk-invocation-id;amz-sdk-request;content-length;content-md5;content-type;host;x-amz-content-sha256;x-amz-date, Signature=35038b4005c0f8ec52b6084c2cd84461408ae0ed1056e4b3788543e7009e5360
content-length: 10
content-md5: aOEJ8PQMpyoV4FzCJ4b45g==
content-type: binary/octet-stream
user-agent: aws-sdk-cpp/1.9.310 Linux/5.15.0-202.135.2.el9uek.x86_64 x86_64 GCC/11.4.1
x-amz-content-sha256: UNSIGNED-PAYLOAD
x-amz-date: 20240425T125503Z
Expect: 100-continue
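
As a sanity check on the header above: a Content-MD5 value is the base64 encoding of a 16-byte MD5 digest, so the whole body has to be hashed before the first header byte goes out. The sketch below only decodes the logged value to confirm its length; DecodeBase64 is a hypothetical helper written for this illustration, not an SDK function.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper (not part of the AWS SDK): decodes standard base64.
// Stops at the first '=' padding character and skips characters that are
// outside the RFC 4648 alphabet.
std::vector<std::uint8_t> DecodeBase64(const std::string &text) {
    static const std::string alphabet =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::vector<std::uint8_t> bytes;
    std::uint32_t buffer = 0;  // accumulates 6-bit groups
    int bits = 0;              // number of not-yet-emitted bits in buffer
    for (char c : text) {
        if (c == '=') break;
        const std::size_t value = alphabet.find(c);
        if (value == std::string::npos) continue;
        buffer = (buffer << 6) | static_cast<std::uint32_t>(value);
        bits += 6;
        if (bits >= 8) {
            bits -= 8;
            bytes.push_back(static_cast<std::uint8_t>((buffer >> bits) & 0xFF));
        }
    }
    return bytes;
}
```

Decoding aOEJ8PQMpyoV4FzCJ4b45g== with this helper yields 16 bytes, the size of an MD5 digest.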

The headers are calculated before the file is actually sent, so the data is read twice: once to calculate the checksum and once to send it.
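
The double read can be modeled without the SDK. The following is a minimal sketch, assuming the upload behaves as described above (one full pass to hash, a rewind, then one full pass to send); Drain and BytesReadForUpload are hypothetical names, and no real hashing or networking is performed.

```cpp
#include <cstddef>
#include <ios>
#include <istream>
#include <sstream>

// Reads the stream to the end and returns the number of bytes consumed.
std::size_t Drain(std::istream &in) {
    char buffer[4096];
    std::size_t total = 0;
    while (in.read(buffer, sizeof(buffer)) || in.gcount() > 0) {
        total += static_cast<std::size_t>(in.gcount());
    }
    return total;
}

// Models the two-phase upload described in this issue:
// pass 1 consumes the body for the checksum, pass 2 consumes it for sending.
std::size_t BytesReadForUpload(std::istream &body) {
    std::size_t total = Drain(body);  // pass 1: checksum calculation
    body.clear();                     // reset eof/fail so the seek works
    body.seekg(0, std::ios::beg);     // rewind to the start of the body
    total += Drain(body);             // pass 2: actual transfer
    return total;
}
```

For a 10-byte body, such as the one in the log (content-length: 10), this model reads 20 bytes in total. For large files on slow storage, that second pass is the cost this issue is about.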

A related issue, #711, was treated as a bug.

Expected Behavior

It should be possible to disable the checksum calculation so that the data is read only once.
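
For comparison, a single-pass transfer might look like the sketch below: each chunk is folded into a running checksum as it is written out, so the body is read only once. This is an illustration under assumptions, not how the SDK works. SendWithRunningChecksum and SinglePassResult are hypothetical names, Adler-32 is used as a simple stand-in for MD5, and a checksum produced this way is only known after the body has been sent, so it could not go into a leading header such as content-md5.

```cpp
#include <cstddef>
#include <cstdint>
#include <ios>
#include <istream>
#include <ostream>
#include <sstream>

struct SinglePassResult {
    std::size_t bytesRead;   // bytes consumed from the body
    std::uint32_t checksum;  // Adler-32 of the body (stand-in for MD5)
};

// Copies `body` to `wire` in one pass while updating a running Adler-32.
SinglePassResult SendWithRunningChecksum(std::istream &body, std::ostream &wire) {
    char buffer[4096];
    SinglePassResult result{0, 0};
    std::uint32_t a = 1, b = 0;  // Adler-32 state
    while (body.read(buffer, sizeof(buffer)) || body.gcount() > 0) {
        const std::streamsize n = body.gcount();
        for (std::streamsize i = 0; i < n; ++i) {
            a = (a + static_cast<unsigned char>(buffer[i])) % 65521;
            b = (b + a) % 65521;
        }
        wire.write(buffer, n);  // "send" the chunk that was just hashed
        result.bytesRead += static_cast<std::size_t>(n);
    }
    result.checksum = (b << 16) | a;
    return result;
}
```

With this approach a 10-byte body causes exactly 10 bytes to be read, at the price of the checksum being available only once the transfer is complete.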

Current Behavior

The checksum is calculated before the request is sent and is passed in the header:

content-md5: aOEJ8PQMpyoV4FzCJ4b45g==

Full log from curl:

[DEBUG] CURL: (HeaderOut) PUT /XXXXXXXX HTTP/1.1
Host: XXXXX.s3.ap-southeast-1.amazonaws.com
Accept: */*
amz-sdk-invocation-id: E01AF2D1-E5F1-4469-B02F-76304CFB417F
amz-sdk-request: attempt=1
authorization: AWS4-HMAC-SHA256 Credential=XXX/XXX/ap-southeast-1/s3/aws4_request, SignedHeaders=amz-sdk-invocation-id;amz-sdk-request;content-length;content-md5;content-type;host;x-amz-content-sha256;x-amz-date, Signature=35038b4005c0f8ec52b6084c2cd84461408ae0ed1056e4b3788543e7009e5360
content-length: 10
content-md5: aOEJ8PQMpyoV4FzCJ4b45g==
content-type: binary/octet-stream
user-agent: aws-sdk-cpp/1.9.310 Linux/5.15.0-202.135.2.el9uek.x86_64 x86_64 GCC/11.4.1
x-amz-content-sha256: UNSIGNED-PAYLOAD
x-amz-date: 20240425T125503Z
Expect: 100-continue

The headers are calculated before the file is actually sent, so the data is read twice: once to calculate the checksum and once to send it.

Reproduction Steps

The standard PutObject example from the AWS SDK, with ChecksumAlgorithm::NOT_SET:

#include <fstream>
#include <iostream>
#include <aws/core/Aws.h>
#include <aws/s3/S3Client.h>
#include <aws/s3/model/PutObjectRequest.h>

// The AwsDoc::S3 namespace is declared in the SDK example's own header.
bool AwsDoc::S3::PutObject(const Aws::String &bucketName,
                           const Aws::String &fileName,
                           const Aws::Client::ClientConfiguration &clientConfig) {
    Aws::S3::S3Client s3_client(clientConfig);

    // Explicitly request no checksum algorithm; the SDK still falls back to MD5.
    Aws::S3::Model::PutObjectRequest request;
    request.WithBucket(bucketName).WithKey(fileName).WithChecksumAlgorithm(
            Aws::S3::Model::ChecksumAlgorithm::NOT_SET);

    std::shared_ptr<Aws::IOStream> inputData =
            Aws::MakeShared<Aws::FStream>("SampleAllocationTag",
                                          fileName.c_str(),
                                          std::ios_base::in | std::ios_base::binary);

    if (!*inputData) {
        std::cerr << "Error unable to read file " << fileName << std::endl;
        return false;
    }

    request.SetBody(inputData);

    // The body is read once for the checksum and once more for the upload.
    Aws::S3::Model::PutObjectOutcome outcome =
            s3_client.PutObject(request);

    if (!outcome.IsSuccess()) {
        std::cerr << "Error: PutObject: " <<
                  outcome.GetError().GetMessage() << std::endl;
    }
    else {
        std::cout << "Added object '" << fileName << "' to bucket '"
                  << bucketName << "'." << std::endl;
    }

    return outcome.IsSuccess();
}

Possible Solution

Do not rewrite the checksum algorithm to MD5 when ChecksumAlgorithm::NOT_SET is passed.
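
A minimal sketch of that idea, using stand-in types rather than the SDK's generated code: the enum and ChecksumAlgorithmName below are hypothetical and only mirror the shape of GetChecksumAlgorithmName() quoted earlier. Whether the rest of the SDK would treat an empty name as "skip the checksum" is an assumption that this sketch does not verify.

```cpp
#include <string>

// Hypothetical stand-in for the SDK's ChecksumAlgorithm enum.
enum class ChecksumAlgorithm { NOT_SET, CRC32, CRC32C, SHA1, SHA256 };

// Returns the checksum name for an explicitly chosen algorithm, and an
// empty string for NOT_SET instead of silently falling back to "md5".
std::string ChecksumAlgorithmName(ChecksumAlgorithm algorithm) {
    switch (algorithm) {
        case ChecksumAlgorithm::CRC32:   return "crc32";
        case ChecksumAlgorithm::CRC32C:  return "crc32c";
        case ChecksumAlgorithm::SHA1:    return "sha1";
        case ChecksumAlgorithm::SHA256:  return "sha256";
        case ChecksumAlgorithm::NOT_SET: break;
    }
    return "";  // no algorithm requested, so nothing to precompute
}
```

A caller could then check for an empty name and skip the pre-send hashing pass entirely.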

Additional Information/Context

No response

AWS CPP SDK version used

1.9.310

Compiler and Version used

GCC/11.4.1

Operating System and version

Oracle Linux 9

Metadata

Assignees

No one assigned

Labels

bug (This issue is a bug.), duplicate (This issue is a duplicate.), p3 (This is a minor priority issue)
