reading file twice (in PutObject) #711

Closed
crusader-mike opened this issue Oct 30, 2017 · 9 comments
Labels
bug This issue is a bug.

Comments

@crusader-mike

Apparently the AWS SDK needs to calculate some sort of signature before sending bytes over the wire (in PutObject etc.) -- the related stream gets read, then rewound to the start and read again. Is there any way to avoid this double "read file from disk" besides using a streambuf that keeps the entire file in memory?
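
To make the "read, rewind, read again" pattern described above visible, here is a minimal standard-library sketch. `CountingStreamBuf`, `drain` and `simulateSignedUpload` are hypothetical names invented for illustration; nothing here calls the SDK, and the two passes only stand in for a hashing pass followed by a sending pass.

```cpp
#include <cstddef>
#include <ios>
#include <istream>
#include <sstream>
#include <streambuf>
#include <string>

// Hypothetical helper (not part of the AWS SDK): forwards reads to another
// streambuf and counts how many bytes were pulled from it.
class CountingStreamBuf : public std::streambuf {
public:
    explicit CountingStreamBuf(std::streambuf* source) : source_(source) {}
    std::size_t bytesRead() const { return bytesRead_; }

protected:
    // Refill the local buffer from the underlying source, as a file-backed
    // streambuf would do from disk.
    int_type underflow() override {
        const std::streamsize n = source_->sgetn(buffer_, sizeof(buffer_));
        if (n <= 0) return traits_type::eof();
        bytesRead_ += static_cast<std::size_t>(n);
        setg(buffer_, buffer_, buffer_ + n);
        return traits_type::to_int_type(*gptr());
    }

    // Absolute seek only: drop buffered data and reposition the source.
    pos_type seekpos(pos_type pos, std::ios_base::openmode which) override {
        setg(nullptr, nullptr, nullptr);
        return source_->pubseekpos(pos, which);
    }

private:
    std::streambuf* source_;
    char buffer_[4096];
    std::size_t bytesRead_ = 0;
};

// Reads the stream to the end; stands in for one pass over the payload.
inline std::size_t drain(std::istream& in) {
    char chunk[1024];
    std::size_t total = 0;
    while (in.read(chunk, sizeof(chunk)) || in.gcount() > 0) {
        total += static_cast<std::size_t>(in.gcount());
    }
    return total;
}

// Assumed sequence: one pass to hash the payload, rewind, one pass to send it.
inline void simulateSignedUpload(std::istream& body) {
    drain(body);   // pass 1: payload hash
    body.clear();  // reset eof/fail bits so the seek is honoured
    body.seekg(0); // rewind to the start
    drain(body);   // pass 2: actual transfer
}
```

Wrapping a 10,000-byte source in `CountingStreamBuf` and calling `simulateSignedUpload` reports 20,000 bytes pulled from the source, which is the cost the question is about.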

@JonathanHenson
Contributor

JonathanHenson commented Oct 30, 2017 via email

@crusader-mike
Author

crusader-mike commented Oct 30, 2017

Huh... I am using HTTPS and reading the file (in my own implementation of streambuf) only in underflow() -- according to your criteria the SDK shouldn't scan my file twice. I'll double-check tonight.

@crusader-mike
Author

crusader-mike commented Oct 31, 2017

After examining my (relatively recent) SDK sources I came to the conclusion that it is impossible to switch off "payload signing" without overriding request class methods -- every request gets its payload signed if <request-class>::SignBody() returns true. A similar flag passed by S3Client into the AWS signer class is used only for payload-less requests, i.e. effectively not at all. The only way to avoid the payload being signed is to derive from the request class, override SignBody() and return false.

I dug into this a bit more and found that Amazon S3 (as of now) allows unsigned payloads, but you can configure your bucket to deny such requests. Is this a prelude to a total ban on unsigned payloads service-wide?

In any case -- it would be very nice to have a switch in S3Client (or library-wide?) that would forcefully disable/enable payload signing (regardless of HTTP/HTTPS flag). Please :)

Until then I believe I have only these options:

  1. live with 'double read'
  2. load entire file into memory -- not end of the world, since I use multipart upload for large files anyway
  3. override SignBody() of each request I use and hope users don't use HTTP urls in my program

Sounds like option number 2 is the least of all evil right now.
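
Option 2 can be sketched with the standard library alone: read the file from disk exactly once into an in-memory stream, so that any later rewind-and-reread hits memory instead of the disk. `loadFileIntoMemory` is a hypothetical helper name; passing the result to a request's body setter is an assumption about how it would be wired up and is not shown here.

```cpp
#include <fstream>
#include <ios>
#include <iterator>
#include <memory>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: reads the whole file once and returns a seekable
// in-memory stream. Only suitable when the file (or the part being
// uploaded) comfortably fits in memory.
inline std::shared_ptr<std::stringstream> loadFileIntoMemory(const std::string& path) {
    std::ifstream file(path, std::ios::binary);
    if (!file) {
        throw std::runtime_error("cannot open " + path);
    }
    // Single pass over the file on disk.
    std::string contents((std::istreambuf_iterator<char>(file)),
                         std::istreambuf_iterator<char>());
    return std::make_shared<std::stringstream>(
        contents, std::ios::in | std::ios::out | std::ios::binary);
}
```

The returned stream can be read, rewound with `clear()` and `seekg(0)`, and read again without touching the disk a second time.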

marcomagdy added the bug label and removed the help wanted label on Oct 31, 2017
@marcomagdy
Contributor

This is a bug. We need to fix it.

@crusader-mike
Author

Thank you. Would be nice to be able to skip payload signature in HTTP case too...

@crusader-mike
Author

A note about option 2 -- multipart upload is limited to 10,000 parts. This means that for files bigger than 50 GB the part size has to start growing, and at some point the parts won't fit into memory (especially if you upload them in parallel). I.e. you'll have to cap the memory buffer and you'll end up reading the entire file twice anyway.
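
The arithmetic behind that note can be sketched as follows. The 10,000-part limit comes from the comment above; the 5 MiB minimum part size is an assumption taken from S3's documented multipart limits (it is what puts the threshold at roughly 50 GB). The constant and function names are hypothetical.

```cpp
#include <cstdint>

// Limit quoted in the thread: at most 10,000 parts per multipart upload.
constexpr std::uint64_t kMaxParts = 10000;
// Assumption: S3's minimum part size of 5 MiB.
constexpr std::uint64_t kMinPartSize = 5ull * 1024 * 1024;

// Smallest part size that lets the whole file fit into kMaxParts parts.
constexpr std::uint64_t requiredPartSize(std::uint64_t fileSize) {
    return (fileSize + kMaxParts - 1) / kMaxParts > kMinPartSize
               ? (fileSize + kMaxParts - 1) / kMaxParts  // ceil(fileSize / kMaxParts)
               : kMinPartSize;
}

// Memory needed if every in-flight part is held in its own buffer.
constexpr std::uint64_t bufferedBytes(std::uint64_t fileSize, std::uint64_t parallelParts) {
    return requiredPartSize(fileSize) * parallelParts;
}
```

Under these assumptions a 1 TiB file needs parts of about 105 MiB each, so eight parallel part buffers already hold over 800 MiB.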

@wps132230
Contributor

wps132230 commented Nov 9, 2017

Hi, you can now avoid reading the file twice by skipping the payload signature with the latest update (ver 1.3.1):

auto s3Client = Aws::MakeShared<S3Client>(ALLOCATION_TAG, credentialsProvider, config, AWSAuthV4Signer::PayloadSigningPolicy::Never /*signPayloads*/, true /*useVirtualAddressing*/);

Here AWSAuthV4Signer::PayloadSigningPolicy::Never overrules SignBody() even when it returns true, so the payload will never be signed.

Please consult this commit for more details: b9ccf4d

I am closing this issue; if you encounter any further problems, please reopen it.

@singku singku closed this as completed Nov 10, 2017
@crusader-mike
Author

Thanks, Pushen!

@crusader-mike
Author

crusader-mike commented Jun 17, 2021

@singku @wps132230 I've discovered that with an HTTP endpoint my source files/objects are still being read twice.

The culprit is the request.GetUri().GetScheme() != Http::Scheme::HTTPS check at this line.

Can someone explain why SDK insists on calculating payload hash on non-HTTPS endpoints even if I explicitly asked it not to "sign body"?

Is there an easy way to switch it off without substituting request signer object?
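
The behaviour described in this comment can be modelled with a small self-contained sketch. The enums and `willHashPayload` are local stand-ins written for illustration, not the SDK's types or code; the logic is reconstructed only from what this thread says (a `Never` policy overrules SignBody(), and a non-HTTPS scheme forces the payload hash regardless).

```cpp
// Local stand-ins for illustration only; not the SDK's own declarations.
enum class PayloadSigningPolicy { RequestDependent, Always, Never };
enum class Scheme { HTTP, HTTPS };

// Returns true when the payload would be read once extra to compute its hash.
inline bool willHashPayload(PayloadSigningPolicy policy,
                            bool requestSignBody,
                            Scheme scheme) {
    bool signBody = requestSignBody;  // what <request-class>::SignBody() returns
    if (policy == PayloadSigningPolicy::Always) {
        signBody = true;
    } else if (policy == PayloadSigningPolicy::Never) {
        signBody = false;             // overrules SignBody(), per the Nov 9 comment
    }
    // The scheme check reported above: anything other than HTTPS forces hashing.
    return signBody || scheme != Scheme::HTTPS;
}
```

In this model `Never` avoids the extra read over HTTPS, but over HTTP the function still returns true, which matches the double read reported here.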