reading file twice (in PutObject) #711

crusader-mike · 2017-10-30T16:26:56Z

Apparently the AWS SDK needs to calculate some sort of signature before sending bytes over the wire (in PutObject etc.), so the related stream gets read, rewound to the start, and read again. Is there any way to avoid this double "read file from disk" other than using a streambuf that keeps the entire file in memory?

JonathanHenson · 2017-10-30T16:29:31Z

The default ctor args for S3 turn payload hashing off if you are using TLS. Also, setting the content length field avoids the seek operations otherwise needed to figure it out. With both of these settings controlled, the file will be read only once.


crusader-mike · 2017-10-30T16:59:29Z

Huh... I am using HTTPS and read the file (in my own implementation of streambuf) only in underflow(), so according to your criteria the SDK shouldn't scan my file twice. I'll double-check tonight.
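One way to check whether a consumer really reads a stream twice is to wrap the source in a streambuf that counts the bytes handed out and the rewinds to offset 0. The following is a minimal sketch using only the standard library. The names CountingBuf, drain and hashThenSend are hypothetical, and hashThenSend only imitates the "read, rewind, read again" pattern described in this thread; it is not SDK code.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <streambuf>
#include <string>

// Hypothetical helper: forwards reads to an inner streambuf and records how
// many bytes were handed out and how often the consumer rewound to offset 0.
class CountingBuf : public std::streambuf {
public:
    explicit CountingBuf(std::streambuf* inner) : inner_(inner) {
        assert(inner != nullptr);
    }
    std::size_t bytesRead() const { return bytesRead_; }
    std::size_t rewinds() const { return rewinds_; }

protected:
    // Peek at the next character without consuming it.
    int_type underflow() override { return inner_->sgetc(); }

    // Consume a single character.
    int_type uflow() override {
        int_type c = inner_->sbumpc();
        if (!traits_type::eq_int_type(c, traits_type::eof())) ++bytesRead_;
        return c;
    }

    // Consume a block of characters (used by std::istream::read).
    std::streamsize xsgetn(char* s, std::streamsize n) override {
        std::streamsize got = inner_->sgetn(s, n);
        bytesRead_ += static_cast<std::size_t>(got);
        return got;
    }

    pos_type seekoff(off_type off, std::ios_base::seekdir dir,
                     std::ios_base::openmode which) override {
        if (off == 0 && dir == std::ios_base::beg) ++rewinds_;
        return inner_->pubseekoff(off, dir, which);
    }

    pos_type seekpos(pos_type pos, std::ios_base::openmode which) override {
        if (off_type(pos) == 0) ++rewinds_;
        return inner_->pubseekpos(pos, which);
    }

private:
    std::streambuf* inner_;
    std::size_t bytesRead_ = 0;
    std::size_t rewinds_ = 0;
};

// Reads the stream to EOF and returns the number of bytes seen.
std::size_t drain(std::istream& in) {
    char buf[4096];
    std::size_t total = 0;
    while (in.read(buf, sizeof buf) || in.gcount() > 0) {
        total += static_cast<std::size_t>(in.gcount());
    }
    return total;
}

// Imitates "hash the body, rewind, then send it": two full passes.
std::size_t hashThenSend(std::istream& body) {
    std::size_t first = drain(body);
    body.clear();                       // drop eofbit/failbit before seeking
    body.seekg(0, std::ios_base::beg);  // rewind to the start
    std::size_t second = drain(body);
    return first + second;
}
```

Wrapping a 10,000-byte source in CountingBuf and passing it to hashThenSend should report 20,000 bytes read and one rewind, whereas a single-pass consumer would report 10,000 bytes and no rewinds. Passing such a wrapper as the request body would be one way to observe what a real client does with the stream.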

crusader-mike · 2017-10-31T00:36:45Z

After examining my (relatively recent) SDK sources I came to the conclusion that it is impossible to switch off "payload signing" without overriding request class methods -- every request gets its payload signed if <request-class>::SignBody() returns true. The similar flag passed by S3Client into the AWS signer class is used only for payload-less requests -- i.e. effectively not used at all. The only way to avoid the payload being signed is to derive from the request class, override SignBody() and return false.

I dug into this a bit more and found that Amazon S3 (as of now) allows unsigned payloads, but you can configure your bucket to deny such requests. Is it a prelude to a total ban on unsigned payloads service-wide?

In any case -- it would be very nice to have a switch in S3Client (or library-wide?) that would forcefully disable/enable payload signing (regardless of HTTP/HTTPS flag). Please :)

Until then I believe I have only these options:

1. live with 'double read'
2. load entire file into memory -- not end of the world, since I use multipart upload for large files anyway
3. override SignBody() of each request I use and hope users don't use HTTP urls in my program

Sounds like option number 2 is the least evil right now.
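The SignBody() override mentioned above can be illustrated without the SDK. This is only a sketch under stated assumptions: RequestStandIn is a hypothetical stand-in for an SDK request class, and bodyReadPasses is a hypothetical model of the consequence described in the thread (a signed body is read once to hash and once to send). With the real SDK, the derived class would inherit from the actual request type being sent.

```cpp
#include <cassert>

// Stand-in for an SDK request class (hypothetical, not the real SDK type).
// The thread states that a request's payload is signed when SignBody()
// returns true.
struct RequestStandIn {
    virtual ~RequestStandIn() = default;
    virtual bool SignBody() const { return true; }
};

// The workaround: derive from the request class and opt out of body signing.
struct UnsignedBodyRequest : RequestStandIn {
    bool SignBody() const override { return false; }
};

// Hypothetical model of the consequence: a signed body is read once to hash
// it and once to send it; an unsigned body is read only to send it.
int bodyReadPasses(const RequestStandIn& request) {
    return request.SignBody() ? 2 : 1;
}
```

The drawback noted in the list still applies: the override has to be repeated for every request type in use, and it gives up payload signing even when the endpoint is plain HTTP.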

marcomagdy · 2017-10-31T18:40:27Z

This is a bug. We need to fix it.

crusader-mike · 2017-11-01T01:13:09Z

Thank you. Would be nice to be able to skip payload signature in HTTP case too...

crusader-mike · 2017-11-02T07:28:14Z

A note about option 2 -- multipart upload is limited to 10,000 chunks. This means that for files bigger than 50 GB the chunk size has to start growing, and at some point the chunks won't fit into memory (especially if you upload them in parallel). I.e. you'll have to limit the memory buffer and you'll end up reading the entire file twice.
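The arithmetic behind this note can be sketched as follows. The 10,000-chunk limit is the one quoted above; the 5 MiB minimum chunk size is S3's documented minimum part size and is what puts the threshold at about 50 GB (10,000 × 5 MiB). The function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kMaxChunks = 10000;                  // limit quoted in the thread
constexpr std::uint64_t kMinChunkSize = 5ull * 1024 * 1024;  // 5 MiB, S3 minimum part size

// Smallest chunk size (in bytes) that fits fileSize into at most kMaxChunks chunks.
std::uint64_t minChunkSize(std::uint64_t fileSize) {
    std::uint64_t needed = (fileSize + kMaxChunks - 1) / kMaxChunks;  // ceiling division
    return std::max(kMinChunkSize, needed);
}

// Memory needed to keep `parallel` chunks buffered at the same time.
std::uint64_t bufferBytes(std::uint64_t fileSize, unsigned parallel) {
    assert(parallel > 0);
    return minChunkSize(fileSize) * parallel;
}
```

For a 1 TiB file this gives a chunk size of 109,951,163 bytes (roughly 105 MiB), so eight parallel uploads would need about 840 MiB of buffer if every chunk were held in memory.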

wps132230 · 2017-11-09T23:52:26Z

Hi, as of the latest update (version 1.3.1) you can avoid reading the file twice by skipping the payload signature:

auto s3Client = Aws::MakeShared<S3Client>(ALLOCATION_TAG, credentialsProvider, config, AWSAuthV4Signer::PayloadSigningPolicy::Never /*signPayloads*/, true /*useVirtualAddressing*/);

Here AWSAuthV4Signer::PayloadSigningPolicy::Never overrules SignBody() even when it returns true, so the payload will never be signed.

Please consult this commit for more details: b9ccf4d

I am closing this issue. If you encounter any further problems, please reopen it.

crusader-mike · 2017-11-10T03:26:16Z

Thanks, Pushen!


crusader-mike · 2021-06-17T23:14:06Z

@singku @wps132230 I've discovered that with an HTTP endpoint my source files/objects are still being read twice.

The culprit is the request.GetUri().GetScheme() != Http::Scheme::HTTPS check at this line.

Can someone explain why SDK insists on calculating payload hash on non-HTTPS endpoints even if I explicitly asked it not to "sign body"?

Is there an easy way to switch it off without substituting request signer object?
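The behavior reported in this comment can be modeled in a few lines. This is a simplified sketch of the decision as described in the thread, not the SDK's source: the enum names mirror the identifiers quoted above, while payloadIsHashed and its parameters are hypothetical.

```cpp
#include <cassert>

enum class Scheme { HTTP, HTTPS };
enum class PayloadSigningPolicy { RequestDependent, Always, Never };

// Simplified model, not SDK source. `signBody` stands for the result of the
// request's SignBody(); the policy overrules it, and the scheme check
// overrules both.
bool payloadIsHashed(PayloadSigningPolicy policy, bool signBody, Scheme scheme) {
    bool sign = signBody;
    if (policy == PayloadSigningPolicy::Always) sign = true;
    if (policy == PayloadSigningPolicy::Never) sign = false;
    // The check reported above: a non-HTTPS scheme forces hashing regardless.
    return sign || scheme != Scheme::HTTPS;
}
```

Under this model, PayloadSigningPolicy::Never avoids the hashing pass over HTTPS but not over plain HTTP, which matches the double read observed with an HTTP endpoint.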

singku added the help wanted label Oct 31, 2017

marcomagdy added bug This issue is a bug. and removed help wanted labels Oct 31, 2017

marcomagdy mentioned this issue Oct 31, 2017

s3 low download speed after migrating to ubuntu 14.04 #694

Closed

crusader-mike mentioned this issue Nov 1, 2017

what if file size changed between Content-Length calculation and sending data out? #712

Closed

singku closed this as completed Nov 10, 2017

crusader-mike mentioned this issue Jun 18, 2021

PutObject reads underlying stream twice (even with PayloadSigningPolicy::Never) #1688

Closed

2 tasks

ThatEmbeddedGuy mentioned this issue May 23, 2024

There is no possibility to omit double file reading caused by checksum calculation #2968

Closed
