Skip to content

Commit eaad48f

Browse files
joe maleyJoe Maley
joe maley
and
Joe Maley
authored
Fix read corruption in S3 (#2253)
This fixes a rare read corruption that may manifest with multiple different error messages. One example: `[TileDB::Buffer] Error: Read failed; Trying to read beyond buffer size` The issue is that the `Aws::IOStream` constructor that we use within the callback set in `GetObjectRequest::SetResponseStreamFactory()` accepts a pointer to a `boost::interprocess::bufferbuf`. We currently allocate one of these `bufferbuf` instances on the heap and free it when we return from `S3::read`. Experimentally, I have determined that this can cause a corruption to the buffer we stream into. We are not responsible for what the AWS SDK does with objects created from the callback we assigned in `GetObjectRequest::SetResponseStreamFactory()`. This patch introduces a small wrapper so that the `Aws::IOStream` manages the lifetime of the `boost::interprocess::bufferbuf`. I also noticed that we use an SDK version with a `Aws::Utils::Stream::PreallocatedStreamBuf` that we can use instead of the boost `bufferbuf`. Relevant discussion on stack overflow: aws/aws-sdk-cpp#1430 --- TYPE: BUG DESC: Fix rare read corruption in S3 Co-authored-by: Joe Maley <[email protected]>
1 parent c65b492 commit eaad48f

File tree

2 files changed

+35
-10
lines changed

2 files changed

+35
-10
lines changed

tiledb/sm/filesystem/s3.cc

Lines changed: 8 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -762,16 +762,14 @@ Status S3::read(
762762
("bytes=" + std::to_string(offset) + "-" +
763763
std::to_string(offset + length + read_ahead_length - 1))
764764
.c_str());
765-
// Create a unique_ptr so that this will be freed at the end of the function
766-
// call This only needs to live long enough for the request itself
767-
auto streamBuf = tdb_unique_ptr<boost::interprocess::bufferbuf>(tdb_new(
768-
boost::interprocess::bufferbuf,
769-
(char*)buffer,
770-
length + read_ahead_length));
771-
get_object_request.SetResponseStreamFactory([&streamBuf]() {
772-
return Aws::New<Aws::IOStream>(
773-
constants::s3_allocation_tag.c_str(), streamBuf.get());
774-
});
765+
get_object_request.SetResponseStreamFactory(
766+
[buffer, length, read_ahead_length]() {
767+
return Aws::New<PreallocatedIOStream>(
768+
constants::s3_allocation_tag.c_str(),
769+
buffer,
770+
length + read_ahead_length);
771+
});
772+
775773
if (request_payer_ != Aws::S3::Model::RequestPayer::NOT_SET)
776774
get_object_request.SetRequestPayer(request_payer_);
777775

tiledb/sm/filesystem/s3.h

Lines changed: 27 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -56,6 +56,7 @@
5656
#include <aws/core/utils/UUID.h>
5757
#include <aws/core/utils/memory/stl/AWSStringStream.h>
5858
#include <aws/core/utils/ratelimiter/DefaultRateLimiter.h>
59+
#include <aws/core/utils/stream/PreallocatedStreamBuf.h>
5960
#include <aws/core/utils/threading/Executor.h>
6061
#include <aws/identity-management/auth/STSAssumeRoleCredentialsProvider.h>
6162
#include <aws/s3/S3Client.h>
@@ -508,6 +509,32 @@ class S3 {
508509
mutable std::mutex mtx;
509510
};
510511

512+
/**
513+
* Used to stream results from the GetObject request into
514+
* a pre-allocated buffer.
515+
*/
516+
class PreallocatedIOStream : public Aws::IOStream {
517+
public:
518+
/**
519+
* Value constructor. Does not take ownership of the
520+
* input buffer.
521+
*
522+
* @param buffer The pre-allocated underlying buffer.
523+
* @param size The maximum size of the underlying buffer.
524+
*/
525+
PreallocatedIOStream(void* const buffer, const size_t size)
526+
: Aws::IOStream(new Aws::Utils::Stream::PreallocatedStreamBuf(
527+
reinterpret_cast<unsigned char*>(buffer), size)) {
528+
}
529+
530+
/** Destructor. */
531+
~PreallocatedIOStream() {
532+
// Delete the unmanaged `Aws::Utils::Stream::PreallocatedStreamBuf`
533+
// that was allocated in the constructor.
534+
delete rdbuf();
535+
}
536+
};
537+
511538
/* ********************************* */
512539
/* PRIVATE ATTRIBUTES */
513540
/* ********************************* */

0 commit comments

Comments
 (0)