# In-memory zero-copy S3 PutObject / GetObject with custom buffers in 2020 #1430
This does go in the right direction, even though I don't think I want to use the sample exactly as given. I am using the `PreallocatedStreamBuf` approach for `GetObject`.
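A minimal sketch of that pattern (helper name hypothetical; assumes a caller-owned `buffer` of `size` bytes and a blocking call, so the stack-allocated streambuf outlives the request):

```cpp
#include <aws/core/Aws.h>
#include <aws/core/utils/stream/PreallocatedStreamBuf.h>
#include <aws/s3/S3Client.h>
#include <aws/s3/model/GetObjectRequest.h>
#include <cstddef>

// Download an object straight into a caller-owned buffer, bypassing the
// SDK's default heap-allocated stringstream response body.
bool GetObjectToBuffer(const Aws::S3::S3Client& client,
                       const Aws::String& bucket, const Aws::String& key,
                       unsigned char* buffer, std::size_t size) {
  // Wraps the caller's memory; no allocation, no copy.
  Aws::Utils::Stream::PreallocatedStreamBuf streambuf(buffer, size);

  Aws::S3::Model::GetObjectRequest request;
  request.SetBucket(bucket);
  request.SetKey(key);
  // The SDK writes the response body through this stream, i.e. directly
  // into `buffer`. `streambuf` must outlive the request, which holds here
  // because GetObject() blocks until the transfer completes.
  request.SetResponseStreamFactory([&streambuf]() {
    return Aws::New<Aws::IOStream>("GetObjectStream", &streambuf);
  });

  return client.GetObject(request).IsSuccess();
}
```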
And correspondingly for `PutObject`.
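A sketch of the upload direction under the same assumptions (hypothetical helper; `SetContentLength` is optional, since the SDK can also determine the length by seeking the stream):

```cpp
#include <aws/core/utils/stream/PreallocatedStreamBuf.h>
#include <aws/s3/model/PutObjectRequest.h>

// Upload a caller-owned buffer without copying it into an intermediate
// stream or string.
bool PutObjectFromBuffer(const Aws::S3::S3Client& client,
                         const Aws::String& bucket, const Aws::String& key,
                         unsigned char* buffer, std::size_t size) {
  // The streambuf wraps the existing memory; the IOStream reads from it.
  // Both live on the stack, which is fine because PutObject() blocks.
  Aws::Utils::Stream::PreallocatedStreamBuf streambuf(buffer, size);
  auto body = Aws::MakeShared<Aws::IOStream>("PutObjectStream", &streambuf);

  Aws::S3::Model::PutObjectRequest request;
  request.SetBucket(bucket);
  request.SetKey(key);
  request.SetBody(body);
  request.SetContentLength(static_cast<long long>(size));

  return client.PutObject(request).IsSuccess();
}
```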
Now that's still more boilerplate than I'm comfortable with. That seems fixable with another custom class:
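One way to sketch such a class (names hypothetical): a stream that owns its `PreallocatedStreamBuf`, so callers only ever deal with a pointer and a size. The private base exists solely so the streambuf is fully constructed before the `Aws::IOStream` base receives a pointer to it.

```cpp
#include <aws/core/utils/stream/PreallocatedStreamBuf.h>
#include <cstddef>
#include <cstdint>

namespace detail {
// Holds the streambuf so it is constructed before the IOStream base
// (base classes initialize in declaration order).
struct StreamBufHolder {
  StreamBufHolder(unsigned char* buffer, std::size_t size)
      : streambuf_(buffer, static_cast<uint64_t>(size)) {}
  Aws::Utils::Stream::PreallocatedStreamBuf streambuf_;
};
}  // namespace detail

// A buffer-backed IOStream that manages its own streambuf.
class BufferStream : private detail::StreamBufHolder, public Aws::IOStream {
 public:
  BufferStream(unsigned char* buffer, std::size_t size)
      : StreamBufHolder(buffer, size), Aws::IOStream(&streambuf_) {}
};
```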
Now we can just:
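With a wrapper like the hypothetical `BufferStream` above, both directions reduce to one-liners (`get_request`, `put_request`, `buffer`, and `size` assumed from the surrounding code):

```cpp
// Download: the SDK streams the body straight into `buffer`, and the
// stream it was handed owns its streambuf, so nothing dangles.
get_request.SetResponseStreamFactory([buffer, size]() {
  return Aws::New<BufferStream>("GetObjectStream", buffer, size);
});

// Upload:
put_request.SetBody(Aws::MakeShared<BufferStream>("PutObjectStream", buffer, size));
```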
For `PutObject` I'm not entirely happy with the remaining boilerplate either. Coming back to the example: I do not get the point of some of the indirection there. At the same time, it seems to me that the allocated stream buffer could be owned by the stream itself. Would it not make sense for the SDK to provide a ready-made buffer-backed stream class?
Hi @tilsche,

The sample is not meant to be "the way", but to showcase one approach you can take.

Maybe, but honestly, if we take this as a feature request it would be very low priority. If you have a specific idea of how to implement it, please open a PR so we can review it.
This fixes a rare read corruption that may manifest with multiple different error messages. One example:

`[TileDB::Buffer] Error: Read failed; Trying to read beyond buffer size`

The issue is that the `Aws::IOStream` constructor that we use within the callback set in `GetObjectRequest::SetResponseStreamFactory()` accepts a pointer to a `boost::interprocess::bufferbuf`. We currently allocate one of these `bufferbuf` instances on the heap and free it when we return from `S3::read`. Experimentally, I have determined that this can cause a corruption of the buffer we stream into. We are not responsible for what the AWS SDK does with objects created from the callback we assigned in `GetObjectRequest::SetResponseStreamFactory()`.

This patch introduces a small wrapper so that the `Aws::IOStream` manages the lifetime of the `boost::interprocess::bufferbuf`. I also noticed that we use an SDK version with an `Aws::Utils::Stream::PreallocatedStreamBuf` that we can use instead of the boost `bufferbuf`.

Relevant discussion: aws/aws-sdk-cpp#1430

---
TYPE: BUG
DESC: Fix rare read corruption in S3

Co-authored-by: Joe Maley <[email protected]>
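A sketch of the fix the commit describes (class and tag names hypothetical): instead of `delete`-ing a heap-allocated `bufferbuf` after `S3::read` returns, the stream handed to the SDK owns the streambuf, so both are destroyed together whenever the SDK releases the stream.

```cpp
#include <boost/interprocess/streams/bufferstream.hpp>
#include <cstddef>

// The Aws::IOStream owns the bufferbuf: however long the SDK keeps the
// stream alive, the underlying streambuf stays valid.
class OwningBufferStream : public Aws::IOStream {
 public:
  OwningBufferStream(char* data, std::size_t nbytes)
      : Aws::IOStream(nullptr), streambuf_(data, nbytes) {
    rdbuf(&streambuf_);  // attach after construction; rdbuf() also resets state
  }

 private:
  boost::interprocess::bufferbuf streambuf_;  // destroyed with the stream
};

// Used in the factory, e.g.:
// request.SetResponseStreamFactory([data, nbytes]() {
//   return Aws::New<OwningBufferStream>("S3Read", data, nbytes);
// });
```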
Platform/OS/Hardware/Device
Linux x86-64
Describe the question
I have large memory buffers that I need to transfer to and from S3 with optimal performance, avoiding redundant allocations and copies. How is this possible in a clean, idiomatic way?
This question has been asked in various issues, e.g., #64, #533, #785. However, they were all closed without a clearly documented solution, and it was suggested to open a new issue if needed. In general, the suggested approaches revolve around a custom stream implementation, which is unfortunately not trivial to get right and should preferably not be reimplemented in user code again and again.
I am particularly referring to a 2018 comment from @JonathanHenson, which suggested that a ready-made implementation was coming in the very near future. I would hope that the very near future is now, but unfortunately I was unable to find such an implementation. Then again, I consider it entirely possible that it exists somewhere in the vastness of the AWS SDK.
Logs/output
N/A