Storage Partitioned Transfer Base #3340
Conversation
Pull Request Overview
This PR implements foundational components for partitioned upload and copy operations in the Azure Storage Blob SDK, introducing stream partitioning and concurrent operation execution capabilities.
- Adds `PartitionedStream`, which converts a `SeekableStream` into partitioned `Bytes` chunks for block operations
- Implements `run_all_with_concurrency_limit()` for executing async operations with configurable concurrency
- Includes comprehensive test coverage for both components
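To make the chunking behavior concrete, here is a minimal, self-contained sketch of how a buffer splits into fixed-size partitions with a short final chunk. The `partition` helper is illustrative only, not the PR's API; the real `PartitionedStream` yields `Bytes` lazily from a `SeekableStream` rather than copying slices.

```rust
// Illustrative helper (not part of the PR): split a buffer into
// partitions of at most `partition_len` bytes; the last partition is
// shorter when the source length is not an exact multiple.
fn partition(data: &[u8], partition_len: usize) -> Vec<Vec<u8>> {
    data.chunks(partition_len).map(|c| c.to_vec()).collect()
}

fn main() {
    let data: Vec<u8> = (0..10).collect();
    let parts = partition(&data, 4);
    assert_eq!(parts.len(), 3);
    assert_eq!(parts[0], vec![0, 1, 2, 3]);
    assert_eq!(parts[2], vec![8, 9]); // final short partition
    println!("{} partitions", parts.len());
}
```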
Reviewed Changes
Copilot reviewed 5 out of 6 changed files in this pull request and generated 16 comments.
| File | Description |
|---|---|
| sdk/storage/azure_storage_blob/src/streams/partitioned_stream.rs | New implementation of PartitionedStream with Stream and FusedStream traits, plus test suite |
| sdk/storage/azure_storage_blob/src/streams/mod.rs | Module declaration for streams |
| sdk/storage/azure_storage_blob/src/partitioned_transfer/mod.rs | New implementation of run_all_with_concurrency_limit() with concurrency control logic and tests |
| sdk/storage/azure_storage_blob/src/lib.rs | Module declarations for new partitioned_transfer and streams modules |
| sdk/storage/azure_storage_blob/Cargo.toml | Added bytes and futures dependencies (and rand for tests) |
| Cargo.lock | Lock file updates reflecting new dependencies |
Accept useful generated comments. Co-authored-by: Copilot <[email protected]>
Generated docs tried to write a doctest for a non-public function.
```rust
fn take(&mut self) -> Vec<u8> {
    let mut ret = mem::replace(
        &mut self.buf,
        vec![0u8; std::cmp::min(self.partition_len, self.inner.len() - self.total_read)],
    );
    ret.truncate(self.buf_offset);
    self.buf_offset = 0;
    ret
}
```
You're already taking a dependency on Bytes. Use it. It already has all the functionality for this. At the very least, don't repeat the calculation of buf.len() that you'd already have computed during construction. Bytes has already been well-tested. I see potential faults here.
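To illustrate the "don't repeat the calculation" part of this feedback, here is a hedged, std-only sketch of `take` where the next partition size is computed once and stored. The `Partitioner` struct and `next_len` field are assumptions for this example, not the PR's actual types; with the `bytes` crate the reviewer suggests, `BytesMut::split()` would replace the whole body with one well-tested call.

```rust
use std::mem;

// Illustrative stand-in for the PR's buffer state (names are assumptions).
struct Partitioner {
    buf: Vec<u8>,
    buf_offset: usize,
    next_len: usize, // size of the next partition, computed once up front
}

impl Partitioner {
    // Swap in a pre-sized buffer without redoing the length arithmetic.
    fn take(&mut self) -> Vec<u8> {
        let mut ret = mem::replace(&mut self.buf, vec![0u8; self.next_len]);
        ret.truncate(self.buf_offset);
        self.buf_offset = 0;
        ret
    }
}

fn main() {
    let mut p = Partitioner {
        buf: vec![1, 2, 3, 0, 0], // 3 bytes filled, room for 5
        buf_offset: 3,
        next_len: 4,
    };
    let taken = p.take();
    assert_eq!(taken, vec![1, 2, 3]); // only the filled prefix is returned
    assert_eq!(p.buf.len(), 4);       // fresh buffer sized for the next partition
    assert_eq!(p.buf_offset, 0);
    println!("took {} bytes", taken.len());
}
```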
```rust
            Poll::Ready(Some(Ok(Bytes::from(ret))))
        };
    } else {
        match ready!(pin!(&mut this.inner).poll_read(cx, &mut this.buf[this.buf_offset..]))
```
More consistent to use #[pin_project] like we do in our pager and pollers. Also, why are you matching on ready!(...)? I don't understand how that's useful. That's meant to produce output when implementing futures.
@heaths why is implementing a poll method on a stream so different from "implementing futures"? I'm definitely new to writing poll methods, but my understanding is that `ready!(exp)` expands to the following:

```rust
match exp {
    Poll::Ready(ret) => ret,
    Poll::Pending => return Poll::Pending,
}
```

That seems like what I want to do here, right? If the read of the inner stream is pending, then I want to propagate that state. Is it some sort of style thing to write out the full match instead of the macro in some scenarios? I don't understand what I'd do instead in this scenario except not implement `poll_next()` in the first place.
I'll look into pin_project. New concept to me.
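For what it's worth, the two forms under discussion can be compared side by side with a std-only snippet (the `double_*` function names are illustrative): `std::task::ready!` is documented to expand to exactly the manual match, propagating `Pending` and unwrapping `Ready`.

```rust
use std::task::Poll;

// The manual match, as written out in full.
fn double_manual(p: Poll<u32>) -> Poll<u32> {
    let value = match p {
        Poll::Ready(v) => v,
        Poll::Pending => return Poll::Pending,
    };
    Poll::Ready(value * 2)
}

// The same step using std::task::ready! (stable since Rust 1.64).
fn double_with_ready(p: Poll<u32>) -> Poll<u32> {
    let value = std::task::ready!(p);
    Poll::Ready(value * 2)
}

fn main() {
    assert_eq!(double_manual(Poll::Ready(21)), Poll::Ready(42));
    assert_eq!(double_manual(Poll::Pending), Poll::Pending);
    assert_eq!(double_with_ready(Poll::Ready(21)), Poll::Ready(42));
    assert_eq!(double_with_ready(Poll::Pending), Poll::Pending);
    println!("expansions agree");
}
```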
poll_read would return a Poll::Ready(_) or Poll::Pending, yes. But by wrapping it in a ready!(_) you're making it Poll::Ready. ready!(_) is to return a value as an already-ready state. Pager is probably a good one to look at, and after changes I'm making to manually do it for various reasons: #3372
I see the parts of your linked PR where you are matching the result of the poll and returning. Not to dig in, but I am looking at the docs and (apart from a variable name) the documented expansion of the macro looks character-for-character like what you're writing in your PR. I'd like to understand either how I am wrong in my comparison or why I am meant to manually write out the full expansion even if they are identical.
Implements two foundational components for partitioned upload and copy:

- `PartitionedStream`: consumes a `Box<dyn SeekableStream>` and converts it to a `Stream<Item = Result<Bytes, Error>>`, where each `Ok(Bytes)` returned is a contiguously buffered partition to be used for a put block or equivalent request.
- `run_all_with_concurrency_limit()`: takes a sequence of async jobs (`impl FnOnce() -> impl Future<Output = Result<(), Error>>`). These will be sequences of put block operations or equivalent requests.
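As a hedged, std-only sketch of the bounded-concurrency idea, the following uses threads pulling from a shared queue in place of the PR's async futures; everything here besides the function's name and the `limit` concept is an assumption for illustration, not the PR's implementation.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Thread-based analog (for illustration only) of run_all_with_concurrency_limit():
// at most `limit` jobs run at once, and every job's Result is collected in order.
fn run_all_with_concurrency_limit(
    jobs: Vec<Box<dyn FnOnce() -> Result<(), String> + Send>>,
    limit: usize,
) -> Vec<Result<(), String>> {
    let queue = Arc::new(Mutex::new(jobs.into_iter().enumerate().collect::<Vec<_>>()));
    let results = Arc::new(Mutex::new(Vec::new()));
    let mut workers = Vec::new();
    for _ in 0..limit {
        let queue = Arc::clone(&queue);
        let results = Arc::clone(&results);
        workers.push(thread::spawn(move || loop {
            // Hold the lock only long enough to grab the next job.
            let next = queue.lock().unwrap().pop();
            match next {
                Some((idx, job)) => {
                    let r = job();
                    results.lock().unwrap().push((idx, r));
                }
                None => break,
            }
        }));
    }
    for w in workers {
        w.join().unwrap();
    }
    let mut out = Arc::try_unwrap(results).unwrap().into_inner().unwrap();
    out.sort_by_key(|(i, _)| *i); // restore submission order
    out.into_iter().map(|(_, r)| r).collect()
}

fn main() {
    let jobs: Vec<Box<dyn FnOnce() -> Result<(), String> + Send>> = (0..8)
        .map(|i| {
            let job: Box<dyn FnOnce() -> Result<(), String> + Send> =
                Box::new(move || {
                    if i == 3 { Err(format!("job {i} failed")) } else { Ok(()) }
                });
            job
        })
        .collect();
    let results = run_all_with_concurrency_limit(jobs, 3);
    assert_eq!(results.len(), 8);
    assert!(results[3].is_err());
    assert_eq!(results.iter().filter(|r| r.is_ok()).count(), 7);
    println!("all jobs completed");
}
```

In the async version, the same effect could come from something like `futures`' `buffer_unordered`, which drives at most N futures concurrently; the thread pool above is just the simplest runnable stand-in.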