
Conversation

@jaschrep-msft
Member

Implements two foundational components for partitioned upload and copy.

  • PartitionedStream: Consumes a Box<dyn SeekableStream> and converts it to a Stream<Item = Result<Bytes, Error>> where each Ok(Bytes) returned is a contiguously buffered partition to be used for a put block or equivalent request.
  • run_all_with_concurrency_limit(): Takes a sequence of async jobs (impl FnOnce() -> impl Future<Output = Result<(), Error>>) and runs them under a concurrency limit. In practice, these jobs will be put block operations or equivalent requests.

@github-actions github-actions bot added the Storage Storage Service (Queues, Blobs, Files) label Nov 12, 2025
@jaschrep-msft jaschrep-msft marked this pull request as ready for review November 19, 2025 22:01
Copilot AI review requested due to automatic review settings November 19, 2025 22:01
Copilot finished reviewing on behalf of jaschrep-msft November 19, 2025 22:04
Contributor

Copilot AI left a comment


Pull Request Overview

This PR implements foundational components for partitioned upload and copy operations in the Azure Storage Blob SDK, introducing stream partitioning and concurrent operation execution capabilities.

  • Adds PartitionedStream that converts a SeekableStream into partitioned Bytes chunks for block operations
  • Implements run_all_with_concurrency_limit() for executing async operations with configurable concurrency
  • Includes comprehensive test coverage for both components

Reviewed Changes

Copilot reviewed 5 out of 6 changed files in this pull request and generated 16 comments.

Summary per file:

  • sdk/storage/azure_storage_blob/src/streams/partitioned_stream.rs: New implementation of PartitionedStream with Stream and FusedStream traits, plus test suite
  • sdk/storage/azure_storage_blob/src/streams/mod.rs: Module declaration for streams
  • sdk/storage/azure_storage_blob/src/partitioned_transfer/mod.rs: New implementation of run_all_with_concurrency_limit() with concurrency control logic and tests
  • sdk/storage/azure_storage_blob/src/lib.rs: Module declarations for the new partitioned_transfer and streams modules
  • sdk/storage/azure_storage_blob/Cargo.toml: Added bytes and futures dependencies (and rand for tests)
  • Cargo.lock: Lock file updates reflecting new dependencies

jaschrep-msft and others added 5 commits November 19, 2025 22:21
Accept useful generated comments.

Co-authored-by: Copilot <[email protected]>
generated docs tried to write a doctest for a non-public function.
Comment on lines +36 to +44
fn take(&mut self) -> Vec<u8> {
let mut ret = mem::replace(
&mut self.buf,
vec![0u8; std::cmp::min(self.partition_len, self.inner.len() - self.total_read)],
);
ret.truncate(self.buf_offset);
self.buf_offset = 0;
ret
}
Member


You're already taking a dependency on Bytes. Use it. It already has all the functionality for this. At the very least, don't repeat the calculation of buf.len() that you'd already have computed during construction. Bytes has already been well-tested. I see potential faults here.

Poll::Ready(Some(Ok(Bytes::from(ret))))
};
} else {
match ready!(pin!(&mut this.inner).poll_read(cx, &mut this.buf[this.buf_offset..]))
Member


More consistent to use #[pin_project] like we do in our pager and pollers. Also, why are you matching on ready!(...)? I don't understand how that's useful. That's meant to produce output when implementing futures.

Member Author


@heaths why is implementing a poll method on a stream so different from "implementing futures"? I'm definitely new to writing poll methods, but my understanding is that ready!(exp) expands to the following:

match exp {
    Poll::Ready(ret) => ret,
    Poll::Pending => return Poll::Pending,
}

That seems like what I want to do here, right? If the read of the inner stream is pending, then I want to propagate that state. Is it some sort of style thing to write out the full match instead of the macro in some scenarios? I don't understand what I'd do instead in this scenario except not implement poll_next() in the first place.

I'll look into pin_project. New concept to me.
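For reference, the propagation behavior described above can be demonstrated with std alone (std::task::ready! is stable since Rust 1.64); the Doubled wrapper here is a hypothetical example, not code from this PR:

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{ready, Context, Poll};

// Hypothetical wrapper future: doubles the output of an inner future.
struct Doubled<F>(F);

impl<F: Future<Output = i32> + Unpin> Future for Doubled<F> {
    type Output = i32;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<i32> {
        // ready! yields the inner value when it is Poll::Ready and returns
        // Poll::Pending early otherwise -- the same as the hand-written match
        // it expands to.
        let v = ready!(Pin::new(&mut self.get_mut().0).poll(cx));
        Poll::Ready(v * 2)
    }
}
```

Polling Doubled over an already-ready inner future completes on the first poll; an inner Pending is propagated unchanged.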

Member


poll_read would return a Poll::Ready(_) or Poll::Pending, yes. But by wrapping it in a ready!(_) you're making it Poll::Ready. ready!(_) is to return a value as an already-ready state. Pager is probably a good one to look at, and after changes I'm making to manually do it for various reasons: #3372

Member Author


I see the parts of your linked PR where you are matching the result of the poll and returning. Not to dig in, but I am looking at the docs and (apart from a variable name) the documented expansion of the macro looks character-for-character like what you're writing in your PR. I'd like to understand either how I am wrong in my comparison or why I am meant to manually write out the full expansion even if they are identical.


3 participants