@sunshowers
Contributor

(This is a version of #541 that breaks BC, per Adam's suggestion. I
do agree that this is overall a better approach.)

The current UntypedBody extractor writes data into a single Vec<u8>.
Consider what happens if the body is large (e.g. 100MB, which can happen
if uploading an artifact over HTTP). As each chunk (typically 10-100KB)
comes in, we'll have to both copy data from the incoming Bytes, and
reallocate the Vec over and over.

To avoid this issue, Eliza Weisman and I wrote buf-list, which
represents a list (really a queue) of chunks that can be operated on
using standard Tokio and other abstractions:
https://crates.io/crates/buf-list.
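
As a rough illustration of the chunk-queue idea, here is a minimal sketch using only the standard library. `ChunkList` and its methods are hypothetical names for this example and are not the buf-list API; `Vec<u8>` stands in for `Bytes`. Each incoming chunk is moved into a queue rather than copied, and a contiguous buffer is built only if a caller asks for one.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for a chunk queue (NOT the buf-list API).
#[derive(Debug, Default)]
struct ChunkList {
    chunks: VecDeque<Vec<u8>>,
    len: usize,
}

impl ChunkList {
    /// Takes ownership of a chunk; the chunk's bytes are not copied.
    fn push_chunk(&mut self, chunk: Vec<u8>) {
        if chunk.is_empty() {
            return; // skip empty chunks so every stored chunk has data
        }
        self.len += chunk.len();
        self.chunks.push_back(chunk);
    }

    /// Total number of bytes across all chunks.
    fn num_bytes(&self) -> usize {
        self.len
    }

    /// Number of stored (non-empty) chunks.
    fn num_chunks(&self) -> usize {
        self.chunks.len()
    }

    /// Copies into one contiguous buffer, allocated once at the final size.
    fn to_contiguous(&self) -> Vec<u8> {
        let mut out = Vec::with_capacity(self.len);
        for chunk in &self.chunks {
            out.extend_from_slice(chunk);
        }
        out
    }
}

fn main() {
    let mut list = ChunkList::default();
    for chunk in [b"hello ".to_vec(), Vec::new(), b"world".to_vec()] {
        list.push_chunk(chunk);
    }
    println!("{} bytes in {} chunks", list.num_bytes(), list.num_chunks());
    println!("{}", String::from_utf8_lossy(&list.to_contiguous()));
}
```

With this shape, appending a chunk costs a pointer-sized move instead of a copy plus a possible reallocation of one large buffer.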

Make the UntypedBody extractor represent a bytestream that hasn't been
read yet. This allows us to extract the body as a Bytes, a BufList,
or any other stream one chooses.

One other change I made was to remove the nonexistent type parameter J
from the UntypedBody<J> suggestions -- that didn't look right.

One consideration here is that BufList needs to be exposed as a type.
It's currently at 0.1 -- I could release a 1.0 if that would be helpful
as far as exposing in the API goes. What do you think?

Created using spr 1.3.4
@sunshowers
Contributor Author

sunshowers commented Jan 6, 2023

It occurred to me that we could make into_stream return a CappedBytesStream adapter, which implements Stream and caps the total body size at the maximum request size in bytes. To ignore the limit, you could just call into_inner() on the adapter. I think that might be the best way to do this.

In a followup we could then switch the current into_buf_list and into_bytes impls over to using that.
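
The adapter idea can be sketched with the standard library alone. This is an assumption-laden illustration, not the code from this PR: `CappedChunks` and `CapExceeded` are hypothetical names, a synchronous `Iterator` stands in for an async `Stream`, and `Vec<u8>` stands in for `Bytes`. The adapter yields chunks until the running total would exceed the cap, then yields one error and stops; `into_inner()` returns the unwrapped source for callers that want no limit.

```rust
/// Error yielded once the body would exceed the cap (hypothetical type).
#[derive(Debug, PartialEq)]
struct CapExceeded {
    cap: usize,
}

/// Hypothetical adapter: wraps a chunk source and enforces a byte cap.
struct CappedChunks<I> {
    inner: I,
    remaining: usize,
    cap: usize,
    failed: bool,
}

impl<I: Iterator<Item = Vec<u8>>> CappedChunks<I> {
    fn new(inner: I, cap: usize) -> Self {
        CappedChunks { inner, remaining: cap, cap, failed: false }
    }

    /// Escape hatch: returns the underlying source with no cap applied.
    fn into_inner(self) -> I {
        self.inner
    }
}

impl<I: Iterator<Item = Vec<u8>>> Iterator for CappedChunks<I> {
    type Item = Result<Vec<u8>, CapExceeded>;

    fn next(&mut self) -> Option<Self::Item> {
        if self.failed {
            return None; // stop after the first error
        }
        let chunk = self.inner.next()?;
        match self.remaining.checked_sub(chunk.len()) {
            Some(left) => {
                self.remaining = left;
                Some(Ok(chunk))
            }
            None => {
                self.failed = true;
                Some(Err(CapExceeded { cap: self.cap }))
            }
        }
    }
}

fn main() {
    let chunks = vec![vec![0u8; 4], vec![0u8; 4], vec![0u8; 4]];

    // Capped: 12 bytes of input against a 10-byte cap produces an error.
    let capped: Result<Vec<Vec<u8>>, CapExceeded> =
        CappedChunks::new(chunks.clone().into_iter(), 10).collect();
    println!("{:?}", capped);

    // Uncapped: into_inner() bypasses the limit entirely.
    let total: usize = CappedChunks::new(chunks.into_iter(), 10)
        .into_inner()
        .map(|c| c.len())
        .sum();
    println!("{} bytes read without a cap", total);
}
```

Making the capped form the default and requiring an explicit `into_inner()` call to opt out keeps the unsafe path visible at the call site.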

@sunshowers
Contributor Author

I switched over to CappedBodyStream/UncappedBodyStream, which I think expresses the intent of the API better and in a more misuse-resistant fashion. I also removed the read_http_body methods in favor of just using the body streams as a building block.

@sunshowers sunshowers changed the base branch from sunshowers/spr/main.make-untypedbody-be-able-to-extract-to-a-buflist to main January 6, 2023 18:03
Collaborator

@davepacheco davepacheco left a comment

Sorry -- I'd written most of this up before your latest change.

#[derive(Debug)]
pub struct UntypedBody {
    content: Bytes,
    request: Arc<Mutex<Request<Body>>>,
Collaborator

Is it okay to take this lock at the point where we take it? Previously, we read this whole thing earlier. I get why we don't do this now, but that means we're taking a lock later in the request processing. I thought we'd be holding the lock for longer, too, but I don't think that's true.

/// # Errors
///
/// Errors if the body length exceeds the given cap.
pub async fn http_read_body_bytes<T>(
Collaborator

Should we just use BufList internally and only turn that into a String (or Bytes I guess) if we really need it? I think the main use case where we want a String is because we're going to parse it with serde. Is it a lot more efficient to read it to a Bytes first than a BufList?

let mut request = self.request.lock().await;
let body = request.body_mut();

'outer: {
Collaborator

I know I said I'd reserve my feedback until you go through @davepacheco's comments, but free advice on this: it might be less work to defer the use of relatively new language features rather than potentially impacting some dropshot consumer at Oxide that's on an older Rust version. My personal threshold is about 4-6 months in terms of my willingness to just hope that folks are up to date.

@sunshowers sunshowers marked this pull request as draft January 6, 2023 20:36
@sunshowers
Contributor Author

Discussing this with @davepacheco and @ahl, will mark this as draft until we come to a consensus.

@ahl
Collaborator

ahl commented Mar 13, 2023

should we close this PR?

@sunshowers
Contributor Author

Yes, will re-do this per RFD 353 later.

@sunshowers sunshowers closed this Mar 13, 2023
@sunshowers
Contributor Author

(re-did in #617)

@sunshowers sunshowers deleted the sunshowers/spr/make-untypedbody-be-able-to-extract-to-a-buflist branch March 22, 2023 18:37
4 participants