Skip to content

Conversation

@johnhurt
Copy link

Extracting the hyper request in the request context from its arc/mutex wrappers.

The effect here is that the Extractor now has complete mutable access to the request context during extraction. With the drawback that all extraction steps are done in serial. The loss of parallelism is not as bad as it sounds considering only the reading of the body from the request was an actual asynchronous operation.

@davepacheco
Copy link
Collaborator

davepacheco commented May 3, 2021

Thanks for doing this. I think the &mut RequestContext is a better representation for this than Arc<RequestContext> was because it also prevents the Extractor from hanging onto the RequestContext beyond the life of the call.

Edit: I agree that the loss of apparent parallelism here isn't a problem. It doesn't seem likely that we'd have extractors other than the body extractor that would be async, and the others ought to be very quick, so there's not much parallelism to be had.

@smklein can you take a look at this change too?

@johnhurt I'm not sure the clippy warning is a false positive. Is there a way to rewrite the code to actually avoid the warning?


// This `allow` prevents a false positive from clippy. See -
// https://github.com/rust-lang/rust-clippy/issues/4637
#[allow(clippy::eval_order_dependence)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rust's language support for operating on tuples, which are inherently heterogeneous anyway, is limited. I'm not sure of a better way to perform this macro-esque operation without something that might cause more pain (e.g., creating a new type for the tuple and forcing serialization, then converting back to a tuple object).

TL;DR: I'm totally fine leaving in the lint escape hatch here, as tuple initialization is well-defined in Rust (unlike arbitrary subexpressions).

Copy link
Contributor

@smklein smklein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice change - I definitely think this makes the ownership semantics more clear (in that the RequestContext exclusively owns the request, which makes a lot of sense).

@johnhurt
Copy link
Author

johnhurt commented May 3, 2021

Thanks, yeah. I tried for a while to get around the clippy lint there. Here's a rust playground that I slapped together to show what's going on without the macro stuff getting in the way.

https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=a33a95bb5446cb982f67667369fdb0f2

If you look at that example, you'll see that the fix is pretty simple, but it requires creating local variables which we can't do in this macro (without importing paste or something to synthesize variable names from T's).

If you remove the async's then clippy is happy either way

https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=d08aefd271c7eda0776947973c61d045

Comment on lines 199 to 208
// This `allow` prevents a false positive from clippy. See -
// https://github.com/rust-lang/rust-clippy/issues/4637
#[allow(clippy::eval_order_dependence)]
async fn from_request<Context: ServerContext>(_rqctx: &mut RequestContext<Context>)
-> Result<( $($T,)* ), HttpError>
{
futures::try_join!($($T::from_request(Arc::clone(&_rqctx)),)*)
Ok(($(
$T::from_request(_rqctx).await?
,)*))
}
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just saw that paste is already a dependency, so we could use it to do something like this.

Suggested change
// This `allow` prevents a false positive from clippy. See -
// https://github.com/rust-lang/rust-clippy/issues/4637
#[allow(clippy::eval_order_dependence)]
async fn from_request<Context: ServerContext>(_rqctx: &mut RequestContext<Context>)
-> Result<( $($T,)* ), HttpError>
{
futures::try_join!($($T::from_request(Arc::clone(&_rqctx)),)*)
Ok(($(
$T::from_request(_rqctx).await?
,)*))
}
#[allow(non_snake_case)]
async fn from_request<Context: ServerContext>(_rqctx: &mut RequestContext<Context>)
-> Result<( $($T,)* ), HttpError>
{
paste! {
$(
let [<v_ $T>] = $T::from_request(_rqctx).await?;
)*
Ok(($(
[<v_ $T>]
,)*))
}
}

It still requires a lint ... plus it's kind of messy :p

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that's better; tbh I think I prefer the original. A comment saying "tuple initialization order is well-defined, so the lint is a false positive" would IMO be the right direction.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

100% agreed. I just updated the comment. Thanks, @smklein

@davepacheco
Copy link
Collaborator

Great!

This is a breaking change for consumers, isn't it? Because they have access to this field and its type changed. That's okay but we should bump the minor version number in Cargo.toml (for both dropshot and dropshot_endpoint, since we keept hose in sync; and minor rather than major because we're still on major version 0). And we should update the changelog with a sentence or two explaining the impact.

@johnhurt
Copy link
Author

johnhurt commented May 7, 2021

Got it. Thanks, @davepacheco. I can add those changes to this pr early tomorrow morning.

Next time we're in the mood for some breaking changes, we could extract the RequestContext from the Arc it's being wrapped in. The request isn't being shared, so the Arc is superfluous. It would be a considerably larger breaking change, but it would giver the handlers full ownership of the context (For better or worse).

@davepacheco
Copy link
Collaborator

As long as we've got clear instructions for users to know if they're affected and how to update, I don't mind breaking changes too much. We may still want to provide it as &RequestContext to avoid people hanging onto state that they shouldn't. (I think this is a problem with the current Arc approach too.)

@johnhurt
Copy link
Author

johnhurt commented May 7, 2021

Hmm, yeah. If there is a concern about users hanging onto the context too long, then I think switching from Arc to &RequestContext would be the way to address that. A mutable reference would be trivial too we want to allow write access to the context.

@davepacheco
Copy link
Collaborator

davepacheco commented May 10, 2021

Thanks for that. As I read it: I don't think I appreciated the new constraint on consumers. Does this mean that if a consumer was reading the body of the request on their own, this would break them? How would they work around it?

@johnhurt
Copy link
Author

johnhurt commented May 10, 2021

You're right. As it is now, that is a harsh change if someone was reading from the body. There's no way for them to get the body out without unsafe code. Here are the two options to enable that user (if we forward with this change)

  1. They use the facilities in Dropshot that exist already to get the body as bytes or deserialized from json
  2. Instead of giving them an Arc<RequestContext>, we give them a &mut RequestContext

I'm in the process of making an example pr for #2 now just to see how big a change it would be.

@davepacheco
Copy link
Collaborator

Got it. Sorry for not realizing this earlier!

I think the escape hatch of being able to read the body without Dropshot's support is useful, though I could be convinced otherwise. Besides allowing us to punt on stuff in Dropshot, it allows prototyping stuff outside Dropshot that we might eventually put into Dropshot. Websockets is an example that we're probably going to look at down the road, though there may be breaking API changes required to make that even prototype-able outside Dropshot.

I also think there are consumers that stream the body in -- maybe @jclulow has one?

@johnhurt
Copy link
Author

Oh, I didn't think about streaming. Yeah, if that's a possible future requirement, then #1 from my list above won't cut it.

@johnhurt
Copy link
Author

Alright, so modifying the request handlers to accept a reference to the RequestContext instead of an Arc was not trivial, but I managed to bend my brain in the right way to understand it. I have a working prototype in my fork, but my question to you is:

Should I merge those changes into this PR or create a new pr on top of these changes?

I think you're right that the changes in this PR are possibly too breaking to hit a release, but coupling these changes with the change to provide the request context as a mutable reference:

  1. Prevents users from hanging on to context outside function scope
  2. Eliminates interior mutability in request context but still allows mutable access to request body
  3. Only minor syntax changes are required: Arc<RequestContext> -> &mut RequestContext (Surprisingly lifetime parameters!)

The only drawback I see is that it exposes all the other fields in the request context for mutable access.

@johnhurt
Copy link
Author

Here is the diff that shows the changes to go from this pr to providing the request context as a mutable pointer.

johnhurt/dropshot@reqctx-remove-mutex...reqctx-arc-to-ref

It is a really noisy change :( but as you can probably guess the only meaningful parts are in handler.rs

@davepacheco
Copy link
Collaborator

@johnhurt Sorry I haven't had a chance to look at this more closely this week. I'm a little wary of handing folks a &mut RequestContext. What if we instead put a function on the RequestContext that returned a &mut Request? Then they'd only be able to mutate the request, at least. What do you think?

@johnhurt
Copy link
Author

johnhurt commented May 15, 2021

It might make more sense to hide the other fields of the request behind functions and have them returned as immutable references. In order to return a mutable reference to the hyper request, the type we provide to the request handlers needs to either be a mutable ref or owned (or have interior mutability 😅 ). It might be possible to have the request context struct contain a mutable borrow of the hyper request ala

struct RequestContext<'a> {
    request: &'a mut Request<Body>
    ...
}

Then even an Arc would allow you to get a mutable hyper request. The cost you pay there is passing around a lifetime everywhere the context gets used. That seems tricky but doable; if it's possible, it would really be a hammer you can only hold by the handle.

Edit - I realized that part about an Arc being able to contain a struct with reference can't be true. I'll keep thinking.

@johnhurt
Copy link
Author

A thought occurred to me while I was sitting in traffic on the way back from the beach today. What if we provide the hyper request via an extractor and remove it from the request context? Then the request context can be passed to handlers as an immutable reference, and handlers that need the raw request can ask for it as a separate mutable reference.

@davepacheco
Copy link
Collaborator

I think that's a pretty cool idea! I don't think I grok all the implications, but it seems to neatly solve some of the issues we've been dealing with.

I think we'd want to produce a compile error if your endpoint handler function accepts more than one of these.

Also: what happens if you accept one of these and a body extractor?

@johnhurt
Copy link
Author

johnhurt commented Jun 5, 2021

Yeah, I think it will be cool if I can get it working. The biggest implication (in my opinion) is that it will make the endpoint methods a little less clean. I think it will look like this

#[endpoint { method = PUT, path = "/counter"}]
pub async fn example_api_put_counter(
    ctx: & RequestContext<ExampleContext>,
    request: &mut Request<Body>,
    update: TypedBody<CounterValue>,
) -> Result<HttpResponseUpdatedNoContent, HttpError> {
    ...
}

or like this in the worst case where we need to wrap the request in something but still need a lifetime

#[endpoint { method = PUT, path = "/counter"}]
pub async fn example_api_put_counter(
    ctx: &'_ RequestContext<ExampleContext>,
    mut request: RequestWrapper<'_>,
    update: TypedBody<CounterValue>,
) -> Result<HttpResponseUpdatedNoContent, HttpError> {
    ...
}

The procedural macro should make detecting conflicts and raising compiler errors easy enough. It looks like it already errors if you try to use more than one body extractor, so that should be extendable to handle more than one request context (rust will prevent you from using 2 requests because they will both me mutable refs. The only thing we have to worry about is making the error message clear).

The case where a function requires a Request<Body> and one of the body deserializer, the results should match existing behaviors. In current handler functions that accept a body, the user can try to read from the request's body again, but since the body has been read and purged, they should get nothing more from it.

… a ref instead of a mut ref, and starting work on making the raw hyper context extractable as a mut ref via code generation instead of type jiggling
johnhurt added 4 commits June 18, 2021 20:29
# Conflicts:
#	dropshot/tests/fail/bad_endpoint6.rs
#	dropshot/tests/fail/bad_endpoint6.stderr
#	dropshot/tests/test_pagination.rs
# Conflicts:
#	dropshot/tests/fail/bad_endpoint6.rs
#	dropshot/tests/fail/bad_endpoint6.stderr
#	dropshot/tests/test_pagination.rs
#	dropshot_endpoint/src/lib.rs
… references to the raw hyper context. This is done by adding a synthetic handler function that accepts owned instances of all the extracted types and calls in the original handler function with the appropriate level of borrowing. We maintain fine grained control over what reference and mutability level an extracted type can be used by introducing the `Extractable` marker trait that can be implemented on reference and/or mutable references and plain types.
@johnhurt johnhurt changed the title To-Done: Extracting the hyper request in the request context Work in Progress - Allow Raw Hyper Context to be Extracted as &mut Jun 20, 2021
@johnhurt
Copy link
Author

Oof. I finally got a working proof of concept for my idea of making the raw Request<Body> extractable as a mutable reference. I went down 20 different dead ends trying to coherently modify the Extractor system to correctly produce mutable references with compatible lifetimes. It should totally be possible, but I couldn't get there.

Instead, I cheated, and used the existing procedural macro to emit a wrapping function for the user's request handler that accepts all the extracted types as owned objects, and then passes them on with the correct ownership level. It's not foolproof, but I think it covers all but the most obtuse and deliberately incorrect usages.

I apologize for how noisy this pr is now. There are lots of places where the use of Arc<RequestContext<_>> was switched to &RequestContext<_>. There is still a little more left to do to make this release-worthy. As David mentioned above, there should be a compiler error for multiple extractions of the request. I just wanted to get this in front of some people for feedback on the idea and implementation.

Also this PR started with a much simpler premise and obviously morphed into something new, so I updated the title to reflect that better.

Comment on lines +15 to +24
#[endpoint {
method = GET,
path = "/raw_req_1",
}]
async fn demo_handler_path_param_impossible(
_rqctx: &RequestContext<usize>,
_raw_request: &mut Request<Body>,
) -> Result<Response<Body>, HttpError> {
todo!()
}
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the good stuff. It's buried in all the noise, but it totally works.

johnhurt added 4 commits June 20, 2021 11:17
… the change log to reflect what is actually done, and "sealing" the `Extractable` trait so that it cannot be implemented outside of the crate.
…he same type cannot appear in the same handler function, and added some tests to verify the behavior. I also updated some more of the documentation to mention the ability to extract the raw request into the handler functions as a mutable reference.
@johnhurt johnhurt changed the title Work in Progress - Allow Raw Hyper Context to be Extracted as &mut Allow Raw Hyper Context to be Extracted as &mut Request<Body> Jun 25, 2021
@johnhurt
Copy link
Author

I am marking this PR as ready to go. I appreciate all the feedback I have received already. Please let me know if I overlooked anything.

Comment on lines +1 to +7
error[E0119]: conflicting implementations of trait `handler_argument_for_endpoint_with_duplicate_raw_requests` for type `hyper::Request<hyper::Body>`:
--> $DIR/duplicate_argument_type.rs:29:5
|
28 | _: &mut Request<Body>,
| - first implementation here
29 | _: &mut SneakyBody
| ^ conflicting implementation for `hyper::Request<hyper::Body>`
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not totally happy with this error message. The trick I got from the static-assert crate for making sure no types are the same is to implement a trait for all of them, and then redundant traits will cause a compile time error. The best I could to make the compile-time error mean something was to give the trait the name above

handler_argument_for_${function_name}

It gets the point across, but bleh

johnhurt added 3 commits June 24, 2021 22:21
# Conflicts:
#	dropshot/Cargo.toml
#	dropshot/src/router.rs
#	dropshot/tests/test_openapi.rs
#	dropshot_endpoint/src/lib.rs
@davepacheco
Copy link
Collaborator

@johnhurt I'm sorry we haven't done a better job staying on top of this one. I just went through the history again to page this back in and try to remember all the considerations that we talked through. I think there's good stuff here. At the same time, it feels hard (for me anyway) to grasp all the implications without having plumbed it through several different real consumers that are doing different things (e.g., just grabbing headers out, reading the body, etc.). My fear is that we land this, then realize we have to burn a bunch of time updating several consumers -- and worse, find out partway through that we need to keep iterating. That's all fine, really, but it's a lot more work than I think we expected when we looked at the original problem. (The original problem was just that you have to grab an (uncontended) mutex in order to read the request body, right?) And empirically we've had a hard time making time to do it.

I'm interested in what other reviewers think and I definitely don't want to discourage any of that validation work! I'm inclined to leave this open until someone does, since there's definitely room for improvement and this might well be the right approach. I'm just not sure who's going to be able to look at this any time soon.

@johnhurt
Copy link
Author

johnhurt commented Nov 14, 2021

Thanks, @davepacheco. I think those are valid concerns. I think I can give you all a rundown of the implications and some recommendations based on revisiting this pr with fresh eyes.

I was hoping I could give 2 positive implications for this pr

  1. Performance boost because we aren't creating mutexes or manipulating arcs on each request.
  2. Improved scope control for requests and content

I did some performance testing on the main vs pr branch with the hopes of seeing something, but there was no difference in the results. I checked the flamegraphs for each branch to see if I could see any differences, and while there is maybe a slight reduction in the time taken to handle the request, that time is dwarfed (100x) by the time taken to deserialize request objects. So if the goal is to improve performance, this is really not the way :P

Point 2 above is still valid, but your right, the implications are not super clear (as we have seen from the number of revisions this pr has taken), so let me list them out. I am going to note on each one which are real constraints and which are artificial. An artificial constraint in this case is something we are enforcing with code generation or marker traits.

  • Handler functions take &RequestContext instead of Arc<RequestContext> (artificial)
    • No change in read/write-ability since arcs and refs are immutable
    • Users cannot hold onto requests after the handler function exits
  • Request Contexts no longer expose Request<Body>
    • The user has to include the RequestBody as a parameter in their handler function to access it
  • Request<Body> can no longer be retrieved as Arc<Mutex<Request<Body>>> (artificial)
    • We limit the forms that Request<Body> can be requested as a handler function argument to & and &mut
    • Users can still get mutable access to the request object, but cannot keep the request object beyond the scope of the handler function

Those are pretty consequential implications especially because it means all users modifying all their endpoints to make sure they don't do something they probably weren't going to do anyway. Luckily, most of the big restrictions are artificial. That means we/I can modify those restrictions to allow (almost) all the current endpoints to continue to work unchanged. The benefit being it allows for a gradual, phased introduction of the new restrictions into the customer code.

Ex. using an Arc could transition from Fine -> Warning -> Error

I don't think making those artificial restrictions less restrictive would take much work, so my question to you (plural) is whether or not we care about clients keeping request objects beyond scope because (other than removing the lock() clunkiness from the code) that seems to be the only benefit.

@johnhurt
Copy link
Author

Here is the code for the performance comparison project I slapped together. It's maybe interesting because it has the flamegraph svgs for the main and pr branches.

https://github.com/johnhurt/dropshot-comparison

@davepacheco
Copy link
Collaborator

I think it's fine for Request to no longer be available after the handler function completes. The caller can copy whatever key fields they want. I remain worried about unforeseen ergonomic issues in the API change.

@johnhurt
Copy link
Author

Yeah, based on the discussion above. I don't think this change is adding enough to make it worth the potential change-induced headaches. Closing for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants