Skip to content

Conversation

mahnerak
Copy link
Contributor

@mahnerak mahnerak commented Feb 21, 2022

In SageMaker Notebook several ML tools do not work, including aim, lit, mlflow, streamlit, etc.
The reason: jupyter-server-proxy fails on POST requests if the client sends the body using Transfer-Encoding: chunked.

TensorBoard does work only because of a workaround solution of replacing all the POST requests with GETs.

In more detail:
The Transfer-Encoding is used by http client to specify whether the POST method streams without need of knowing content length ahead of time.

Client request path is roughly:

[1] Client Browser  -> [2] AWS SageMaker Proxy  -> [3] Jupyter Server Proxy a.k.a. nbserverproxy
                                                           |
                                                           v
                                                   [4] Some Server (Aim / MLFlow / Streamlit / Lit / etc)

Now, forwarding some headers, including Transfer-Encoding is wrong because the transfer encoding between [2]->[3] can be different from [3]->[4]. In our case, SageMaker does always use Transfer-Encoding: chunked for POST requests which makes it to override Transfer-Encoding header value of request [3]->[4] while the http client used by [3] does not have to change the actual transfer encoding accordingly (it simply overrides headers).
When receiving POST request body by [4] it fails to decode the data because although the header specifies chunked , the body is encoded as a whole, not in chunks (by using Content-Length to detect end of message instead of \r\n sentinels) .
The HTTP standard defines https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Transfer-Encoding as Hop-By-Hop header (https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers#hop-by-hop_headers) thus must not be retransmitted by proxies or cached. In other http proxy implementations such as squid, node-http-proxy, etc. such headers are removed prior to sending to endpoint.

This PR focuses on removing not only Transfer-Encoding but all hop-by-hop headers to avoid such conflicts.

Note: SageMaker does not come with the latest jupyter-server-proxy. But the issue persists even after upgrading.

References:

@welcome
Copy link

welcome bot commented Feb 21, 2022

Thanks for submitting your first pull request! You are awesome! 🤗

If you haven't done so already, check out Jupyter's Code of Conduct. Also, please make sure you followed the pull request template, as this will help us review your contribution more quickly.
welcome
You can meet the other Jovyans by joining our Discourse forum. There is also a intro thread there where you can stop by and say Hi! 👋

Welcome to the Jupyter community! 🎉

@mahnerak mahnerak marked this pull request as ready for review February 21, 2022 15:57
@mahnerak mahnerak changed the title Hop by hop header handling [bugfix] Hop by hop header handling Feb 21, 2022
@ryanlovett
Copy link
Collaborator

Thanks for this PR @mahnerak and the thorough explanation! It makes a lot of sense to me. I found RFC 2616, 13.5.1 which corroborates the hop-by-hop header list. That RFC was superseded in 2014 (more), but the ones that replaced it don't spell out the full list.

Could you please add a comment which links to the (apparently outdated) RFC anyways? Or maybe whatever else is more authoritative, though I don't know what that is.

Apparently Proxy-Connection is not formally a hop-by-hop header, but since it has the same functionality as Connection I think it makes sense to group it in the self-documenting hop-by-hop list. Perhaps we can link to something like this as a comment, in case anyone wonders why it is being included? This is just to be pedantic (even though I never originally documented why Proxy-Connection was being excluded).

@mahnerak
Copy link
Contributor Author

mahnerak commented Feb 25, 2022

Thanks @ryanlovett for the comment.

Could you please add a comment which links to the (apparently outdated) RFC anyways? Or maybe whatever else is more authoritative, though I don't know what that is.

Sure! Also, the current RFC7230 standard requires client to explicitly state the hop-by-hop header names in the Connection header and does not exempt even if they were known as hop-by-hop header names.
(see: https://datatracker.ietf.org/doc/html/rfc7230#section-6.1)
So there is no predefined list of hop-by-hop headers. But as many clients do not respect RFC7230 and omit Connection header (including SageMaker proxy) I believe we better have a predefined list (say, the list from RFC2616).

Let's add links to:

Apparently Proxy-Connection is not formally a hop-by-hop header, but since it has the same functionality as Connection I think it makes sense to group it in the self-documenting hop-by-hop list. Perhaps we can link to something like this as a comment, in case anyone wonders why it is being included? This is just to be pedantic (even though I never originally documented why Proxy-Connection was being excluded).

I'll make sure to include a comment for Proxy-Connection as well.

@mahnerak
Copy link
Contributor Author

One more thing on:
We need to implement handling of hop-by-hop headers according RFC7230 by looking at Connection header value.
I'll prepare a follow-up PR for that after testing it.

@ecrows
Copy link

ecrows commented May 2, 2022

Just as a note, I found that during writing my own workaround for this bug that in addition to stripping the Transfer-Encoding header I had to provide a correct Content-Length header as well.

i.e., I added something like this:

if 'Transfer-Encoding' in headers:
    del headers['Transfer-Encoding']
    if body is not None:
        headers['Content-Length'] = len(body)

Just wanted to relay this information in case it's useful.

@austinmw
Copy link

Any progress on this? Also the notebook app FiftyOne seems to be broken by this as well.

@yuvipanda
Copy link
Contributor

Thanks a lot for this, @mahnerak! I'm happy to merge this once #328 (comment) is accepted. I'm not able to accept it myself for some reason - perhaps the 'allow maintaieners to push to this PR' option is not enabled for this PR?

@mahnerak
Copy link
Contributor Author

Thanks a lot for this, @mahnerak! I'm happy to merge this once #328 (comment) is accepted. I'm not able to accept it myself for some reason - perhaps the 'allow maintaieners to push to this PR' option is not enabled for this PR?

@yuvipanda Let me check it for you.

@mahnerak
Copy link
Contributor Author

@yuvipanda I couldn't find allow maintaieners to push to this PR checkbox in this PR.
I've just commited your change suggestions.

@yuvipanda yuvipanda merged commit 969850e into jupyterhub:main Jun 24, 2022
@welcome
Copy link

welcome bot commented Jun 24, 2022

Congrats on your first merged pull request in this project! 🎉
congrats
Thank you for contributing, we are very proud of you! ❤️

@yuvipanda
Copy link
Contributor

Thanks a lot, @mahnerak!

@austinmw
Copy link

austinmw commented Aug 26, 2022

Anyone know if this update alone enables things like Tensorboard cell magic or Flask apps in SageMaker?

@dwffls
Copy link

dwffls commented Sep 29, 2023

@austinmw

Any progress on this? Also the notebook app FiftyOne seems to be broken by this as well.

Did you get fiftyone working with the proxy. I'm still not able to get it working unfortunately.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants