Description
This is a complex issue and I may fork it out into multiple places once we've identified work items (if any), but I'd like to briefly talk about the problem.
RFC 7541 (the HPACK specification) provides support for what it calls "never indexed" header fields (§ 6.2.3). These fields have certain restrictions, which exist to serve one specific goal:
> This representation is intended for protecting header field values that are not to be put at risk by compressing them.
The core reasoning is discussed at length in RFC 7541 § 7.1, but can be summarised as follows. It is possible for attackers to mount attacks similar to the CRIME attack against the HPACK compression algorithm state. Put another way, if an attacker can get any entity that emits privacy-sensitive headers to also emit headers of the attacker's construction, they are potentially able to use the size of the encoded messages to probe the compression state of that endpoint. That can expose users to the risk of having their credentials stolen: obviously very bad.
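To make the size probing concrete, here is a deliberately simplified toy model. It is not HPACK: `ToyHpackContext` and `probe` are hypothetical names, the "table" only matches whole name/value pairs, never evicts anything, and the sizes are rough approximations. It only shows why a value that sits in a shared compression context can be confirmed by watching encoded sizes, and why keeping it out of the context removes that signal.

```python
class ToyHpackContext:
    """Toy stand-in for a shared compression context (illustrative only)."""

    def __init__(self):
        self.table = set()

    def encoded_size(self, name, value, never_indexed=False):
        # A header already in the table collapses to a short index reference.
        if (name, value) in self.table:
            return 1
        # Otherwise it is sent as a literal and, unless marked never indexed,
        # it is added to the table for later reuse.
        if not never_indexed:
            self.table.add((name, value))
        # Approximate literal size: type octet + two length-prefixed strings.
        return 1 + 1 + len(name) + 1 + len(value)


def probe(ctx, name, guesses):
    """Return the guesses whose encoding shrank to an index reference.

    Assumes each guess appears once: a repeated wrong guess would itself
    have been indexed by the first attempt.
    """
    return [g for g in guesses if ctx.encoded_size(name, g) == 1]


# The victim's cookie enters the shared context...
shared = ToyHpackContext()
shared.encoded_size(b"cookie", b"session=hunter2")
# ...and an attacker sharing that context can confirm a correct guess.
leaked = probe(shared, b"cookie", [b"session=aaaaaaa", b"session=hunter2"])

# If the victim's cookie is never indexed, the same probe learns nothing.
protected = ToyHpackContext()
protected.encoded_size(b"cookie", b"session=hunter2", never_indexed=True)
nothing = probe(protected, b"cookie", [b"session=aaaaaaa", b"session=hunter2"])
```

In this model `leaked` contains only the correct guess and `nothing` is empty.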
RFC 7541 points out that
> Attacks of this nature are possible any time that two mutually distrustful entities control requests or responses that are placed onto a single HTTP/2 connection.
The cases that worry me here are:
- Servers or clients that allow users to inject headers without validation.
- Intermediaries that coalesce connections in any way.
Happily, RFC 7541's "never indexed" literals exist to solve this problem. These header fields are sent in their literal form with one extra caveat: intermediaries MUST NOT translate them to any other form. That means that they never get added to the compression context of any HTTP/2 box in the network.
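For reference, this is roughly what a never-indexed literal looks like on the wire. The sketch below hand-encodes the "new name" form from RFC 7541 § 6.2.3 using only the standard library, without Huffman coding. The function names are my own for illustration and are not the HPACK library's API.

```python
def encode_integer(value, prefix_bits):
    """Encode an integer with an N-bit prefix (RFC 7541 section 5.1)."""
    max_prefix = (1 << prefix_bits) - 1
    if value < max_prefix:
        return bytearray([value])
    out = bytearray([max_prefix])
    value -= max_prefix
    while value >= 128:
        out.append((value % 128) + 128)
        value //= 128
    out.append(value)
    return out


def encode_string(data):
    """Encode a string literal as raw octets (H bit = 0, section 5.2)."""
    return bytes(encode_integer(len(data), 7)) + data


# Bit patterns in the first octet; both forms use a 4-bit index prefix.
WITHOUT_INDEXING = 0x00  # 0000xxxx, section 6.2.2
NEVER_INDEXED = 0x10     # 0001xxxx, section 6.2.3


def encode_literal_new_name(name, value, pattern):
    """Encode a literal header field whose name is not in any table."""
    first = encode_integer(0, 4)  # index 0 means the name follows as a string
    first[0] |= pattern
    return bytes(first) + encode_string(name) + encode_string(value)


never = encode_literal_new_name(b"password", b"secret", NEVER_INDEXED)
plain = encode_literal_new_name(b"password", b"secret", WITHOUT_INDEXING)
print(never.hex())  # 100870617373776f726406736563726574
```

The output matches the worked example in RFC 7541 Appendix C.2.3. Note that `never` and `plain` differ only in the first octet: the encoding is otherwise identical to a literal without indexing, and the extra bit is what tells intermediaries to preserve the representation.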
The purpose of this thread is to work out what hyper-h2 should do about this. The Python HPACK library has support for emitting headers in this form (since 1.1.0), and handles receiving them appropriately.
I have two questions:
- How do we handle servers/clients needing to keep fields out of compression contexts? Do we give them explicit APIs to do it, or do we do it by default for specific fields (e.g. Authorization, Cookie)? Do we do both? If we give them explicit APIs, how should that API look?
- What about middleboxes (e.g. mitmproxy)? Right now they aren't told about headers that are emitted with never indexed semantics, which means they aren't able to meet the requirements of RFC 7541. That clearly has to change.
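To give the discussion something concrete to push against, here is one possible shape for a "both" answer: an explicit per-header flag plus a default list of sensitive names. Everything here is hypothetical, including the `apply_sensitivity` name, the `(name, value, sensitive)` three-tuple shape, and the choice of default names; it is a sketch for discussion, not a proposal for the final API.

```python
# Hypothetical default set, taken from the examples mentioned above.
DEFAULT_SENSITIVE = frozenset({"authorization", "cookie"})


def apply_sensitivity(headers, sensitive_names=DEFAULT_SENSITIVE):
    """Normalise headers to (name, value, sensitive) three-tuples.

    A header is treated as sensitive if the caller flagged it explicitly
    with a third element, or if its name is in ``sensitive_names``.
    An explicit flag is never downgraded, which is what a middlebox needs
    in order to forward a never-indexed field in the same form.
    """
    result = []
    for header in headers:
        name, value = header[0], header[1]
        explicit = header[2] if len(header) > 2 else False
        by_default = name.lower() in sensitive_names
        result.append((name, value, bool(explicit or by_default)))
    return result


marked = apply_sensitivity([
    ("Authorization", "Basic dXNlcjpwYXNz"),  # caught by the default list
    ("accept", "*/*"),                        # left indexable
    ("x-api-token", "abc123", True),          # flagged by the caller
])
```

With this shape, a server or client gets safe defaults without doing anything, can opt further fields in explicitly, and a proxy that is told which received fields were never indexed can pass that flag straight through when re-emitting them.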
I'd like to solicit answers to those questions from some people. I'm explicitly tagging the Hyper devs (@python-hyper/core), the mitmproxy devs (@Kriechi, @mhils), and some other people who care about this sort of thing (@jimcarreer, @bagder, @tatsuhiro-t) to get your ideas about this.