Skip to content

[Discussion] What should we do about HPACK never-indexed header fields? #194

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
Lukasa opened this issue Apr 6, 2016 · 14 comments · Fixed by #203
Closed

[Discussion] What should we do about HPACK never-indexed header fields? #194

Lukasa opened this issue Apr 6, 2016 · 14 comments · Fixed by #203

Comments

@Lukasa
Copy link
Member

Lukasa commented Apr 6, 2016

This is a complex issue and I may fork it out into multiple places once we've identified work items (if any), but I'd like to briefly talk about the problem.

RFC 7541 (the HPACK specification) provides support for what it calls "never indexed" header fields (§ 6.2.3). These fields have certain restrictions, which exist to serve one specific goal:

This representation is intended for protecting header field values that are not to be put at risk by compressing them.

The core reasoning is discussed at length in RFC 7541 § 7.1, but can be summarised as follows. It is possible for attackers to mount attacks similar to the CRIME attack against the HPACK compression algorithm state. Put another way, if the attacker is capable of getting any entity that emits privacy-sensitive headers to emit headers of their own construction, they are potentially able to use the size of the responses to probe the compression state of the endpoint. That can expose users to the risk of having their credentials stolen: obviously very bad.

RFC 7541 points out that

Attacks of this nature are possible any time that two mutually distrustful entities control requests or responses that are placed onto a single HTTP/2 connection.

The cases that worry me here are:

  • Servers or clients that allow users to inject headers without validation.
  • Intermediaries that coalesce connections in any way.

Happily, RFC 7541's "never indexed" literals exist to solve this problem. These header fields are sent in their literal form with one extra caveat: intermediaries MUST NOT translate them to any other form. That means that they never get added to the compression context of any HTTP/2 box in the network.

The purpose of this thread is to work out what hyper-h2 should do about this. The Python HPACK library has support for emitting headers in this form (since 1.1.0), and handles receiving them appropriately.

I have two questions:

  1. How do we handle servers/clients needing to keep fields out of compression contexts? Do we give them explicit APIs to do it, or do we do it by default for specific fields (e.g. Authorization, Cookie)? Do we do both? If we give them explicit APIs, how should that API look?
  2. What about middleboxes (e.g. mitmproxy)? Right now they aren't told about headers that are emitted with never indexed semantics, which means they aren't able to meet the requirements of RFC 7541. That clearly has to change.

I'd like to solicit answers to those questions from some people. I'm explicitly tagging the Hyper devs (@python-hyper/core), the mitmproxy devs (@Kriechi, @mhils), and some other people who care about this sort of thing (@jimcarreer, @bagder, @tatsuhiro-t) to get your ideas about this.

@jimcarreer
Copy link

So the problem here is that its really up to the developers to decide which headers should not be compressed for security reasons. While it is sane to assume that by default (with maybe a simple list/set that can be set by the hyper / hpack user) we should not compress Authorization or Cookie headers, really any header containing session identifying information could be vulnerable to the attack. I've seen many a REST API, for example, using custom X- headers for storing session and authentication data. There would be no "catch all" for Hyper / HPACK as far as I can see, and since it's a library, providing that might be out of scope anyway.

@tatsuhiro-t
Copy link

Speaking as nghttp2 maintainer, nghttp2 uses never-indexed for Authorization header fields, and Cookie header fields whose value is less than 20 bytes. The 20 bytes cookie comes from Firefox codebase. Since we'd like to avoid exact match, we care about short entropy cookie here. Authorization is typically short, so they are encoded in never indexed always.
nghttp2 offers the API that application can choose never indexed representation for particular header fields.

@Lukasa
Copy link
Member Author

Lukasa commented Apr 6, 2016

Thanks @tatsuhiro-t, that's extremely valuable feedback. It's also almost exactly what I was considering doing (except for the length limitation on cookies, which is a good idea).

@jimcarreer
Copy link

This is just my 2 cents, I am a bit hesitant to support the 20 byte limit: pulling off a successful CRIME attack really comes down to how many requests can be made against the system that compresses the sensitive header. It is definitely a "brute force" guessing attack, but it is made easier by the fact that you should be able to make many simultaneous requests against the compressing host given the resources (like a large bot network). At the very very least I'd say it should be an additional option that can be changed with an API call (and in general maybe we want to extend the ability to have a maximum byte length for any header set by the user to never be indexed). I hope that makes sense?

@mhils
Copy link
Member

mhils commented Apr 6, 2016

For mitmproxy (or middleboxes on general), if we would like to follow the
spec religiously, we'd need to get the metadata from the sending connection
which headers are never-indexed in order to feed that in the other
connection. Ideally, we would just take something hyper-h2 produces (just
the list of never-indexed headers?) and feed that as-is into another
connection instance when sending headers.

On Wed, 6 Apr 2016 07:03 Cory Benfield, [email protected] wrote:

Thanks @tatsuhiro-t https://github.com/tatsuhiro-t, that's extremely
valuable feedback. It's also almost exactly what I was considering doing
(except for the length limitation on cookies, which is a good idea).


You are receiving this because you were mentioned.

Reply to this email directly or view it on GitHub
#194 (comment)

@jimcarreer
Copy link

@mhils @Lukasa Maybe what HPACK and eventually Hyper emits in terms of headers could also include an element specifying that it is not to be indexed? IIRC the hpack decoder takes care of determining which headers from the sender were not indexed but doesn't actually make that information available in any meaningful way. @Lukasa do you think it is something the table object could / should track?

@Lukasa
Copy link
Member Author

Lukasa commented Apr 6, 2016

do you think it is something the table object could / should track?

The table object shouldn't track it: that's part of the purpose of the never indexed headers, they shouldn't reside in the compression tables in any way.

I'm beginning to wonder if we should transition to a "richer" representation of headers when generating them from HPACK, and then when working with them in hyper-h2. For example, we could use namedtuples that contain a value indicating whether they're indexed and emit those from HPACK and then work with them inside hyper-h2. That would allow us to attach richer metadata to each header (for now just the never-index flag) and provide the user with objects they can hook into to flag that appropriately.

The way I see it, we can go a few different ways with this:

  1. Use named three-tuples (namedtuple('HeaderField', ['name', 'value', 'never_index'])) to emit all headers from HPACK. This breaks anyone expecting to be able to tuple-unpack the headers coming out of HPACK (which probably includes HPACK's test suites and possibly even HPACK itself!), but has the advantage of being isomorphic to the input HPACK takes.
  2. At the HPACK level, alternate between two and three tuples (named or not) depending on whether the header field is never indexed. That's truly isomorphic to the input HPACK takes but is also clearly stupid, so we'll immediately rule it out.
  3. Use namedtuples that are always two-tuples and differ only by type to signal indexed/non-indexed status. E.g. namedtuple('HeaderField', ['name', 'value']) & namedtuple('HeaderFieldNeverIndex', ['name', 'value']). That allows isinstance checking to determine the correct value. These would probably get passed through to hyper-h2 directly, which would emit them itself. hyper-h2 would also accept these types (just safely because they're tuples).
  4. Use a tuple subclass that is not a namedtuple (it'd be hand-rolled to be as efficient as namedtuple though) that carries a flag on it indicating whether the header field is to be never indexed, but that does not include that header field in the tuple directly. That would meant that tuple unpacking still works, but that the flag can be accessed directly rather than having to do isinstance checks to get everything to work. hyper-h2 will also just pass these up in the place that the headers currently go, and will accept them internally.

Of the set of ideas I think I prefer 4, even though it requires the most work, because it enables us to keep most of the current behaviour the same and provides the nicest API for checking whether headers are safe to index. It also makes mitmproxy's job easier (it can pass the entire header block back into hyper-h2 and just automatically get the right behaviour), and allows us to provide a declarative API to consumers that want to mark headers as never indexed ("instantiate this special kind of tuple!").

Thoughts?

@jimcarreer
Copy link

@Lukasa If I understand correctly, would we need to rework the interface for the encoder/decoder, or just the emissions of the decoder (HPACK) for no 4 which I agree sounds like the better solution.

@Lukasa
Copy link
Member Author

Lukasa commented Apr 7, 2016

The encoder would need to be reworked too, but it's fairly minor: we can enhance the encoder to replace three-tuples with these special tuples. Shouldn't take long at all. =)

@Lukasa
Copy link
Member Author

Lukasa commented Apr 19, 2016

So, the tuples would be implemented roughly like this:

class HeaderTuple(tuple):
    __slots__ = ()

    indexable = True

    def __new__(_cls, *args):
        return tuple.__new__(_cls, args)


class NeverIndexedHeaderTuple(HeaderTuple):
    __slots__ = ()

    indexable = False

This has a nice side-effect: because indexable isn't mutable (as no tuple fields are mutable), it becomes very difficult to accidentally break these header tuples. They also unpack exactly as one would expect, and altogether just behave properly.

@sigmavirus24
Copy link
Contributor

Do you really need to define __new__ for the HeaderTuple?

@Lukasa
Copy link
Member Author

Lukasa commented Apr 19, 2016

@sigmavirus24 To get it to behave like a tuple you do. The tuple constructor doesn't actually take multiple arguments, it takes a single iterable. That feels really awkward when what you want to write is HeaderTuple(name, value).

@Lukasa
Copy link
Member Author

Lukasa commented Apr 20, 2016

Ok, hpack v2.2.0 has just been shipped with the brand new tuple classes. We can plumb support through hyper-h2 now.

Lukasa added a commit that referenced this issue Apr 21, 2016
@Lukasa Lukasa reopened this Apr 21, 2016
@Lukasa
Copy link
Member Author

Lukasa commented Apr 21, 2016

This isn't done yet, silly GitHub.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants