bpo-34866: Adding max_num_fields to cgi.FieldStorage #9660

matthewbelisle-wf · 2018-10-01T21:27:27Z

Adding max_num_fields to cgi.FieldStorage to make DOS attacks harder by
limiting the number of MiniFieldStorage objects created by FieldStorage.
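On Python versions that include this change, the same guard is exposed as the `max_num_fields` keyword of `urllib.parse.parse_qsl()`. The minimal sketch below shows how an application might apply it to a form body. `MAX_FIELDS` and `parse_body` are hypothetical names, and the limit of 2 is only for illustration.

```python
from urllib.parse import parse_qsl

MAX_FIELDS = 2  # hypothetical application-level limit

def parse_body(qs):
    # parse_qsl raises ValueError('Max number of fields exceeded') when qs
    # holds more than MAX_FIELDS fields, before any pairs are built.
    return parse_qsl(qs, max_num_fields=MAX_FIELDS)

print(parse_body('a=1&b=2'))  # [('a', '1'), ('b', '2')]
try:
    parse_body('a=1&b=2&c=3')
except ValueError as exc:
    print('rejected:', exc)
```

Rejecting the input up front avoids allocating one object per field for oversized bodies, which is the memory-exhaustion vector this PR targets.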

the-knights-who-say-ni · 2018-10-01T21:27:30Z

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept your contribution by verifying you have signed the PSF contributor agreement (CLA).

Unfortunately we couldn't find an account corresponding to your GitHub username on bugs.python.org (b.p.o) to verify you have signed the CLA (this might be simply due to a missing "GitHub Name" entry in your b.p.o account settings). This is necessary for legal reasons before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue.

You can check yourself to see if the CLA has been received.

Thanks again for your contribution, we look forward to reviewing it!

matthewbelisle-wf · 2018-10-03T16:37:03Z

Lib/cgi.py

-            self.list.append(MiniFieldStorage(key, value))
+            encoding=self.encoding, errors=self.errors,
+            max_num_fields=self.max_num_fields)
+        self.list = [MiniFieldStorage(key, value) for key, value in query]


Not an essential change, but the list comprehension seemed faster and cleaner than the append().

This is a nice improvement. Good catch!

matthewbelisle-wf · 2018-10-03T16:38:14Z

Lib/cgi.py

-                self.list.append(MiniFieldStorage(key, value))
+                encoding=self.encoding, errors=self.errors,
+                max_num_fields=self.max_num_fields)
+            self.list.extend(MiniFieldStorage(key, value) for key, value in query)


Not an essential change, but the extend with generator seemed faster and cleaner than the append().

matthewbelisle-wf · 2018-10-03T16:41:20Z

Lib/urllib/parse.py

+    # is less than max_num_fields. This prevents a memory exhaustion DOS
+    # attack via post bodies with many fields.
+    if max_num_fields is not None:
+        for num_fields, _ in enumerate(_QS_DELIMITER_RE.finditer(qs), 2):


Doing this with finditer() instead of count() so the worst case time is bounded by max_num_fields instead of len(qs).
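The early-exit idea can be sketched as a standalone function that mirrors the `enumerate(..., 2)` loop in the diff. `QS_DELIMITER_RE` and `check_fields_finditer` are hypothetical names. The `[&;]` pattern is an assumption about what the PR's `_QS_DELIMITER_RE` matched, since `parse_qsl` accepted both separators at the time.

```python
import re

# Assumption: '&' and ';' are both field separators, as in parse_qsl circa 2018.
QS_DELIMITER_RE = re.compile('[&;]')

def check_fields_finditer(qs, max_num_fields):
    """Raise ValueError if qs has more than max_num_fields fields.

    enumerate starts at 2 because the first delimiter found means the
    string has at least two fields.
    """
    for num_fields, _ in enumerate(QS_DELIMITER_RE.finditer(qs), 2):
        if num_fields > max_num_fields:
            # Stop scanning as soon as the limit is crossed.
            raise ValueError('Max number of fields exceeded')
```

The loop stops at the delimiter that pushes the count past the limit, so a body made of millions of `&` is rejected after only a few matches. As methane points out in the reply, the regex still has to walk every character before that delimiter, so an input such as `'a'*1000 + ';&'` costs time proportional to `len(qs)`.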

I feel qs.count('&') + qs.count(';') is much simpler and faster.
Is this complexity really needed?

When qs is `'a'*1000 + ';&'`, finditer()'s time is bounded by len(qs).

@methane I see what you're talking about: for the case you pasted, it is indeed bounded by len(qs). Some coworkers and I had a discussion about this earlier. qs.count('&') + qs.count(';') is about 5x faster than finditer() if it has to scan the whole string. Here are the numbers we measured. But the issue I had in mind when I chose finditer() was bpo issue 34866 (see the example.py there specifically), where finditer() is about 50x faster in the measurements gist. It pretty much comes down to which scenario you want to protect against, and I think either method is an improvement. If you look over that and still think the count() method is better, let me know and I'll change it.

@methane Let me know what you think of that explanation and I'll make the switch if you want, whichever way is okay with me.

Receiving a 9MB query string is slow already. I don't think "500x slower than finditer" is a problem in such a case.

There are other methods:

```python
pairs = [s2 for s1 in qs.split('&', max_num_fields) for s2 in s1.split(';', max_num_fields)]
if len(pairs) > max_num_fields:
    # len(pairs) may be `(max_num_fields+1)*2`. But since it's not cgi.FieldStorage, it's not a problem.
    raise ValueError('Max number of fields exceeded')
```

Or you can use the regex to make pairs too:

```python
pairs = _QS_DELIMITER_RE.split(qs, max_num_fields)
if len(pairs) > max_num_fields:
    raise ValueError('Max number of fields exceeded')
```

I think these are simpler and faster than the enumerate hack.

Alright, thanks for taking a look @methane. I changed that to use count(), which is the simplest option, in commit 1fa59e4. If that looks good, can you add the backport labels so that the miss-islington bot automatically backports this? I expect the automatic backports to fail on 2.7, but I'll fix that manually. I intend to backport it to all maintained versions, which I believe are 3.5-3.7 and 2.7.
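A `count()`-based check along the lines methane suggested might look like the sketch below. The function name is hypothetical, and this is an illustration of the approach rather than the exact code from commit 1fa59e4.

```python
def check_fields_count(qs, max_num_fields):
    """Raise ValueError if qs has more than max_num_fields fields.

    str.count() always scans the whole string, but it runs in C and
    needs no regex engine or Python-level loop.
    """
    # k separators split the string into k + 1 fields.
    num_fields = 1 + qs.count('&') + qs.count(';')
    if num_fields > max_num_fields:
        raise ValueError('Max number of fields exceeded')
```

Unlike the `finditer()` loop, this always scans all of `qs`, once per separator. That is the trade-off discussed in the thread: cheap, predictable per-character work in C instead of an early exit driven from Python.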

matthewbelisle-wf · 2018-10-03T16:55:13Z

@edevil You might be interested in this, I noticed you wrote your own limited_parse_qsl() for django.

https://github.com/django/django/blob/2.1.1/django/utils/http.py#L409-L415

edevil · 2018-10-03T17:07:26Z

@matthewbelisle-wf Ah! This would have helped back then. :)

tseaver

I like the overall approach here.

tseaver · 2018-10-03T19:09:41Z

Lib/cgi.py

@@ -638,6 +642,8 @@ def read_multi(self, environ, keep_blank_values, strict_parsing):
                         self.encoding, self.errors)


ISTM that we should be propagating through the max_num_fields limit for sub-parts (and maybe reducing by the aggregate number of fields read so far).

Good thought, I see what you mean there. Thinking about how to implement that.

Okay, added that in 21e5e48 along with unit tests.
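To make tseaver's suggestion concrete, here is a toy model of the budget propagation. Each nested multipart body receives the overall limit minus the fields already collected, and the running total is re-checked after every part. `parse_parts` and the list-of-lists representation of a multipart body are hypothetical. This is not the code of `cgi.FieldStorage.read_multi`.

```python
def parse_parts(parts, max_num_fields=None):
    """Toy model of nested multipart parsing with a shared field budget.

    Hypothetical representation: each item in `parts` is either a leaf
    field (a str) or a list standing in for a nested multipart body.
    """
    fields = []
    for part in parts:
        if isinstance(part, list):
            # A nested body only gets the budget left after the fields read
            # so far, similar in spirit to the PR's sub_max_num_fields.
            remaining = None
            if max_num_fields is not None:
                remaining = max_num_fields - len(fields)
            fields.extend(parse_parts(part, remaining))
        else:
            fields.append(part)
        if max_num_fields is not None and len(fields) > max_num_fields:
            raise ValueError('Max number of fields exceeded')
    return fields
```

For example, `parse_parts(['a', ['b', 'c'], 'd'], 4)` stays within the limit, while a limit of 3 fails on the final leaf. A nested body that alone exceeds its remaining budget fails inside the recursive call.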

tseaver · 2018-10-03T19:12:10Z

Lib/test/test_cgi.py

+            form = cgi.FieldStorage(
+                fp=BytesIO(data.encode('ascii')),
+                environ=environ,
+                max_num_fields=4,


Given that there are only 3 parts in data, it seems as though this shouldn't raise: do we have an off-by-one somewhere?

@tseaver You're right there are three parts in data, but there are two extra parts in QUERY_STRING. This was to test that both are considered.

digitalresistor · 2018-10-03T21:08:50Z

As the maintainer for WebOb I welcome this change as it will help alleviate one class of attacks.

matthewbelisle-wf · 2018-10-03T21:11:05Z

Lib/cgi.py

            part = klass(self.fp, headers, ib, environ, keep_blank_values,
                         strict_parsing,self.limit-self.bytes_read,
-                         self.encoding, self.errors)
+                         self.encoding, self.errors, sub_max_num_fields)


@tseaver I'm assuming here that if a user declares a custom FieldStorageClass, it has to accept the same params as this class. If that isn't a good assumption (i.e. this would be a breaking change), let me know. So far the other parameters have relied on the same assumption.
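The compatibility concern can be illustrated without depending on `cgi`. In the hypothetical classes below, a parent builds sub-parts by calling its storage class with an extra trailing positional argument, as the diff does with `sub_max_num_fields`. A subclass with a frozen signature then fails with `TypeError`, while one that forwards `*args, **kwargs` keeps working. All class and method names are illustrative assumptions.

```python
class BaseStorage:
    """Hypothetical stand-in for FieldStorage."""

    # A subclass may point this at a custom class for sub-parts.
    FieldStorageClass = None

    def __init__(self, fp=None, headers=None, limit=None, max_num_fields=None):
        self.max_num_fields = max_num_fields

    def make_part(self, sub_max_num_fields):
        klass = self.FieldStorageClass or self.__class__
        # The new limit is passed positionally after the existing arguments.
        return klass(None, None, None, sub_max_num_fields)


class FrozenStorage(BaseStorage):
    # Signature written before max_num_fields existed.
    def __init__(self, fp=None, headers=None, limit=None):
        super().__init__(fp, headers, limit)


class ForwardingStorage(BaseStorage):
    # Forwards whatever the parent passes, including future parameters.
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
```

`ForwardingStorage().make_part(3)` produces a part with a limit of 3, while `FrozenStorage().make_part(3)` raises `TypeError`. The forwarding pattern is one way a user-defined class can stay compatible as the base signature grows.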

alex · 2018-10-05T15:16:43Z

@methane and @ambv, looks like y'all were some of the most recent folks to touch these files, would you mind taking a look? Thanks!

ambv

LGTM. I'll let @methane have his say and merge.

Thanks! ✨ 🍰 ✨

ambv · 2018-10-05T15:26:40Z

Lib/cgi.py

@@ -351,10 +352,14 @@ def __init__(self, fp=None, headers=None, outerboundary=b'',
            for the page sending the form (content-type : meta http-equiv or
            header)

+        max_num_fields: Integer. If set, then __init__ throws a ValueError


If you really want a type annotation (of which I see no precedent above), then just write "int".

Thanks, fixed in ab0eb93.

ambv · 2018-10-05T15:27:29Z

Lib/cgi.py

-            self.list.append(MiniFieldStorage(key, value))
+            encoding=self.encoding, errors=self.errors,
+            max_num_fields=self.max_num_fields)
+        self.list = [MiniFieldStorage(key, value) for key, value in query]


This is a nice improvement. Good catch!

ambv · 2018-10-05T15:28:37Z

Lib/urllib/parse.py

@@ -649,11 +649,15 @@ def parse_qs(qs, keep_blank_values=False, strict_parsing=False,
        encoding and errors: specify how to decode percent-encoded sequences
            into Unicode characters, as accepted by the bytes.decode() method.

+        max_num_fields: Integer. If set, then throws a ValueError if there


Ditto about the annotation.

matthewbelisle-wf · 2018-10-05T16:04:43Z

Not sure why the "Azure Pipelines PR" task is failing, but I don't think it is related to this pull request.

tirkarthi · 2018-10-05T16:07:38Z

It's failing for a lot of PRs. I have added an issue for that : https://bugs.python.org/issue34902

ambv · 2018-10-07T09:04:24Z

The failure indeed looks unrelated:

ambv · 2018-10-07T09:04:42Z

cc @zooba

matthewbelisle-wf · 2018-10-09T16:07:04Z

@methane Thanks for approving this. Can you add the backport labels so that the miss-islington bot automatically backports this? I expect the automatic backports to fail on 2.7 but I'll fix that manually. I intend to backport it to all maintained versions, which I believe are 3.5-3.7 and 2.7.

miss-islington · 2018-10-19T10:52:59Z

@matthewbelisle-wf: Status check is done, and it's a success ✅ .

miss-islington · 2018-10-19T10:53:02Z

Thanks @matthewbelisle-wf for the PR 🌮🎉.. I'm working now to backport this PR to: 2.7, 3.6, 3.7.
🐍🍒⛏🤖

Adding `max_num_fields` to `cgi.FieldStorage` to make DOS attacks harder by limiting the number of `MiniFieldStorage` objects created by `FieldStorage`. (cherry picked from commit 2091448) Co-authored-by: matthewbelisle-wf <[email protected]>

bedevere-bot · 2018-10-19T10:53:16Z

GH-9965 is a backport of this pull request to the 3.7 branch.

Adding `max_num_fields` to `cgi.FieldStorage` to make DOS attacks harder by limiting the number of `MiniFieldStorage` objects created by `FieldStorage`. (cherry picked from commit 2091448) Co-authored-by: matthewbelisle-wf <[email protected]>

bedevere-bot · 2018-10-19T10:53:24Z

GH-9966 is a backport of this pull request to the 3.6 branch.

miss-islington · 2018-10-19T10:53:26Z

Sorry, @matthewbelisle-wf, I could not cleanly backport this to 2.7 due to a conflict.
Please backport using cherry_picker on the command line:
`cherry_picker 209144831b0a19715bda3bd72b14a3e6192d9cc1 2.7`

Adding `max_num_fields` to `cgi.FieldStorage` to make DOS attacks harder by limiting the number of `MiniFieldStorage` objects created by `FieldStorage`. (cherry picked from commit 2091448) Co-authored-by: matthewbelisle-wf <[email protected]>

bedevere-bot · 2018-10-19T16:20:10Z

GH-9969 is a backport of this pull request to the 2.7 branch.

Adding `max_num_fields` to `cgi.FieldStorage` to make DOS attacks harder by limiting the number of `MiniFieldStorage` objects created by `FieldStorage`. (cherry picked from commit 2091448)

the-knights-who-say-ni added the CLA not signed label Oct 1, 2018

bedevere-bot added the awaiting review label Oct 1, 2018

matthewbelisle-wf changed the title from "34866: Adding max_num_fields to cgi.FieldStorage" to "bpo-34866: Adding max_num_fields to cgi.FieldStorage" Oct 1, 2018

matthewbelisle-wf force-pushed the cgi-max-num-fields branch 2 times, most recently from a8541be to d84aef3, October 3, 2018 16:08

the-knights-who-say-ni added CLA signed and removed CLA not signed labels Oct 3, 2018

Adding max_num_fields to cgi.FieldStorage

d846e2b

matthewbelisle-wf force-pushed the cgi-max-num-fields branch from d84aef3 to d846e2b, October 3, 2018 16:35

matthewbelisle-wf commented Oct 3, 2018

tseaver reviewed Oct 3, 2018

Propagating max_num_fields to FieldStorage subclass

21e5e48

matthewbelisle-wf commented Oct 3, 2018

Fixing typos

27352e9

ambv approved these changes Oct 5, 2018

bedevere-bot added awaiting merge and removed awaiting review labels Oct 5, 2018

matthewbelisle-wf added 2 commits October 5, 2018 10:47

Fixing annotation

ab0eb93

Merge branch 'master' into cgi-max-num-fields

7668322

Using count() instead of finditer() for max_num_fields check

1fa59e4

methane added the needs backport to 3.6 and type-security labels Oct 19, 2018

Update 2018-10-03-11-07-28.bpo-34866.ML6KpJ.rst

20a77f2

methane added the 🤖 automerge label Oct 19, 2018

miss-islington merged commit 2091448 into python:master Oct 19, 2018

bedevere-bot removed the awaiting merge label Oct 19, 2018

bedevere-bot removed the needs backport to 3.7 label Oct 19, 2018

bedevere-bot removed the needs backport to 3.6 label Oct 19, 2018

miss-islington self-assigned this Oct 19, 2018

bedevere-bot removed the needs backport to 2.7 label Oct 19, 2018

This was referenced Oct 19, 2018

[2.7] bpo-34866: Adding max_num_fields to cgi.FieldStorage (GH-9660) #9969

Merged

bpo-35028: cgi: Fix max_num_fields off by one error #9973

Merged
