Skip to content

bpo-34866: Adding max_num_fields to cgi.FieldStorage #9660

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Oct 19, 2018

Conversation

matthewbelisle-wf
Copy link
Contributor

@matthewbelisle-wf matthewbelisle-wf commented Oct 1, 2018

Adding max_num_fields to cgi.FieldStorage to make DOS attacks harder by
limiting the number of MiniFieldStorage objects created by FieldStorage.

@the-knights-who-say-ni
Copy link

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept your contribution by verifying you have signed the PSF contributor agreement (CLA).

Unfortunately we couldn't find an account corresponding to your GitHub username on bugs.python.org (b.p.o) to verify you have signed the CLA (this might be simply due to a missing "GitHub Name" entry in your b.p.o account settings). This is necessary for legal reasons before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue.

You can check yourself to see if the CLA has been received.

Thanks again for your contribution, we look forward to reviewing it!

@matthewbelisle-wf matthewbelisle-wf changed the title 34866: Adding max_num_fields to cgi.FieldStorage bpo-34866: Adding max_num_fields to cgi.FieldStorage Oct 1, 2018
@matthewbelisle-wf matthewbelisle-wf force-pushed the cgi-max-num-fields branch 2 times, most recently from a8541be to d84aef3 Compare October 3, 2018 16:08
self.list.append(MiniFieldStorage(key, value))
encoding=self.encoding, errors=self.errors,
max_num_fields=self.max_num_fields)
self.list = [MiniFieldStorage(key, value) for key, value in query]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not an essential change, but the list comprehension seemed faster and cleaner than the append().

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a nice improvement. Good catch!

self.list.append(MiniFieldStorage(key, value))
encoding=self.encoding, errors=self.errors,
max_num_fields=self.max_num_fields)
self.list.extend(MiniFieldStorage(key, value) for key, value in query)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not an essential change, but the extend with generator seemed faster and cleaner than the append().

# is less than max_num_fields. This prevents a memory exhaustion DOS
# attack via post bodies with many fields.
if max_num_fields is not None:
for num_fields, _ in enumerate(_QS_DELIMITER_RE.finditer(qs), 2):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doing this with finditer() instead of count() so the worst case time is bounded by max_num_fields instead of len(qs).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel qs.count('&') + qs.count(';') is much simpler and faster.
Is this complexity really needed?

When qs is 'a'*1000 + ';&', finditer()'s time is bounded by len(qs)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@methane I see what you're talking about, for that case you pasted it is indeed bounded by len(qs). Some coworkers and I had a discussion about this earlier. qs.count('&') + qs.count(';') is about 5x faster than finditer() if it has to do the whole string. Here are the numbers we measured. But the issue I had in mind when I chose finditer() was bpo issue 34866 (see the example.py there specifically) which is about 50x faster with finditer() in the measurements gist. It pretty much comes down do which scenario you want to protect against, and I think either method is an improvement. If you look over that and still think the count() method is better then let me know and I'll change it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@methane Let me know what you think of that explanation and I'll make the switch if you want, whichever way is okay with me.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Receiving 9MB query string is slow already. I don't think "500x slower than finditer" is not a problem in such case.

There are other methods:

pairs = [s2 for s1 in qs.split('&', max_num_fields) for s2 in s1.split(';', max_num_fields)]
if len(pairs) > max_num_fields:  # len(pairs) may be `(max_num_fields+1)*2`.  But since it's not cgi.FieldStorage, it's not a problem.
    raise ValueError('Max number of fields exceeded')

Or you can use the regex to make pairs too:

pairs = _QS_DELIMITER_RE.split(qs, max_num_fields)
if len(pairs) > max_num_fields:
    raise ValueError('Max number of fields exceeded')

I think these are simpler and faster than enumerate hack.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, thanks for taking a look @methane . I changed that to use count() which is simplest in commit 1fa59e4. If that looks good can you add the backport labels so that the miss-islington bot automatically backports this? I expect the automatic backports to fail on 2.7 but I'll fix that manually. I intend to backport it to all maintained versions, which I believe are 3.5-3.7 and 2.7.

@matthewbelisle-wf
Copy link
Contributor Author

matthewbelisle-wf commented Oct 3, 2018

@edevil You might be interested in this, I noticed you wrote your own limited_parse_qsl() for django.

https://github.com/django/django/blob/2.1.1/django/utils/http.py#L409-L415

@edevil
Copy link

edevil commented Oct 3, 2018

@matthewbelisle-wf Ah! This would have helped back then. :)

Copy link

@tseaver tseaver left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the overall approach here.

Lib/cgi.py Outdated
@@ -638,6 +642,8 @@ def read_multi(self, environ, keep_blank_values, strict_parsing):
self.encoding, self.errors)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ISTM that we should be propagating through the max_num_fields limit for sub-parts (and maybe reducing by the aggregate number of fields read so far).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good thought, I see what you mean there. Thinking about how to implement that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, added that in 21e5e48 along with unit tests.

form = cgi.FieldStorage(
fp=BytesIO(data.encode('ascii')),
environ=environ,
max_num_fields=4,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that there are only 3 parts in data, it seems as though this shouldn't raise: do we have an off-by-one somewhere?

Copy link
Contributor Author

@matthewbelisle-wf matthewbelisle-wf Oct 3, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tseaver You're right there are three parts in data, but there are two extra parts in QUERY_STRING. This was to test that both are considered.

@digitalresistor
Copy link

As the maintainer for WebOb I welcome this change as it will help alleviate one class of attacks.

part = klass(self.fp, headers, ib, environ, keep_blank_values,
strict_parsing,self.limit-self.bytes_read,
self.encoding, self.errors)
self.encoding, self.errors, sub_max_num_fields)
Copy link
Contributor Author

@matthewbelisle-wf matthewbelisle-wf Oct 3, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tseaver I'm assuming here that if a user declares a custom FieldStorageClass it has to accept the same params as this class. If that isn't a good assumption, aka breaking change, then let me know. So far the other parameters have had the same assumption.

@alex
Copy link
Member

alex commented Oct 5, 2018

@methane and @ambv, looks like y'all were some of the most recent folks to touch these files, would you mind taking a look? Thanks!

Copy link
Contributor

@ambv ambv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I'll let @methane have his say and merge.

Thanks! ✨ 🍰 ✨

Lib/cgi.py Outdated
@@ -351,10 +352,14 @@ def __init__(self, fp=None, headers=None, outerboundary=b'',
for the page sending the form (content-type : meta http-equiv or
header)

max_num_fields: Integer. If set, then __init__ throws a ValueError
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you really want a type annotation (of which I see no precedent above), then just write "int".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, fixed in ab0eb93.

self.list.append(MiniFieldStorage(key, value))
encoding=self.encoding, errors=self.errors,
max_num_fields=self.max_num_fields)
self.list = [MiniFieldStorage(key, value) for key, value in query]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a nice improvement. Good catch!

@@ -649,11 +649,15 @@ def parse_qs(qs, keep_blank_values=False, strict_parsing=False,
encoding and errors: specify how to decode percent-encoded sequences
into Unicode characters, as accepted by the bytes.decode() method.

max_num_fields: Integer. If set, then throws a ValueError if there
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto about the annotation.

@matthewbelisle-wf
Copy link
Contributor Author

Not sure why the "Azure Pipelines PR" task is failing, but I don't think it is related to this pull request.

@tirkarthi
Copy link
Member

It's failing for a lot of PRs. I have added an issue for that : https://bugs.python.org/issue34902

@ambv
Copy link
Contributor

ambv commented Oct 7, 2018

The failure indeed looks unrelated:

screenshot 2018-10-07 11 03 53

@ambv
Copy link
Contributor

ambv commented Oct 7, 2018

cc @zooba

@matthewbelisle-wf
Copy link
Contributor Author

matthewbelisle-wf commented Oct 9, 2018

@methane Thanks for approving this. Can you add the backport labels so that the miss-islington bot automatically backports this? I expect the automatic backports to fail on 2.7 but I'll fix that manually. I intend to backport it to all maintained versions, which I believe are 3.5-3.7 and 2.7.

@miss-islington
Copy link
Contributor

@matthewbelisle-wf: Status check is done, and it's a success ✅ .

@miss-islington miss-islington merged commit 2091448 into python:master Oct 19, 2018
@miss-islington
Copy link
Contributor

Thanks @matthewbelisle-wf for the PR 🌮🎉.. I'm working now to backport this PR to: 2.7, 3.6, 3.7.
🐍🍒⛏🤖

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Oct 19, 2018
Adding `max_num_fields` to `cgi.FieldStorage` to make DOS attacks harder by
limiting the number of `MiniFieldStorage` objects created by `FieldStorage`.
(cherry picked from commit 2091448)

Co-authored-by: matthewbelisle-wf <[email protected]>
@bedevere-bot
Copy link

GH-9965 is a backport of this pull request to the 3.7 branch.

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Oct 19, 2018
Adding `max_num_fields` to `cgi.FieldStorage` to make DOS attacks harder by
limiting the number of `MiniFieldStorage` objects created by `FieldStorage`.
(cherry picked from commit 2091448)

Co-authored-by: matthewbelisle-wf <[email protected]>
@bedevere-bot
Copy link

GH-9966 is a backport of this pull request to the 3.6 branch.

@miss-islington
Copy link
Contributor

Sorry, @matthewbelisle-wf, I could not cleanly backport this to 2.7 due to a conflict.
Please backport using cherry_picker on command line.
cherry_picker 209144831b0a19715bda3bd72b14a3e6192d9cc1 2.7

@miss-islington miss-islington self-assigned this Oct 19, 2018
miss-islington added a commit that referenced this pull request Oct 19, 2018
Adding `max_num_fields` to `cgi.FieldStorage` to make DOS attacks harder by
limiting the number of `MiniFieldStorage` objects created by `FieldStorage`.
(cherry picked from commit 2091448)

Co-authored-by: matthewbelisle-wf <[email protected]>
miss-islington added a commit that referenced this pull request Oct 19, 2018
Adding `max_num_fields` to `cgi.FieldStorage` to make DOS attacks harder by
limiting the number of `MiniFieldStorage` objects created by `FieldStorage`.
(cherry picked from commit 2091448)

Co-authored-by: matthewbelisle-wf <[email protected]>
@bedevere-bot
Copy link

GH-9969 is a backport of this pull request to the 2.7 branch.

matthewbelisle-wf added a commit to matthewbelisle-wf/cpython that referenced this pull request Oct 19, 2018
Adding `max_num_fields` to `cgi.FieldStorage` to make DOS attacks harder by
limiting the number of `MiniFieldStorage` objects created by `FieldStorage`.

(cherry picked from commit 2091448)
vstinner pushed a commit that referenced this pull request Oct 30, 2018
Adding `max_num_fields` to `cgi.FieldStorage` to make DOS attacks harder by
limiting the number of `MiniFieldStorage` objects created by `FieldStorage`.

(cherry picked from commit 2091448)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type-security A security issue
Projects
None yet
Development

Successfully merging this pull request may close these issues.