bpo-34866: Adding max_num_fields to cgi.FieldStorage #9660
Conversation
Hello, and thanks for your contribution! I'm a bot set up to make sure that the project can legally accept your contribution by verifying you have signed the PSF contributor agreement (CLA). Unfortunately we couldn't find an account corresponding to your GitHub username on bugs.python.org (b.p.o) to verify you have signed the CLA (this might be simply due to a missing "GitHub Name" entry in your b.p.o account settings). This is necessary for legal reasons before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue. You can check yourself to see if the CLA has been received. Thanks again for your contribution, we look forward to reviewing it!
self.list.append(MiniFieldStorage(key, value))
encoding=self.encoding, errors=self.errors,
max_num_fields=self.max_num_fields)
self.list = [MiniFieldStorage(key, value) for key, value in query]
Not an essential change, but the list comprehension seemed faster and cleaner than the append().
This is a nice improvement. Good catch!
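The speed claim is easy to check. A hypothetical micro-benchmark (not from the PR) comparing the `append()` loop with the list comprehension, with a trivial stand-in for `MiniFieldStorage` so the snippet is self-contained:

```python
import timeit

class MiniFieldStorage:
    # Stand-in for cgi.MiniFieldStorage, just enough for the comparison.
    def __init__(self, name, value):
        self.name = name
        self.value = value

query = [('key%d' % i, 'value%d' % i) for i in range(1000)]

def with_append():
    lst = []
    for key, value in query:
        lst.append(MiniFieldStorage(key, value))
    return lst

def with_comprehension():
    return [MiniFieldStorage(key, value) for key, value in query]

# Timings vary by machine; the comprehension avoids the repeated
# lst.append attribute lookup and per-iteration method-call overhead.
print('append:       ', timeit.timeit(with_append, number=100))
print('comprehension:', timeit.timeit(with_comprehension, number=100))
```

Both versions build the same list; the difference is purely interpreter overhead per iteration.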
self.list.append(MiniFieldStorage(key, value))
encoding=self.encoding, errors=self.errors,
max_num_fields=self.max_num_fields)
self.list.extend(MiniFieldStorage(key, value) for key, value in query)
Not an essential change, but extend() with a generator seemed faster and cleaner than the append().
Lib/urllib/parse.py
Outdated
# is less than max_num_fields. This prevents a memory exhaustion DOS
# attack via post bodies with many fields.
if max_num_fields is not None:
    for num_fields, _ in enumerate(_QS_DELIMITER_RE.finditer(qs), 2):
Doing this with finditer() instead of count() so the worst-case time is bounded by max_num_fields instead of len(qs).
I feel qs.count('&') + qs.count(';') is much simpler and faster. Is this complexity really needed? When qs is 'a'*1000 + ';&', finditer()'s time is bounded by len(qs).
@methane I see what you're talking about; for the case you pasted it is indeed bounded by len(qs). Some coworkers and I had a discussion about this earlier. qs.count('&') + qs.count(';') is about 5x faster than finditer() if it has to scan the whole string. Here are the numbers we measured. But the issue I had in mind when I chose finditer() was bpo issue 34866 (see the example.py there specifically), which is about 50x faster with finditer() in the measurements gist. It pretty much comes down to which scenario you want to protect against, and I think either method is an improvement. If you look over that and still think the count() method is better then let me know and I'll change it.
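The trade-off the two reviewers are circling can be sketched directly. This is an illustrative reconstruction (the `_QS_DELIMITER_RE` definition is assumed from the patch, not quoted from it) showing why neither approach dominates:

```python
import re

_QS_DELIMITER_RE = re.compile('[&;]')  # assumed, matching the patch

def num_fields_count(qs):
    # Always scans the full string, but with C-speed str.count().
    return 1 + qs.count('&') + qs.count(';')

def num_fields_finditer(qs, max_num_fields):
    # Consumes at most max_num_fields matches before bailing out, but each
    # match may still be far into the string, as in the 'a'*1000 + ';&'
    # counterexample above.
    num_fields = 1
    for num_fields, _ in enumerate(_QS_DELIMITER_RE.finditer(qs), 2):
        if num_fields > max_num_fields:
            break
    return num_fields

# Delimiter-light input: both approaches must scan essentially all of qs.
assert num_fields_count('a' * 1000 + ';&') == 3
# Delimiter-heavy input: finditer() stops after max_num_fields matches.
assert num_fields_finditer('&' * 1000, 5) == 6
```

count() wins on delimiter-light adversarial input; finditer() wins on delimiter-heavy input like the bpo-34866 example.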
@methane Let me know what you think of that explanation and I'll make the switch if you want, whichever way is okay with me.
Receiving a 9MB query string is slow already. I don't think "500x slower than finditer" is a problem in such a case.
There are other methods:
pairs = [s2 for s1 in qs.split('&', max_num_fields)
         for s2 in s1.split(';', max_num_fields)]
# len(pairs) may be (max_num_fields + 1) * 2. But since it's not
# cgi.FieldStorage, it's not a problem.
if len(pairs) > max_num_fields:
    raise ValueError('Max number of fields exceeded')
Or you can use the regex to make pairs too:
pairs = _QS_DELIMITER_RE.split(qs, max_num_fields)
if len(pairs) > max_num_fields:
    raise ValueError('Max number of fields exceeded')
I think these are simpler and faster than the enumerate hack.
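The second alternative is bounded by the limit rather than the string length because `re.split` honors `maxsplit`. A runnable version of that sketch (again assuming the patch's `_QS_DELIMITER_RE`):

```python
import re

_QS_DELIMITER_RE = re.compile('[&;]')  # assumed, matching the patch

def check_max_fields(qs, max_num_fields):
    # maxsplit bounds the work: split() stops after max_num_fields splits,
    # so at most max_num_fields + 1 chunks are ever produced, however long
    # qs is and however many delimiters it contains.
    pairs = _QS_DELIMITER_RE.split(qs, max_num_fields)
    if len(pairs) > max_num_fields:
        raise ValueError('Max number of fields exceeded')
    return pairs
```

With exactly `max_num_fields` fields the split yields that many chunks and passes; one extra delimiter produces a surplus chunk and raises.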
Alright, thanks for taking a look @methane. I changed that to use count(), which is simplest, in commit 1fa59e4. If that looks good, can you add the backport labels so that the miss-islington bot automatically backports this? I expect the automatic backports to fail on 2.7 but I'll fix that manually. I intend to backport it to all maintained versions, which I believe are 3.5-3.7 and 2.7.
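For reference, the count()-based guard adopted here looks roughly like this as a standalone function (paraphrased from the change, not quoted verbatim):

```python
def count_guard(qs, max_num_fields):
    # n delimiters separate n + 1 fields, so two str.count() passes give the
    # field count. This is O(len(qs)), but with a very small constant factor.
    if max_num_fields is not None:
        num_fields = 1 + qs.count('&') + qs.count(';')
        if max_num_fields < num_fields:
            raise ValueError('Max number of fields exceeded')
```

The check runs before any splitting or percent-decoding work, which is what makes it an effective DOS guard.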
@edevil You might be interested in this; I noticed you wrote your own https://github.com/django/django/blob/2.1.1/django/utils/http.py#L409-L415
@matthewbelisle-wf Ah! This would have helped back then. :)
I like the overall approach here.
Lib/cgi.py
Outdated
@@ -638,6 +642,8 @@ def read_multi(self, environ, keep_blank_values, strict_parsing):
    self.encoding, self.errors)
ISTM that we should be propagating the max_num_fields limit through for sub-parts (and maybe reducing it by the aggregate number of fields read so far).
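The budgeting step the reviewer suggests can be sketched as a small helper. The name and signature below are illustrative, not the identifiers used in the final patch:

```python
def sub_part_budget(max_num_fields, num_fields_read):
    # Field budget to pass into a nested FieldStorage for a multipart
    # sub-part, or None when no limit is set. Fields already read by the
    # outer parser count against the limit.
    if max_num_fields is None:
        return None
    remaining = max_num_fields - num_fields_read
    if remaining < 0:
        raise ValueError('Max number of fields exceeded')
    return remaining
```

This way a request can't dodge the limit by spreading fields across the query string and many multipart sub-parts.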
Good thought, I see what you mean there. Thinking about how to implement that.
Okay, added that in 21e5e48 along with unit tests.
Lib/test/test_cgi.py
Outdated
form = cgi.FieldStorage(
    fp=BytesIO(data.encode('ascii')),
    environ=environ,
    max_num_fields=4,
Given that there are only 3 parts in data, it seems as though this shouldn't raise: do we have an off-by-one somewhere?
@tseaver You're right that there are three parts in data, but there are two extra parts in QUERY_STRING. This was to test that both are considered.
As the maintainer for WebOb I welcome this change as it will help alleviate one class of attacks.
part = klass(self.fp, headers, ib, environ, keep_blank_values,
             strict_parsing, self.limit - self.bytes_read,
             self.encoding, self.errors)
self.encoding, self.errors, sub_max_num_fields)
@tseaver I'm assuming here that if a user declares a custom FieldStorageClass, it has to accept the same params as this class. If that isn't a good assumption, i.e. a breaking change, then let me know. So far the other parameters have had the same assumption.
LGTM. I'll let @methane have his say and merge.
Thanks! ✨ 🍰 ✨
Lib/cgi.py
Outdated
@@ -351,10 +352,14 @@ def __init__(self, fp=None, headers=None, outerboundary=b'',
    for the page sending the form (content-type : meta http-equiv or
    header)

max_num_fields: Integer. If set, then __init__ throws a ValueError |
If you really want a type annotation (of which I see no precedent above), then just write "int".
Thanks, fixed in ab0eb93.
Lib/urllib/parse.py
Outdated
@@ -649,11 +649,15 @@ def parse_qs(qs, keep_blank_values=False, strict_parsing=False,
    encoding and errors: specify how to decode percent-encoded sequences
    into Unicode characters, as accepted by the bytes.decode() method.

max_num_fields: Integer. If set, then throws a ValueError if there |
Ditto about the annotation.
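The documented parse_qs behavior discussed above can be exercised directly; max_num_fields has shipped in urllib.parse since the releases this PR was backported to:

```python
from urllib.parse import parse_qs

# Within the limit: parses normally into the usual dict-of-lists.
result = parse_qs('a=1&b=2&c=3', max_num_fields=3)
assert result == {'a': ['1'], 'b': ['2'], 'c': ['3']}

# One field over the limit: raises ValueError before parsing the fields.
try:
    parse_qs('a=1&b=2&c=3&d=4', max_num_fields=3)
except ValueError as exc:
    print('rejected:', exc)
```

Callers that accept untrusted query strings or urlencoded bodies can set this to a small bound to blunt the bpo-34866 DOS vector.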
Not sure why the "Azure Pipelines PR" task is failing, but I don't think it is related to this pull request.
It's failing for a lot of PRs. I have added an issue for that: https://bugs.python.org/issue34902
cc @zooba
@methane Thanks for approving this. Can you add the backport labels so that the miss-islington bot automatically backports this? I expect the automatic backports to fail on 2.7 but I'll fix that manually. I intend to backport it to all maintained versions, which I believe are 3.5-3.7 and 2.7.
@matthewbelisle-wf: Status check is done, and it's a success ✅.
Thanks @matthewbelisle-wf for the PR 🌮🎉. I'm working now to backport this PR to: 2.7, 3.6, 3.7.
Adding `max_num_fields` to `cgi.FieldStorage` to make DOS attacks harder by limiting the number of `MiniFieldStorage` objects created by `FieldStorage`. (cherry picked from commit 2091448) Co-authored-by: matthewbelisle-wf <[email protected]>
GH-9965 is a backport of this pull request to the 3.7 branch.
Adding `max_num_fields` to `cgi.FieldStorage` to make DOS attacks harder by limiting the number of `MiniFieldStorage` objects created by `FieldStorage`. (cherry picked from commit 2091448) Co-authored-by: matthewbelisle-wf <[email protected]>
GH-9966 is a backport of this pull request to the 3.6 branch.
Sorry, @matthewbelisle-wf, I could not cleanly backport this to
Adding `max_num_fields` to `cgi.FieldStorage` to make DOS attacks harder by limiting the number of `MiniFieldStorage` objects created by `FieldStorage`. (cherry picked from commit 2091448) Co-authored-by: matthewbelisle-wf <[email protected]>
GH-9969 is a backport of this pull request to the 2.7 branch.
Adding `max_num_fields` to `cgi.FieldStorage` to make DOS attacks harder by limiting the number of `MiniFieldStorage` objects created by `FieldStorage`. (cherry picked from commit 2091448)