Description
Describe the bug
This is not a bug report but a discussion about a new feature: how we can extend XML processing.
A customer has requested that we extend the engines' XML parsing capabilities. Of course, the feature should be added to both engines with the same behavior.
Current behavior
Consider this payload:
<?xml version="1.0" encoding="UTF-8"?>
<root>
<level1>
<level2>
<node>foo1</node>
<node>bar1</node>
</level2>
<level2>
<node>foo2</node>
<node>bar2</node>
</level2>
</level1>
</root>
In their current state, the engines present this payload as follows:
(mod_security2)
[/post][9] Target value: " foo1 bar1 foo2 bar2"
(libmodsecurity3)
[/post] [9] Target value: " foo1 bar1 foo2 bar2" (Variable: XML:/*)
(lines from the debug logs)
Problem
The problem is that exclusions for sub-parts of the document or for specific nodes do not work. For example:
SecRule XML:/* "@rx ^foo.*" \
"id:10001,\
phase:2,\
t:none,\
log,\
pass,\
ctl:ruleRemoveTargetById=930120;XML:/root/level1/level2/node"
This exclusion cannot work, because the XML variable holds the concatenated node values rather than key/value pairs as with JSON. Therefore it is impossible to exclude individual nodes from any rule.
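To make the problem concrete, here is a minimal Python sketch (not engine code) that approximates what the debug logs show: the XPath `/*` selects the root element, and its value is the concatenation of every descendant text node in document order. The function name `root_string_value` is hypothetical and uses only the standard library's `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

# The payload from the "Current behavior" section.
PAYLOAD = b"""<?xml version="1.0" encoding="UTF-8"?>
<root>
<level1>
<level2>
<node>foo1</node>
<node>bar1</node>
</level2>
<level2>
<node>foo2</node>
<node>bar2</node>
</level2>
</level1>
</root>"""


def root_string_value(xml_bytes: bytes) -> str:
    """Concatenate all text nodes under the root element in document order.

    This approximates the XPath string value of /* (the root element):
    element names and node boundaries are not part of the result.
    """
    root = ET.fromstring(xml_bytes)
    return "".join(root.itertext())


value = root_string_value(PAYLOAD)
print(" ".join(value.split()))  # whitespace collapsed; prints: foo1 bar1 foo2 bar2
```

Since a rule sees one string for the whole document, there is no per-node value left for an exclusion to target.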
Possible solution
Consider this structure, converted from XML to JSON:
{
"root": {
"level1": {
"level2": [
{
"node": [
"foo1",
"bar1"
]
},
{
"node": [
"foo2",
"bar2"
]
}
]
}
}
}
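One way to derive the structure above is sketched below in Python, under these assumptions: leaf elements become their stripped text, and a tag repeated among siblings becomes a list. Attributes, tails and mixed content are ignored. The function names are hypothetical and do not come from either engine.

```python
import xml.etree.ElementTree as ET

PAYLOAD = (b"<root><level1>"
           b"<level2><node>foo1</node><node>bar1</node></level2>"
           b"<level2><node>foo2</node><node>bar2</node></level2>"
           b"</level1></root>")


def element_to_value(elem: ET.Element):
    """Convert an element to a JSON-like value (str or dict).

    A leaf element becomes its stripped text. An element with children becomes
    a dict keyed by child tag; a tag that repeats among siblings maps to a list.
    """
    children = list(elem)
    if not children:
        return (elem.text or "").strip()
    grouped = {}
    for child in children:
        grouped.setdefault(child.tag, []).append(element_to_value(child))
    # A tag seen once stays a scalar/dict; a repeated tag becomes a list.
    return {tag: vals[0] if len(vals) == 1 else vals for tag, vals in grouped.items()}


def xml_to_json_structure(xml_bytes: bytes) -> dict:
    root = ET.fromstring(xml_bytes)
    return {root.tag: element_to_value(root)}


print(xml_to_json_structure(PAYLOAD))
```

Note the design choice: XML has no native list marker, so a lone `<node>` becomes a scalar rather than a one-element list. Any real mapping has to pick a rule like this one.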
The engines expand this JSON payload as follows:
(mod_security2)
[/post][9] Adding JSON argument 'root.level1.level2.level2.node.node' with value 'foo1'
[/post][9] Adding JSON argument 'root.level1.level2.level2.node.node' with value 'bar1'
[/post][9] Adding JSON argument 'root.level1.level2.level2.node.node' with value 'foo2'
[/post][9] Adding JSON argument 'root.level1.level2.level2.node.node' with value 'bar2'
(libmodsecurity3)
[/post] [4] Adding request argument (JSON): name "json.root.level1.level2.array_0.node.array_0", value "foo1"
[/post] [4] Adding request argument (JSON): name "json.root.level1.level2.array_0.node.array_1", value "bar1"
[/post] [4] Adding request argument (JSON): name "json.root.level1.level2.array_1.node.array_0", value "foo2"
[/post] [4] Adding request argument (JSON): name "json.root.level1.level2.array_1.node.array_1", value "bar2"
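The two naming schemes in the logs can be reproduced with a small Python sketch. This only mimics the names seen in the debug output above; it is not taken from either engine's source. The mod_security2-style list elements repeat the list's key (so all elements share one name), while the libmodsecurity3-style list elements get an `array_N` index. How non-string scalars are rendered is an assumption.

```python
import json


def _scalar(value) -> str:
    # Assumption: strings as-is, other scalars in their JSON spelling.
    return value if isinstance(value, str) else json.dumps(value)


def flatten_v2_style(value, name="", key=""):
    """mod_security2-like names: a list element repeats the list's key."""
    if isinstance(value, dict):
        out = []
        for k, v in value.items():
            out += flatten_v2_style(v, f"{name}.{k}" if name else k, k)
        return out
    if isinstance(value, list):
        out = []
        for item in value:
            out += flatten_v2_style(item, f"{name}.{key}", key)
        return out
    return [(name, _scalar(value))]


def flatten_v3_style(value, name="json"):
    """libmodsecurity3-like names: list elements get an array_N segment."""
    if isinstance(value, dict):
        out = []
        for k, v in value.items():
            out += flatten_v3_style(v, f"{name}.{k}")
        return out
    if isinstance(value, list):
        out = []
        for i, item in enumerate(value):
            out += flatten_v3_style(item, f"{name}.array_{i}")
        return out
    return [(name, _scalar(value))]


DOC = {"root": {"level1": {"level2": [{"node": ["foo1", "bar1"]}, {"node": ["foo2", "bar2"]}]}}}
for arg_name, arg_value in flatten_v3_style(DOC):
    print(arg_name, arg_value)
```

Only the indexed names are unique per value, which is what makes a targeted exclusion possible.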
The idea is to transform the XML structure in a similar way.
Example:
(libmodsecurity3)
[/post] [4] Adding request argument (XML): name "xml.root.level1.level2.array_0.node.array_0", value "foo1"
[/post] [4] Adding request argument (XML): name "xml.root.level1.level2.array_0.node.array_1", value "bar1"
[/post] [4] Adding request argument (XML): name "xml.root.level1.level2.array_1.node.array_0", value "foo2"
[/post] [4] Adding request argument (XML): name "xml.root.level1.level2.array_1.node.array_1", value "bar2"
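A possible way to produce these proposed names directly from the XML tree, without an intermediate JSON step, is sketched below in Python. The function `xml_to_args` and the rule "add `array_N` only when a tag repeats among siblings, counted per tag" are assumptions chosen to match the example lines, not an agreed design.

```python
import xml.etree.ElementTree as ET
from collections import Counter

PAYLOAD = (b"<root><level1>"
           b"<level2><node>foo1</node><node>bar1</node></level2>"
           b"<level2><node>foo2</node><node>bar2</node></level2>"
           b"</level1></root>")


def xml_to_args(xml_bytes: bytes, prefix: str = "xml") -> list:
    """Flatten XML into (name, value) pairs named like the proposed log lines."""
    root = ET.fromstring(xml_bytes)
    out = []
    _walk(root, f"{prefix}.{root.tag}", out)
    return out


def _walk(elem, name, out):
    children = list(elem)
    if not children:
        # Leaf element: emit one argument with its stripped text.
        out.append((name, (elem.text or "").strip()))
        return
    counts = Counter(child.tag for child in children)  # occurrences per tag
    seen = Counter()                                    # index assigned so far per tag
    for child in children:
        child_name = f"{name}.{child.tag}"
        if counts[child.tag] > 1:
            child_name += f".array_{seen[child.tag]}"
            seen[child.tag] += 1
        _walk(child, child_name, out)


for arg_name, arg_value in xml_to_args(PAYLOAD):
    print(arg_name, arg_value)
```

One caveat of this rule: adding a second sibling renames the first one (`...node` becomes `...node.array_0`), so an exclusion written against the single-element name would stop matching. Namespaced tags also appear as `{uri}local` in ElementTree, which would need its own naming decision.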
Possible risks
- if we add this "new" collection to an existing one (eg. ARGS), it will cause false positive matches
- the cost of parsing an XML structure is very high
How can we avoid/handle the risks?
We can leave the decision to the user: whether they want to see the new collection under ARGS or not. To do so, we could introduce a new configuration keyword, eg. SecParseXMLintoArgs (and optionally a runtime counterpart, eg. ctl:parseXMLintoArgs).
As with JSON, we could introduce a new configuration keyword that controls the maximum number of nested XML levels that will be parsed, eg. SecRequestBodyXMLDepthLimit (see SecRequestBodyJSONDepthLimit).
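The depth limit, together with an argument-count limit in the spirit of SecArgumentsLimit, could be enforced while streaming rather than after building the whole tree. The Python sketch below uses the standard library's `ET.XMLPullParser`; `depth_limit`, `args_limit`, `collect_leaf_values` and `XMLLimitError` are hypothetical stand-ins, and the real semantics of the existing directives (what happens on overflow, how arguments are counted) are not reproduced. Names are simplified to dotted tag paths without array indexes.

```python
import xml.etree.ElementTree as ET


class XMLLimitError(ValueError):
    """Raised when a (hypothetical) XML parsing limit is exceeded."""


def collect_leaf_values(xml_bytes, depth_limit, args_limit, chunk_size=4096):
    parser = ET.XMLPullParser(events=("start", "end"))
    stack = []  # tag names of the currently open elements
    args = []   # (dotted name, text) pairs for leaf elements

    def drain():
        for event, elem in parser.read_events():
            if event == "start":
                stack.append(elem.tag)
                if len(stack) > depth_limit:
                    raise XMLLimitError(f"depth {len(stack)} exceeds {depth_limit}")
            else:  # "end": the element and its text are complete
                if len(elem) == 0:
                    if len(args) >= args_limit:
                        raise XMLLimitError(f"more than {args_limit} arguments")
                    args.append((".".join(stack), (elem.text or "").strip()))
                stack.pop()

    # Feed the body in chunks so limits trip before the rest is parsed.
    for offset in range(0, len(xml_bytes), chunk_size):
        parser.feed(xml_bytes[offset:offset + chunk_size])
        drain()
    parser.close()
    drain()
    return args


PAYLOAD = (b"<root><level1>"
           b"<level2><node>foo1</node><node>bar1</node></level2>"
           b"<level2><node>foo2</node><node>bar2</node></level2>"
           b"</level1></root>")
print(collect_leaf_values(PAYLOAD, depth_limit=4, args_limit=100))
```

Checking limits per parser event keeps the work bounded for hostile inputs such as very deep nesting.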
More to-dos
We have to:
- analyze the XML parser's performance
- decide whether we should switch from libxml2 to another parser (libexpat? something else?)
- check the effect of SecArgumentsLimit on JSON parsing
- design and apply the same behavior for XML parsing
- explore additional XML validation methods (eg. XXE (XML External Entity) detection)
- decide between compatibility and uniform behavior across versions
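As a starting point for the XXE item in the list above, here is a minimal Python sketch using `xml.parsers.expat` (the Python binding of libexpat). It reports every external entity declared in the DTD via expat's entity declaration callback. Whether such a request should be blocked or only flagged is an open policy question; `find_external_entities` is a hypothetical name.

```python
import xml.parsers.expat


def find_external_entities(xml_bytes: bytes) -> list:
    """Return (name, system_id, public_id) for each external entity declaration."""
    found = []
    parser = xml.parsers.expat.ParserCreate()

    def on_entity_decl(name, is_parameter_entity, value, base,
                       system_id, public_id, notation_name):
        # Internal entities carry a literal value; external ones a system id.
        if system_id is not None:
            found.append((name, system_id, public_id))

    parser.EntityDeclHandler = on_entity_decl
    parser.Parse(xml_bytes, True)
    return found


XXE = b'<!DOCTYPE r [<!ENTITY x SYSTEM "file:///etc/passwd">]><r>hi</r>'
print(find_external_entities(XXE))  # prints: [('x', 'file:///etc/passwd', None)]
```

Internal entities such as `<!ENTITY y "text">` are not reported, since they cannot pull in outside resources.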
Regarding the last item (compatibility versus uniform behavior): JSON parsing behaves differently in the two versions. Consider the payload {"a":1,"b":[{"a1":"a1val"},{"a1":"a2val"}]}
(note that it contains a list!), which is equivalent to this XML:
<?xml version="1.0" encoding="UTF-8"?>
<root>
<a>1</a>
<b>
<element>
<a1>a1val</a1>
</element>
<element>
<a1>a2val</a1>
</element>
</b>
</root>
The JSON payload produces these results:
(mod_security2)
[/post][9] Adding JSON argument 'a' with value '1'
[/post][9] Adding JSON argument 'b.b.a1' with value 'a1val'
[/post][9] Adding JSON argument 'b.b.a1' with value 'a2val'
(libmodsecurity3)
[/post] [4] Adding request argument (JSON): name "json.a", value "1"
[/post] [4] Adding request argument (JSON): name "json.b.array_0.a1", value "a1val"
[/post] [4] Adding request argument (JSON): name "json.b.array_1.a1", value "a2val"
Note how list items with the same keys are named: mod_security2 gives both elements the identical name 'b.b.a1', while libmodsecurity3 distinguishes them with array indexes. I think we should follow libmodsecurity3's behavior, but then the XML and JSON arguments won't be compatible. (This raises the next question: do we want to align mod_security2's behavior?)
Any feedback is welcome!
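To illustrate why the XML and JSON names would not line up even with the same `array_N` convention, the Python sketch below names the equivalent payloads both ways (libmodsecurity3-style for JSON, the proposed scheme for XML) and strips the `json.`/`xml.` prefixes. One likely reason for the mismatch is that XML always has a root element and here also needs a wrapper element (`<element>`) that JSON does not. Both naming functions are hypothetical reconstructions from the log lines.

```python
import xml.etree.ElementTree as ET
from collections import Counter

JSON_DOC = {"a": 1, "b": [{"a1": "a1val"}, {"a1": "a2val"}]}
XML_DOC = (b"<root><a>1</a><b>"
           b"<element><a1>a1val</a1></element>"
           b"<element><a1>a2val</a1></element>"
           b"</b></root>")


def json_names(value, name="json"):
    # libmodsecurity3-style naming, as seen in the debug log.
    if isinstance(value, dict):
        return [n for k, v in value.items() for n in json_names(v, f"{name}.{k}")]
    if isinstance(value, list):
        return [n for i, v in enumerate(value) for n in json_names(v, f"{name}.array_{i}")]
    return [name]


def xml_names(elem, name=None):
    # Proposed XML naming: array_N only for tags repeated among siblings.
    name = name or f"xml.{elem.tag}"
    children = list(elem)
    if not children:
        return [name]
    counts, seen, out = Counter(c.tag for c in children), Counter(), []
    for child in children:
        child_name = f"{name}.{child.tag}"
        if counts[child.tag] > 1:
            child_name += f".array_{seen[child.tag]}"
            seen[child.tag] += 1
        out += xml_names(child, child_name)
    return out


# Drop the "json." / "xml." prefix and compare the remaining paths.
json_side = [n.split(".", 1)[1] for n in json_names(JSON_DOC)]
xml_side = [n.split(".", 1)[1] for n in xml_names(ET.fromstring(XML_DOC))]
print(json_side)
print(xml_side)
```

So a single exclusion could not cover the same logical field in both formats without some extra mapping.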