Discussion of the new XML processing feature #3178

Closed
@airween

Describe the bug

This is not a bug but a discussion about a new feature: how we can extend XML processing.

A customer has requested that we extend the engines' XML parsing capability. Of course, we should implement this in both engines with the same behavior.

Current behavior

Consider this payload:

<?xml version="1.0" encoding="UTF-8"?>
<root>
  <level1>
    <level2>
      <node>foo1</node>
      <node>bar1</node>
    </level2>
    <level2>
      <node>foo2</node>
      <node>bar2</node>
    </level2>
  </level1>
</root>

In their current state, the engines expose this payload as follows:

(mod_security2)

[/post][9] Target value: "  foo1  bar1  foo2  bar2"

(libmodsecurity3)

[/post] [9] Target value: "  foo1  bar1  foo2  bar2" (Variable: XML:/*)

(lines from debug.logs)

Problem

The problem is that exclusions for sub-parts and specific nodes do not work. See the example:

SecRule XML:/* "@rx ^foo.*" \
	"id:10001,\
	phase:2,\
	t:none,\
	log,\
	pass,\
	ctl:ruleRemoveTargetById=930120;XML:/level1/level2/node"

This does not work because the XML variable holds the concatenated node values, not key:value pairs as with JSON. It is therefore impossible to create a node-specific exclusion for any rule.

Possible solution

Consider this converted structure (XML to JSON):

{
  "root": {
    "level1": {
      "level2": [
        {
          "node": [
            "foo1",
            "bar1"
          ]
        },
        {
          "node": [
            "foo2",
            "bar2"
          ]
        }
      ]
    }
  }
}

This payload will be expanded like this:

(mod_security2)

[/post][9] Adding JSON argument 'root.level1.level2.level2.node.node' with value 'foo1'
[/post][9] Adding JSON argument 'root.level1.level2.level2.node.node' with value 'bar1'
[/post][9] Adding JSON argument 'root.level1.level2.level2.node.node' with value 'foo2'
[/post][9] Adding JSON argument 'root.level1.level2.level2.node.node' with value 'bar2'

(libmodsecurity3)

[/post] [4] Adding request argument (JSON): name "json.root.level1.level2.array_0.node.array_0", value "foo1"
[/post] [4] Adding request argument (JSON): name "json.root.level1.level2.array_0.node.array_1", value "bar1"
[/post] [4] Adding request argument (JSON): name "json.root.level1.level2.array_1.node.array_0", value "foo2"
[/post] [4] Adding request argument (JSON): name "json.root.level1.level2.array_1.node.array_1", value "bar2"

The idea is to transform the XML structure in a similar way.

Example:

(libmodsecurity3)

[/post] [4] Adding request argument (XML): name "xml.root.level1.level2.array_0.node.array_0", value "foo1"
[/post] [4] Adding request argument (XML): name "xml.root.level1.level2.array_0.node.array_1", value "bar1"
[/post] [4] Adding request argument (XML): name "xml.root.level1.level2.array_1.node.array_0", value "foo2"
[/post] [4] Adding request argument (XML): name "xml.root.level1.level2.array_1.node.array_1", value "bar2"
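
The proposed naming can be sketched in a few lines of Python. This is only an illustration of the idea, not engine code: the function names `flatten_xml` and `xml_to_args` are hypothetical, and the rule "siblings that share a tag get an `array_N` suffix" is an assumption inferred from the example names above. Attributes, namespaces and mixed content are ignored.

```python
import xml.etree.ElementTree as ET
from collections import Counter

PAYLOAD = b"""<?xml version="1.0" encoding="UTF-8"?>
<root>
  <level1>
    <level2><node>foo1</node><node>bar1</node></level2>
    <level2><node>foo2</node><node>bar2</node></level2>
  </level1>
</root>"""


def flatten_xml(elem, path):
    """Yield (name, value) pairs for every leaf element below `elem`."""
    children = list(elem)
    if not children:
        # Leaf node: its text becomes the argument value.
        yield path, (elem.text or "").strip()
        return
    # Assumption: only siblings that share a tag are indexed as array_N.
    totals = Counter(child.tag for child in children)
    seen = Counter()
    for child in children:
        child_path = f"{path}.{child.tag}"
        if totals[child.tag] > 1:
            child_path += f".array_{seen[child.tag]}"
            seen[child.tag] += 1
        yield from flatten_xml(child, child_path)


def xml_to_args(payload):
    """Parse an XML payload and return a list of (name, value) pairs."""
    root = ET.fromstring(payload)
    return list(flatten_xml(root, f"xml.{root.tag}"))


for name, value in xml_to_args(PAYLOAD):
    print(f'name "{name}", value "{value}"')
```

With names like these, an exclusion could target a single node instead of the whole concatenated value.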

Possible risks

  • if we introduce this "new" collection under an existing one, then it will cause false positive matches
  • cost of parsing an XML structure is very high

How can we avoid/handle the risks?

We can put the decision in the hands of the users, whether they want to see the new collection under ARGS or not - so introduce a new configuration keyword, e.g. SecParseXMLintoArgs (consider an optional runtime setting too, e.g. ctl:parseXMLintoArgs)

As in the case of JSON, introduce a new configuration keyword which controls the maximum number of XML levels that can be analyzed, e.g. SecRequestBodyXMLDepthLimit (see SecRequestBodyJSONDepthLimit)
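
A depth limit of this kind could work roughly as in the following Python sketch. It is an assumption about how such a check might behave, not a description of either engine: `check_xml_depth` and `XMLDepthLimitExceeded` are hypothetical names, and `limit` stands in for the value of the proposed SecRequestBodyXMLDepthLimit keyword.

```python
import io
import xml.etree.ElementTree as ET


class XMLDepthLimitExceeded(Exception):
    """Raised when the element nesting goes deeper than the limit."""


def check_xml_depth(payload: bytes, limit: int) -> int:
    """Return the maximum nesting depth, or raise once `limit` is exceeded."""
    depth = 0
    max_depth = 0
    # iterparse reports start/end events, so nesting can be tracked
    # without inspecting the finished tree afterwards.
    for event, _elem in ET.iterparse(io.BytesIO(payload), events=("start", "end")):
        if event == "start":
            depth += 1
            if depth > limit:
                raise XMLDepthLimitExceeded(f"depth {depth} exceeds limit {limit}")
            max_depth = max(max_depth, depth)
        else:
            depth -= 1
    return max_depth


sample = b"<root><level1><level2><node>foo1</node></level2></level1></root>"
print(check_xml_depth(sample, limit=10))
```

Rejecting the body as soon as the limit is crossed also caps part of the parsing cost for deeply nested payloads.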

More to-dos

We have to:

  • analyze XML parser performance
    • should we change from libxml2 to another parser? Libexpat? Or other?
  • check the effect of SecArgumentsLimit in case of JSON parsing
  • design and apply this behavior to XML parsing
  • explore the possibility of additional XML validation methods (e.g. XXE (XML External Entity) detection)
  • decide how to handle compatibility, or uniform behavior, across versions

For the last item: JSON parsing behaves differently in the two versions. Consider the payload {"a":1,"b":[{"a1":"a1val"},{"a1":"a2val"}]} (note that it contains a list!), which is equivalent to this XML:

<?xml version="1.0" encoding="UTF-8"?>
<root>
    <a>1</a>
    <b>
        <element>
            <a1>a1val</a1>
        </element>
        <element>
            <a1>a2val</a1>
        </element>
    </b>
</root>

The JSON payload produces these results:

(mod_security2)

[/post][9] Adding JSON argument 'a' with value '1'
[/post][9] Adding JSON argument 'b.b.a1' with value 'a1val'
[/post][9] Adding JSON argument 'b.b.a1' with value 'a2val'

(libmodsecurity3)

[/post] [4] Adding request argument (JSON): name "json.a", value "1"
[/post] [4] Adding request argument (JSON): name "json.b.array_0.a1", value "a1val"
[/post] [4] Adding request argument (JSON): name "json.b.array_1.a1", value "a2val"
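
The two naming schemes in the log lines above can be mimicked with a short Python sketch. It is inferred only from that log output and is not taken from either engine's source: `flatten_json` and `json_to_args` are hypothetical names, the "v2" branch assumes that an array repeats the last key, and the "v3" branch assumes an `array_N` segment per list item. Scalars are stringified with Python's `str`, which will differ from the engines for booleans and null, and keys that contain dots or a top-level array are not handled in the "v2" branch.

```python
import json


def flatten_json(value, prefix, style):
    """Yield (name, value) pairs; `style` is "v2" or "v3"."""
    if isinstance(value, dict):
        for key, item in value.items():
            name = f"{prefix}.{key}" if prefix else key
            yield from flatten_json(item, name, style)
    elif isinstance(value, list):
        for index, item in enumerate(value):
            if style == "v3":
                # libmodsecurity3-like: every list item gets its own index.
                name = f"{prefix}.array_{index}"
            else:
                # mod_security2-like: the last key is repeated, so all
                # items of one list end up under the same name.
                name = f"{prefix}.{prefix.rsplit('.', 1)[-1]}"
            yield from flatten_json(item, name, style)
    else:
        yield prefix, str(value)


def json_to_args(payload, style):
    """Parse a JSON payload and return a list of (name, value) pairs."""
    prefix = "json" if style == "v3" else ""
    return list(flatten_json(json.loads(payload), prefix, style))


payload = '{"a":1,"b":[{"a1":"a1val"},{"a1":"a2val"}]}'
for style in ("v2", "v3"):
    print(style, json_to_args(payload, style))
```

The "v2" output shows why items of the same list cannot be told apart by name there, while the "v3" names stay unique.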

Please note the list items with the same keys! I think we should follow libmodsecurity3's behavior - but then the XML and JSON handling won't be compatible. (Which implies the next question: do we want to align mod_security2's behavior?)

Any feedback is welcome!

Labels: 2.x (Related to ModSecurity version 2.x)
