
Add some simple syntax to match against nested pairs of regexes #761

@IsaacOscar

Basically, I often want to match nested groups of brackets. For example, given the string hello ( a ( b ) c ) wo ( rl ) d (
I want the matches

( a ( b ) c ) 

and

( rl )

but not the final (, as it has no matching closing bracket.

PCRE2 is already capable of doing this.
If we let BEGIN be the start regex (in this case \() and END be the end regex (i.e. \)), then it can be done with the regex:

BEGIN(?J)(?<name>(?:(?!BEGIN|END).|BEGIN(?&name)END)*)END

(In the above, name can be any group name that is not a 'free variable' in BEGIN or END, i.e. the regex can be nested with the same name.)
The above is long and verbose, and it took me a while to work out (I've since forgotten how it works), so it would be nice to have a dedicated syntax for this.
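To make the expansion concrete, here is a minimal Python sketch that builds the pattern above from a given BEGIN and END. The function name nested_pattern is hypothetical, and wrapping BEGIN and END in (?:...) is my own assumption (it keeps a top-level | inside either one from leaking into the surrounding alternation); it is not part of the regex as written above. The sketch only produces the pattern string, which is meant to be handed to a PCRE2-based tool; it does not run it.

```python
def nested_pattern(begin: str, end: str, name: str = "name") -> str:
    """Expand BEGIN/END into the recursive pattern from this issue.

    `begin` and `end` are regex source strings. They are wrapped in
    non-capturing groups (an assumption, not in the original pattern)
    so that alternations inside them stay contained.
    """
    b = f"(?:{begin})"
    e = f"(?:{end})"
    # BEGIN (?J) (?<name> ( (?!BEGIN|END). | BEGIN (?&name) END )* ) END
    return f"{b}(?J)(?<{name}>(?:(?!{b}|{e}).|{b}(?&{name}){e})*){e}"


if __name__ == "__main__":
    # Round brackets, as in the example above.
    print(nested_pattern(r"\(", r"\)"))
```

The proposed syntax would in effect let the regex engine perform this substitution itself.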

For example the syntax could be:

(?/BEGIN/END)

So to match round brackets you would write (?/\(/\)). To match round or curly brackets interchangeably (e.g. { a ( b } )), you could use (?/[({]/[)}]).

Another example, which I frequently use when processing LaTeX files, is matching curly brackets while ignoring those preceded by a backslash:

(?/(?<!\\)[{]/(?<!\\)[}])

The entire string { helo \{ world } will match.
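To pin down the behaviour I have in mind for the examples in this issue, here is a rough reference scanner in Python using only the standard re module. It is a sketch of the intended semantics, not a proposal for how PCRE2 should implement it. The function name balanced_spans and the group names open_ and close_ are hypothetical. One known difference: the . in the PCRE2 pattern does not cross newlines by default, whereas this scanner does.

```python
import re


def balanced_spans(text: str, begin: str, end: str) -> list[str]:
    """Return the outermost balanced BEGIN...END substrings of `text`."""
    # One combined tokenizer; lookbehinds in BEGIN/END still see the full text.
    token = re.compile(f"(?P<open_>{begin})|(?P<close_>{end})")
    stack = []  # start offsets of BEGINs still waiting for an END
    spans = []  # balanced (start, stop) pairs, outermost only
    for m in token.finditer(text):
        if m.group("open_") is not None:
            stack.append(m.start())
        elif stack:
            start = stack.pop()
            # Drop spans nested inside the one that was just closed.
            while spans and spans[-1][0] > start:
                spans.pop()
            spans.append((start, m.end()))
        # An END with no pending BEGIN is ignored.
    return [text[a:b] for a, b in spans]


if __name__ == "__main__":
    print(balanced_spans("hello ( a ( b ) c ) wo ( rl ) d (", r"\(", r"\)"))
    print(balanced_spans("{ a ( b } )", "[({]", "[)}]"))
    print(balanced_spans(r"{ helo \{ world }", r"(?<!\\)[{]", r"(?<!\\)[}]"))
```

A BEGIN that is never closed (like the trailing ( in the first string) simply stays on the stack and produces no match.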
