This patch adds a onepass matcher, which is a DFA that
has all the abilities of an NFA, including the zero-width
`Save` and `EmptyLook` actions. There are many expressions
that a onepass matcher can't handle, namely those where
the regex contains non-determinism.

The general approach we take is as follows:
1. Check if a regex is onepass using `src/onepass.rs::is_onepass`.
2. Compile a new regex program using the compiler with the bytes
   flag set.
3. Compile a onepass DFA from the program produced in step 2. We
   will roughly map each instruction to a state in the DFA, though
   instructions like `split` don't get states.
   a. Make a new transition table for the first instruction.
   b. For each child of the first instruction:
      - If it is a bytes instruction, add a transition to
        the table for every byte class in the instruction.
      - If it is an instruction which consumes zero input
        (like `EmptyLook` or `Save`), emit a job to a DAG asking to
        forward the first instruction state to the state for
        the non-consuming instruction.
      - Push the child instruction to a queue of instructions to
        process.
   c. Peel off an instruction from the queue and go back to
      step a, processing the instruction as if it was the
      first instruction. If the queue is empty, continue with
      step d.
   d. Topologically sort the forwarding jobs, and shuffle
      the transitions from the forwarding targets to the
      forwarding sources in topological order.
   e. Bake the intermediary transition tables down into a single
      flat vector. States which require some action (`EmptyLook`
      and `Save`) get an extra entry in the baked transition table
      that contains metadata instructing them on how to perform
      their actions.
4. Wait for the user to give us some input.
5. Execute the DFA:
   - The inner loop is basically:
         while at < text.len():
             state_ptr = baked_table[state_ptr + text[at]]
             at += 1
   - There is a lot of window dressing to handle special states.
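
To make the baked table and the inner loop from step 5 concrete, here is a
minimal sketch, not the code in this patch. It assumes a hand-built table for
the anchored regex `ab*c`, state pointers that are row offsets into the flat
vector, and two sentinel values for "dead" and "match". The names `BakedDfa`,
`for_ab_star_c`, `find_end`, `DEAD`, `MATCH` and `STRIDE` are all hypothetical,
and the special handling for `Save` and `EmptyLook` states is left out.

```rust
/// Sentinel "state pointers" that never index into the table (assumed layout).
const DEAD: usize = usize::MAX;
const MATCH: usize = usize::MAX - 1;
/// Row width: one entry per byte class (a, b, c, anything else).
const STRIDE: usize = 4;

/// A hypothetical baked DFA: one row of STRIDE entries per state,
/// with all rows stored back to back in a single flat vector.
struct BakedDfa {
    byte_class: [u8; 256], // maps each input byte to its byte class
    table: Vec<usize>,     // table[state_ptr + class] = next state_ptr
}

impl BakedDfa {
    /// Hand-built table for the anchored regex `ab*c`.
    fn for_ab_star_c() -> BakedDfa {
        let mut byte_class = [3u8; 256]; // class 3 = any other byte
        byte_class[b'a' as usize] = 0;
        byte_class[b'b' as usize] = 1;
        byte_class[b'c' as usize] = 2;
        let s1 = STRIDE; // row offset of the second state
        let table = vec![
            // class: a, b, c, other
            s1, DEAD, DEAD, DEAD,  // start state: expects `a`
            DEAD, s1, MATCH, DEAD, // s1: loops on `b`, accepts on `c`
        ];
        BakedDfa { byte_class, table }
    }

    /// Returns the end offset of a match starting at offset 0, if any.
    fn find_end(&self, text: &[u8]) -> Option<usize> {
        let mut state_ptr = 0; // row offset of the start state
        let mut at = 0;
        while at < text.len() {
            let class = self.byte_class[text[at] as usize] as usize;
            state_ptr = self.table[state_ptr + class];
            at += 1;
            // The "window dressing": check for special states.
            match state_ptr {
                MATCH => return Some(at),
                DEAD => return None,
                _ => {}
            }
        }
        None // input ran out before reaching the match state
    }
}
```

With this table, `BakedDfa::for_ab_star_c().find_end(b"abbbc")` walks one table
lookup per input byte and never backtracks or tracks more than one state.
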
The idea of a onepass matcher comes from Russ Cox and
his RE2 library. I haven't been as good about reading
the RE2 source as I should have, but I've gotten the
impression that the RE2 onepass matcher is more in the
spirit of an NFA simulation without threads than a DFA.