Commit 91371de

Ethan Pailes committed
Add function to determine if a regex is onepass.
This patch adds the analysis function `is_onepass`, found in `analysis.rs`, which determines whether a particular regex can be executed using the onepass DFA. A regex is said to be onepass iff there are no non-deterministic splits in it. An example of non-determinism in a regex is `/alex|apple/`: we can't know which branch to take because both of them start with `a`. A more subtle example is `/(?:alex)*apple/`: after every iteration of the Kleene star, we might either branch back to `alex` or continue on to `apple`, and both choices begin with `a`.
1 parent 571d5a6 commit 91371de
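To make the two examples in the commit message concrete, here is a minimal sketch of the first-byte test they describe. It is not the `is_onepass` function from `analysis.rs`: the helper names `literal_alternation_is_onepass` and `star_then_literal_is_onepass` are hypothetical, and the sketch assumes every branch is a plain literal, so only the first byte of each branch needs to be compared.

```rust
/// Hypothetical helper, not part of the patch: returns true if no two
/// literal branches of an alternation begin with the same byte.
/// An empty branch is conservatively treated as ambiguous.
pub fn literal_alternation_is_onepass(branches: &[&str]) -> bool {
    // One flag per possible first byte.
    let mut seen = [false; 256];
    for branch in branches {
        match branch.as_bytes().first() {
            None => return false,
            Some(&b) => {
                if seen[b as usize] {
                    // Two branches share a first byte: a non-deterministic split.
                    return false;
                }
                seen[b as usize] = true;
            }
        }
    }
    true
}

/// Hypothetical helper for the shape `(?:body)*rest` with literal `body`
/// and `rest`: after each iteration we must choose between looping back
/// into `body` and continuing into `rest`, so their first bytes must differ.
pub fn star_then_literal_is_onepass(body: &str, rest: &str) -> bool {
    match (body.as_bytes().first(), rest.as_bytes().first()) {
        (Some(a), Some(b)) => a != b,
        // Empty pieces are conservatively treated as ambiguous.
        _ => false,
    }
}
```

Under these assumptions, `literal_alternation_is_onepass(&["alex", "apple"])` and `star_then_literal_is_onepass("alex", "apple")` both return `false`, matching the two examples above. A real analysis has to handle character classes, nested groups, and empty matches, which this sketch deliberately ignores.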

4 files changed, +573 -0 lines changed

regex-syntax/src/hir/interval.rs (+5)

@@ -309,6 +309,11 @@ impl<I: Interval> IntervalSet<I> {
         }
         true
     }
+
+    /// Returns true iff this class is empty.
+    pub fn is_empty(&self) -> bool {
+        self.ranges.is_empty()
+    }
 }

 /// An iterator over intervals.

regex-syntax/src/hir/mod.rs (+20)

@@ -797,6 +797,16 @@ impl ClassUnicode {
     pub fn symmetric_difference(&mut self, other: &ClassUnicode) {
         self.set.symmetric_difference(&other.set);
     }
+
+    /// Returns true iff this character class contains no characters.
+    ///
+    /// This should never be true for a character class which was
+    /// constructed by the regex parser, but a notion of character
+    /// class emptiness can be useful for code that wants to
+    /// programmatically generate character classes.
+    pub fn is_empty(&self) -> bool {
+        self.set.is_empty()
+    }
 }

 /// An iterator over all ranges in a Unicode character class.
@@ -998,6 +1008,16 @@ impl ClassBytes {
     pub fn is_all_ascii(&self) -> bool {
         self.set.intervals().last().map_or(true, |r| r.end <= 0x7F)
     }
+
+    /// Returns true iff this character class contains no characters.
+    ///
+    /// This should never be true for a character class which was
+    /// constructed by the regex parser, but a notion of character
+    /// class emptiness can be useful for code that wants to
+    /// programmatically generate character classes.
+    pub fn is_empty(&self) -> bool {
+        self.set.is_empty()
+    }
 }

 /// An iterator over all ranges in a byte character class.
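The doc comments above note that emptiness matters mainly for classes built programmatically. As a rough illustration, the sketch below shows one way an empty class can arise: intersecting two disjoint classes. The `ByteClass` type here is a hypothetical, simplified stand-in written for this example; it is not the crate's `ClassBytes` or `IntervalSet`, and whether the onepass analysis uses intersection in this way is an assumption.

```rust
/// Hypothetical stand-in for a byte class: a sorted list of
/// non-overlapping, non-adjacent inclusive ranges.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ByteClass {
    ranges: Vec<(u8, u8)>,
}

impl ByteClass {
    /// Builds a class from arbitrary ranges, dropping inverted ranges
    /// and merging ranges that overlap or touch.
    pub fn new(mut ranges: Vec<(u8, u8)>) -> ByteClass {
        ranges.retain(|&(start, end)| start <= end);
        ranges.sort();
        let mut merged: Vec<(u8, u8)> = Vec::new();
        for (start, end) in ranges {
            if let Some(last) = merged.last_mut() {
                // Widen to u16 so `last.1 + 1` cannot overflow at 0xFF.
                if (start as u16) <= (last.1 as u16) + 1 {
                    if end > last.1 {
                        last.1 = end;
                    }
                    continue;
                }
            }
            merged.push((start, end));
        }
        ByteClass { ranges: merged }
    }

    /// Mirrors the method added in the patch: true iff there are no ranges.
    pub fn is_empty(&self) -> bool {
        self.ranges.is_empty()
    }

    /// Returns the bytes contained in both classes, using a linear
    /// two-pointer walk over the sorted range lists.
    pub fn intersect(&self, other: &ByteClass) -> ByteClass {
        let mut out = Vec::new();
        let (mut i, mut j) = (0, 0);
        while i < self.ranges.len() && j < other.ranges.len() {
            let (a, b) = (self.ranges[i], other.ranges[j]);
            let lo = a.0.max(b.0);
            let hi = a.1.min(b.1);
            if lo <= hi {
                out.push((lo, hi));
            }
            // Advance whichever range ends first.
            if a.1 < b.1 {
                i += 1;
            } else {
                j += 1;
            }
        }
        ByteClass { ranges: out }
    }
}
```

With this stand-in, intersecting `[a-z]` with `[0-9]` produces a class for which `is_empty()` is true, while intersecting `[a-z]` with `[a-c]` does not. A check of this kind is one plausible way to decide whether two branches can start with the same byte.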
