Commit 41f14c2

fuzz: account for Unicode class size in compiler
This improves the precision of the "expression too big" regex compilation error. Previously, it was not considering the heap usage from Unicode character classes.

It's possible this will make some regexes fail to compile that previously compiled. However, this is a bug fix. If you do wind up seeing this though, feel free to file an issue, since it would be good to get an idea of what kinds of regexes no longer compile but did.

This was found by OSS-fuzz: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=33579
1 parent 6d95a6f commit 41f14c2

3 files changed: 33 additions and 1 deletion

CHANGELOG.md

Lines changed: 12 additions & 0 deletions
@@ -1,3 +1,15 @@
+1.4.6 (2021-04-22)
+==================
+This is a small patch release that fixes the compiler's size check on how much
+heap memory a regex uses. Previously, the compiler did not account for the
+heap usage of Unicode character classes. Now it does. It's possible that this
+may make some regexes fail to compile that previously did compile. If that
+happens, please file an issue.
+
+* [BUG OSS-fuzz#33579](https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=33579):
+  Some regexes can use more heap memory than one would expect.
+
+
 1.4.5 (2021-03-14)
 ==================
 This is a small patch release that fixes a regression in the size of a `Regex`

src/compile.rs

Lines changed: 9 additions & 1 deletion
@@ -38,6 +38,7 @@ pub struct Compiler {
     suffix_cache: SuffixCache,
     utf8_seqs: Option<Utf8Sequences>,
     byte_classes: ByteClassSet,
+    extra_inst_bytes: usize,
 }
 
 impl Compiler {
@@ -54,6 +55,7 @@ impl Compiler {
             suffix_cache: SuffixCache::new(1000),
             utf8_seqs: Some(Utf8Sequences::new('\x00', '\x00')),
             byte_classes: ByteClassSet::new(),
+            extra_inst_bytes: 0,
         }
     }
 
@@ -420,6 +422,8 @@ impl Compiler {
     }
 
     fn c_class(&mut self, ranges: &[hir::ClassUnicodeRange]) -> ResultOrEmpty {
+        use std::mem::size_of;
+
         assert!(!ranges.is_empty());
         if self.compiled.uses_bytes() {
             Ok(Some(CompileClass { c: self, ranges: ranges }.compile()?))
@@ -429,6 +433,8 @@ impl Compiler {
             let hole = if ranges.len() == 1 && ranges[0].0 == ranges[0].1 {
                 self.push_hole(InstHole::Char { c: ranges[0].0 })
             } else {
+                self.extra_inst_bytes +=
+                    ranges.len() * (size_of::<char>() * 2);
                 self.push_hole(InstHole::Ranges { ranges: ranges })
             };
             Ok(Some(Patch { hole: hole, entry: self.insts.len() - 1 }))
@@ -795,7 +801,9 @@ impl Compiler {
     fn check_size(&self) -> result::Result<(), Error> {
         use std::mem::size_of;
 
-        if self.insts.len() * size_of::<Inst>() > self.size_limit {
+        let size =
+            self.extra_inst_bytes + (self.insts.len() * size_of::<Inst>());
+        if size > self.size_limit {
             Err(Error::CompiledTooBig(self.size_limit))
         } else {
             Ok(())
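
The accounting change above can be illustrated with a small standalone model. This is only a sketch under stated assumptions, not the crate's actual code: the `Inst` enum, the `Compiler` struct, `push_class`, `old_size`, and `new_size` below are hypothetical stand-ins that borrow the field names `extra_inst_bytes` and `size_limit` from the diff. The point it tries to show is that `size_of::<Inst>()` measures only the inline part of an instruction, so heap-allocated ranges are invisible to the old check.

```rust
use std::mem::size_of;

// Hypothetical, simplified stand-in for the compiler's instruction type.
#[allow(dead_code)]
enum Inst {
    Char(char),
    // The ranges live on the heap; size_of::<Inst>() only counts the
    // pointer and length stored inline, not the ranges themselves.
    Ranges(Box<[(char, char)]>),
}

// Hypothetical model of the compiler state relevant to the size check.
struct Compiler {
    insts: Vec<Inst>,
    extra_inst_bytes: usize,
    size_limit: usize,
}

impl Compiler {
    fn new(size_limit: usize) -> Self {
        Compiler { insts: Vec::new(), extra_inst_bytes: 0, size_limit }
    }

    // Mirrors the shape of the patched branch: a single-character class
    // needs no heap storage, anything else records its range bytes.
    fn push_class(&mut self, ranges: &[(char, char)]) {
        assert!(!ranges.is_empty());
        if ranges.len() == 1 && ranges[0].0 == ranges[0].1 {
            self.insts.push(Inst::Char(ranges[0].0));
        } else {
            self.extra_inst_bytes += ranges.len() * (size_of::<char>() * 2);
            self.insts.push(Inst::Ranges(ranges.to_vec().into_boxed_slice()));
        }
    }

    // Size as computed before the patch: inline instruction bytes only.
    fn old_size(&self) -> usize {
        self.insts.len() * size_of::<Inst>()
    }

    // Size as computed after the patch: inline bytes plus heap bytes.
    fn new_size(&self) -> usize {
        self.extra_inst_bytes + self.old_size()
    }

    fn check_size(&self) -> Result<(), String> {
        if self.new_size() > self.size_limit {
            Err(format!("compiled size exceeds limit of {} bytes", self.size_limit))
        } else {
            Ok(())
        }
    }
}

fn main() {
    let mut c = Compiler::new(1024);
    // One instruction carrying 1000 ranges on the heap.
    let big: Vec<(char, char)> = vec![('a', 'b'); 1000];
    c.push_class(&big);
    println!("old size estimate: {} bytes", c.old_size());
    println!("new size estimate: {} bytes", c.new_size());
    println!("check_size: {:?}", c.check_size());
}
```

In this model a single instruction with 1000 ranges stays well under a 1024-byte limit by the old measure, while the new measure counts the 8000 bytes of range data and rejects it.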

tests/regression_fuzz.rs

Lines changed: 12 additions & 0 deletions
@@ -17,3 +17,15 @@ fn fuzz1() {
 fn empty_any_errors_no_panic() {
     assert!(regex_new!(r"\P{any}").is_err());
 }
+
+// This tests that a very large regex errors during compilation instead of
+// using gratuitous amounts of memory. The specific problem is that the
+// compiler wasn't accounting for the memory used by Unicode character classes
+// correctly.
+//
+// See: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=33579
+#[test]
+fn big_regex_fails_to_compile() {
+    let pat = "[\u{0}\u{e}\u{2}\\w~~>[l\t\u{0}]p?<]{971158}";
+    assert!(regex_new!(pat).is_err());
+}
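
A rough back-of-the-envelope check suggests why a pattern of this shape should now trip the limit. The sketch below makes two assumptions that are not stated in the commit: that each of the 971158 repetitions compiles its own copy of the class, and that the size limit is 10 MiB (`ASSUMED_SIZE_LIMIT`). The helper `extra_class_bytes` is hypothetical and simply applies the per-range formula from the patch, `size_of::<char>() * 2` bytes per range.

```rust
use std::mem::size_of;

// Assumption for illustration: a 10 MiB compiled-size limit.
const ASSUMED_SIZE_LIMIT: usize = 10 * (1 << 20);

// Hypothetical helper: heap bytes attributed to `copies` copies of a class
// with `ranges_per_class` ranges, using the per-range cost from the patch.
fn extra_class_bytes(ranges_per_class: usize, copies: usize) -> usize {
    copies * ranges_per_class * (size_of::<char>() * 2)
}

fn main() {
    let copies = 971_158;
    for ranges in [1, 2, 10] {
        let bytes = extra_class_bytes(ranges, copies);
        println!(
            "{} range(s) per class: {} bytes, over assumed limit: {}",
            ranges,
            bytes,
            bytes > ASSUMED_SIZE_LIMIT
        );
    }
}
```

Under these assumptions, even two ranges per copy would exceed the limit on range data alone, and the class in the test contains Unicode `\w`, which spans many ranges.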
