(core::str) Boyer-Moore string searching #1932

killerswan · 2012-03-06T09:03:01Z

Here, I've added a Boyer-Moore string search. It computes a pair of tables based on the "needle" being searched for, and then uses them to go faster through the "haystack"...
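For readers unfamiliar with the technique, here is a minimal sketch of the two-table idea in present-day Rust. It is not the code from this pull request: the function names (`bad_char_table`, `good_suffix_table`, `boyer_moore_find_all`) are hypothetical, it works on bytes rather than characters, and the good-suffix table is built with a simple quadratic-time suffix scan for readability rather than speed.

```rust
/// Bad-character table (illustrative): for each byte value, the distance
/// from its last occurrence in needle[..m-1] to the end of the needle,
/// or m if the byte does not occur there.
fn bad_char_table(needle: &[u8]) -> [usize; 256] {
    let m = needle.len();
    let mut table = [m; 256];
    for (i, &b) in needle.iter().enumerate().take(m.saturating_sub(1)) {
        table[b as usize] = m - 1 - i;
    }
    table
}

/// Good-suffix table (illustrative): shift[i] is how far the needle may
/// move when a mismatch happens at needle index i.
fn good_suffix_table(needle: &[u8]) -> Vec<usize> {
    let m = needle.len();
    // suff[i] = length of the longest common suffix of needle[..=i] and needle.
    // Computed naively here; faster constructions exist.
    let mut suff = vec![0usize; m];
    for i in 0..m {
        let mut k = 0;
        while k <= i && needle[i - k] == needle[m - 1 - k] {
            k += 1;
        }
        suff[i] = k;
    }
    let mut shift = vec![m; m];
    // Case 1: a prefix of the needle matches a suffix of the matched part.
    let mut j = 0;
    for i in (0..m).rev() {
        if suff[i] == i + 1 {
            while j < m - 1 - i {
                if shift[j] == m {
                    shift[j] = m - 1 - i;
                }
                j += 1;
            }
        }
    }
    // Case 2: the matched suffix reoccurs elsewhere inside the needle.
    for i in 0..m.saturating_sub(1) {
        shift[m - 1 - suff[i]] = m - 1 - i;
    }
    shift
}

/// Returns the byte offsets of all (possibly overlapping) matches.
fn boyer_moore_find_all(haystack: &[u8], needle: &[u8]) -> Vec<usize> {
    let (n, m) = (haystack.len(), needle.len());
    let mut matches = Vec::new();
    if m == 0 || m > n {
        return matches;
    }
    let bc = bad_char_table(needle);
    let gs = good_suffix_table(needle);
    let mut j = 0;
    while j <= n - m {
        // Compare the needle against the current window, right to left.
        let mut i = m;
        while i > 0 && needle[i - 1] == haystack[j + i - 1] {
            i -= 1;
        }
        if i == 0 {
            matches.push(j);
            j += gs[0];
        } else {
            let k = i - 1; // index of the mismatching needle byte
            let bc_shift = (bc[haystack[j + k] as usize] + 1 + k).saturating_sub(m);
            j += gs[k].max(bc_shift);
        }
    }
    matches
}
```

Both tables depend only on the needle, so the setup cost is paid once per needle and is only recovered if the haystack is long enough for the larger shifts to matter.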

I've used it within str::find_str_between and str::iter_matches, and also added these functions:

str::findn_str_between
str::findn_str
vec::ends_with
str::chars_iteri

marijnh · 2012-03-06T09:13:11Z

Did you benchmark on small strings? I would expect that for anything shorter than a hundred characters, a naive search (which doesn't allocate anything) is much faster than Boyer-Moore. Might be worthwhile to conditionalize the finding functions to pick an algorithm based on the haystack size (after informed benchmarking, of course).
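The dispatch suggested here could look roughly like the following sketch in present-day Rust. Everything in it is an assumption made for illustration: the names `naive_find`, `horspool_find`, and `find_adaptive` are hypothetical, the Horspool variant stands in for whichever table-driven search is used, and the cutoff of 100 bytes is only the guess from the comment above, not a measured value.

```rust
/// Hypothetical cutoff; a real value would have to come from benchmarking.
const SMALL_HAYSTACK_CUTOFF: usize = 100;

/// Naive search: try every window, no preprocessing.
fn naive_find(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    if needle.is_empty() {
        return Some(0);
    }
    haystack.windows(needle.len()).position(|w| w == needle)
}

/// Boyer-Moore-Horspool: a single shift table keyed on the last byte
/// of the current window.
fn horspool_find(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    let (n, m) = (haystack.len(), needle.len());
    if m == 0 {
        return Some(0);
    }
    if m > n {
        return None;
    }
    let mut shift = [m; 256];
    for (i, &b) in needle[..m - 1].iter().enumerate() {
        shift[b as usize] = m - 1 - i;
    }
    let mut pos = 0;
    while pos <= n - m {
        if &haystack[pos..pos + m] == needle {
            return Some(pos);
        }
        pos += shift[haystack[pos + m - 1] as usize];
    }
    None
}

/// Picks an algorithm from the haystack length alone.
fn find_adaptive(haystack: &str, needle: &str) -> Option<usize> {
    let (h, n) = (haystack.as_bytes(), needle.as_bytes());
    if h.len() < SMALL_HAYSTACK_CUTOFF {
        naive_find(h, n)
    } else {
        horspool_find(h, n)
    }
}
```

Both branches return the byte offset of the first match, so callers see the same result whichever path is taken; only the cost profile differs.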

killerswan · 2012-03-06T09:23:47Z

Yeah, that definitely needs doing. (I've benchmarked enough to know it will be worth it.)

catamorphism · 2012-03-08T21:58:38Z

@brson -- hopefully you can do any remaining review/merging since you've merged other pull requests for core::str.

brson · 2012-03-08T22:24:00Z

I'll go ahead and merge this and find a reasonable cutoff point to switch from naive search to boyer-moore.

brson · 2012-03-08T23:02:13Z

This needs some tuning.

In my cursory testing the performance is significantly worse than the naive implementation. I think we need some measurements that show when it makes sense to switch to boyer-moore.

This test case doesn't terminate before I get bored and kill it: https://gist.github.com/2003993
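As a rough illustration of the kind of measurement asked for here, the sketch below times an arbitrary search function over repeated runs using present-day Rust. It does not reproduce the linked gist, whose contents are not shown in this thread; `time_search` and `naive_find` are hypothetical names, and wall-clock timing of this kind is only indicative.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Runs `search` `iterations` times and returns the total elapsed time
/// together with the last result, so callers can check correctness too.
fn time_search<F>(
    search: F,
    haystack: &str,
    needle: &str,
    iterations: u32,
) -> (Duration, Option<usize>)
where
    F: Fn(&str, &str) -> Option<usize>,
{
    let mut last = None;
    let start = Instant::now();
    for _ in 0..iterations {
        // black_box discourages the optimizer from hoisting or removing the call.
        last = search(black_box(haystack), black_box(needle));
    }
    (start.elapsed(), last)
}

/// Baseline to compare against: try every window, no preprocessing.
fn naive_find(haystack: &str, needle: &str) -> Option<usize> {
    let (h, n) = (haystack.as_bytes(), needle.as_bytes());
    if n.is_empty() {
        return Some(0);
    }
    h.windows(n.len()).position(|w| w == n)
}
```

Running `time_search` with the same inputs for each candidate, across a range of needle and haystack lengths, gives the data needed to pick a cutoff; comparing the returned positions guards against a fast but wrong implementation.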

killerswan · 2012-03-09T05:18:32Z

Hmm, I expected large needles in small haystacks to be unimpressive, but that's worse than I expected! I bet a zero-copy slice would fix everything, but in the meantime I have several other ideas...

I've just been busy at work this week, though.

killerswan · 2012-03-09T11:26:21Z

In progress. Would you all rather I close this and re-request later, or let this sit in the queue while we fiddle with this, or what?

marijnh · 2012-03-09T11:42:42Z

Feel free to leave the pull req open -- unless you're planning to let this sit for a month or so.

brson · 2012-04-01T07:17:15Z

@killerswan what's the status? it looks like you've made some improvements

killerswan · 2012-04-01T17:50:18Z

@brson Yeah... but I've now done enough testing to know that the average "needle" and "haystack" in a make check for Rust, for example, are so small (8.2 and 18.2 bytes, with the very largest haystack at only 1604 bytes), that I think maybe no variation of this is worth including in the library right now...

I'm going to put this and my test code in another repository, and close the pull request. (Edit, for reference: https://github.com/killerswan/boyer-moore-search).

brson · 2012-04-01T22:09:44Z

@killerswan That's a bummer, but it happens. I have many branches that will never go anywhere because they didn't pay off like I expected. Thanks for the effort though.

ghost assigned brson Mar 8, 2012

killerswan added 14 commits March 26, 2012 04:40

1af414b (core::str) add chars_iteri

e95006d (core::vec) add ends_with

ef827fb (core::str) add Boyer-Moore string searching

e8fb664 touchups

ababa88 (core::str) export findn_str_between

d1a26a4 (core::str) add simple_search and temporarily export it and boyer_moore_search for testing

ce62b60 touchups

80bcb8c (core::str) tweaking some assertions

ae3c3bb (core::str) simplifying boyer_moore_matching_suffixes...

cb4e968 (core::str) demonstrate that the suffix table currently sucks, i.e., this is currently Boyer-Moore-Horspool

90e903a (core::str) significantly improved boyer-moore, still testing...

95f5b36 (core::str) updated Boyer-Moore again, with faster good-suffix calc

6e8de23 (core::str) based on testing so far, choose boyer-moore when it can be faster

f7aa6b2 touchups

killerswan closed this Apr 1, 2012

huonw mentioned this pull request May 11, 2014

String searching is slow #14107

Closed

celinval added a commit to celinval/rust-dev that referenced this pull request Jun 4, 2024

Remove symbol table passes. (rust-lang#1932)

fc80db6
