Skip to content

(core::str) Boyer-Moore string searching #1932

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 14 commits into from

Conversation

killerswan
Copy link
Contributor

Here, I've added a Boyer-Moore string search. It computes a pair of tables based on the "needle" being searched for, and then uses them to go faster through the "haystack"...

I've used it within str::find_str_between and str::iter_matches, and also added these functions:

  • str::findn_str_between
  • str::findn_str
  • vec::ends_with
  • str::chars_iteri

@marijnh
Copy link
Contributor

marijnh commented Mar 6, 2012

Did you benchmark on small strings? I would expect that for anything shorter than than a hundred characters, a naive search (which doesn't allocate anything) is much faster than Boyer-Moore. Might be worthwhile to conditionalize the finding functions to pick an algorithm based on the haystack size (after informed benchmarking, of course).

@killerswan
Copy link
Contributor Author

Yeah, that definitely needs doing. (I've benchmarked enough to know it will be worth it.)

@ghost ghost assigned brson Mar 8, 2012
@catamorphism
Copy link
Contributor

@brson -- hopefully you can do any remaining review/merging since you've merged other pull requests for core::str.

@brson
Copy link
Contributor

brson commented Mar 8, 2012

I'll go ahead and merge this and find a reasonable cutoff point to switch from naive search to boyer-moore.

@brson
Copy link
Contributor

brson commented Mar 8, 2012

This needs some tuning.

In my cursory testing the performance is significantly worse than the naive implementation. I think we need some measurements that show when it makes sense to switch to boyer-moore.

This test case doesn't terminate before I get bored and kill it: https://gist.github.com/2003993

@killerswan
Copy link
Contributor Author

Hmm, I expected large needles in small haystacks to be unimpressive, but that's worse than I expected! I bet a zero-copy slice would fix everything, but in the meantime I have several other ideas...

I've just been busy at work this week, though.

@killerswan
Copy link
Contributor Author

In progress. Would you all rather I close this and re-request later, or let this sit in the queue while we fiddle with this, or what?

@marijnh
Copy link
Contributor

marijnh commented Mar 9, 2012

Feel free to leave the pull req open -- unless you're planning to let this sit for a month or so.

@brson
Copy link
Contributor

brson commented Apr 1, 2012

@killerswan what's the status? it looks like you've made some improvements

@killerswan
Copy link
Contributor Author

@brson Yeah... but I've now done enough testing to know that the average "needle" and "haystack" in a make check for Rust, for example, are so small (8.2 and 18.2 bytes, with the very largest haystack at only 1604 bytes), that I think maybe no variation of this is worth including in the library right now...

I'm going to put this and my test code in another repository, and close the pull request. (Edit, for reference: https://github.com/killerswan/boyer-moore-search).

@killerswan killerswan closed this Apr 1, 2012
@brson
Copy link
Contributor

brson commented Apr 1, 2012

@killerswan That's a bummer, but it happens. I have many branches that will never go anywhere because they didn't pay off like I expected. Thanks for the effort though.

@huonw huonw mentioned this pull request May 11, 2014
celinval added a commit to celinval/rust-dev that referenced this pull request Jun 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants