(core::str) Boyer-Moore string searching #1932
Did you benchmark on small strings? I would expect that for anything shorter than a hundred characters, a naive search (which doesn't allocate anything) is much faster than Boyer-Moore. It might be worthwhile to make the finding functions pick an algorithm based on the haystack size (after informed benchmarking, of course).
Yeah, that definitely needs doing. (I've benchmarked enough to know it will be worth it.)
@brson -- hopefully you can do any remaining review/merging since you've merged other pull requests for
I'll go ahead and merge this and find a reasonable cutoff point to switch from naive search to Boyer-Moore.
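For reference, the cutoff idea discussed above could look roughly like the sketch below. Nothing here is taken from the pull request: `naive_find`, `find_with_cutoff`, the `cutoff` parameter, and the `table_search` function pointer are all hypothetical names, and the right cutoff value would have to come from benchmarking rather than a guess.

```rust
/// Naive search: compare the needle at every offset of the haystack.
/// It allocates nothing and builds no tables.
fn naive_find(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    if needle.is_empty() {
        // `windows(0)` would panic, so handle the empty needle up front.
        return Some(0);
    }
    haystack.windows(needle.len()).position(|w| w == needle)
}

/// Hypothetical dispatcher: use the naive search for haystacks shorter
/// than `cutoff` bytes, and a table-based search (passed in by the
/// caller) otherwise. `cutoff` is an assumption to be tuned by measurement.
fn find_with_cutoff(
    haystack: &[u8],
    needle: &[u8],
    cutoff: usize,
    table_search: fn(&[u8], &[u8]) -> Option<usize>,
) -> Option<usize> {
    if haystack.len() < cutoff {
        naive_find(haystack, needle)
    } else {
        table_search(haystack, needle)
    }
}
```

Passing the table-based search as a function pointer keeps the sketch independent of any particular Boyer-Moore implementation; a real version would probably call it directly and might also take the needle length into account.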
This needs some tuning. In my cursory testing the performance is significantly worse than the naive implementation. I think we need some measurements that show when it makes sense to switch to Boyer-Moore. This test case doesn't terminate before I get bored and kill it: https://gist.github.com/2003993
Hmm, I expected large needles in small haystacks to be unimpressive, but that's worse than I expected! I bet a zero-copy slice would fix everything, but in the meantime I have several other ideas... I've just been busy at work this week, though.
In progress. Would you all rather I close this and re-request later, or let this sit in the queue while we fiddle with this, or what?
Feel free to leave the pull req open -- unless you're planning to let this sit for a month or so.
…re_search for testing
this is currently Boyer-Moore-Horspool
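For readers unfamiliar with the Horspool variant mentioned in this commit message, here is a minimal sketch of it in present-day Rust. It is not the code from this pull request: the names `shift_table` and `horspool_find` are made up for illustration, and it works on bytes rather than characters. Horspool keeps only a single bad-character shift table, whereas full Boyer-Moore uses a pair of tables.

```rust
/// Build the Horspool shift table: for each byte value, how far the
/// search window may move when that byte is the last byte of the window.
fn shift_table(needle: &[u8]) -> [usize; 256] {
    let m = needle.len();
    // Bytes that do not occur in the needle allow a full-length shift.
    let mut table = [m; 256];
    // The last needle byte is skipped so that no shift is ever zero.
    for (i, &b) in needle.iter().enumerate().take(m.saturating_sub(1)) {
        table[b as usize] = m - 1 - i;
    }
    table
}

/// Return the byte offset of the first occurrence of `needle` in
/// `haystack`, or `None` if it does not occur.
fn horspool_find(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    let (n, m) = (haystack.len(), needle.len());
    if m == 0 {
        return Some(0);
    }
    if m > n {
        return None;
    }
    let table = shift_table(needle);
    let mut pos = 0;
    while pos <= n - m {
        if &haystack[pos..pos + m] == needle {
            return Some(pos);
        }
        // Shift by the table entry for the last byte of the current window.
        let last = haystack[pos + m - 1];
        pos += table[last as usize];
    }
    None
}
```

With these definitions, `horspool_find(b"hello world", b"world")` returns `Some(6)`. The 256-entry table is built on every call, which is the kind of fixed cost that can make a table-based search lose to the naive one on short inputs.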
@killerswan what's the status? it looks like you've made some improvements
@brson Yeah... but I've now done enough testing to know that the average "needle" and "haystack" in a I'm going to put this and my test code in another repository, and close the pull request. (Edit, for reference: https://github.com/killerswan/boyer-moore-search).
@killerswan That's a bummer, but it happens. I have many branches that will never go anywhere because they didn't pay off like I expected. Thanks for the effort though.
Here, I've added a Boyer-Moore string search. It computes a pair of tables based on the "needle" being searched for, and then uses them to go faster through the "haystack"...
I've used it within str::find_str_between and str::iter_matches, and also added these functions: